
Most Commonly Asked FAANG Interview Concepts - Part 1



Hi All!! Hope everyone had a great weekend!! There is a pattern in everything, be it weight loss, building strength, or upskilling. If you follow the pattern, you can get to where you want to be. Similarly, I was thinking that there could be a pattern in the questions asked across FAANG Data Science interviews. And so, I have curated a list of some of the most commonly asked topics in those interviews.


1. What is overfitting?


Overfitting is a modeling error in which the model 'fits' the training data too closely, picking up noise along with the signal. The result is a model with high variance and low bias.


As a consequence, an overfit model will predict new test data points inaccurately even though it has high accuracy on the training data.


In the example picture below, you can see that the green line overfits the training data, while the black line fits the data in a more regularized way. The green line depends more heavily on the individual training points and will have a higher error rate on test data than the regularized model.

Photo from Wikipedia
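To make the idea concrete, here is a minimal sketch (not tied to the picture above) that fits two polynomials to the same noisy points. The sine curve, the noise level, the seed, and the degrees 3 and 9 are all assumptions chosen for illustration. The degree-9 polynomial passes through all 10 training points, so its training error is essentially zero, which is the "fits too well" behavior described above. Its test error will typically be worse than the simpler model's.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable


def true_fn(x):
    # Assumed "real" relationship behind the data (illustrative only)
    return np.sin(2 * np.pi * x)


# Small noisy training set and a larger noisy test set
x_train = np.linspace(0.0, 1.0, 10)
y_train = true_fn(x_train) + rng.normal(scale=0.3, size=x_train.size)
x_test = np.linspace(0.0, 1.0, 200)
y_test = true_fn(x_test) + rng.normal(scale=0.3, size=x_test.size)


def mse(model, x, y):
    # Mean squared error of a fitted model on (x, y)
    return float(np.mean((model(x) - y) ** 2))


results = {}
for degree in (3, 9):
    # Least-squares polynomial fit; degree 9 interpolates all 10 points
    model = Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(f"degree={degree}  train MSE={results[degree][0]:.4f}  "
          f"test MSE={results[degree][1]:.4f}")
```

Comparing the two printed rows is the whole point: a big gap between training error and test error is the usual sign of overfitting.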

2. What is dimensionality reduction?

Dimensionality reduction is a transformation technique that reduces the number of input variables in a dataset. It can be used in applied machine learning to simplify a classification or regression dataset so that a predictive model fits it better.


A dataset may have a large number of input features (columns) compared to the number of data samples (rows). This can hurt the performance of machine learning algorithms fit on that data. This scenario is referred to as the “curse of dimensionality.”

Therefore, it is often desirable to reduce the number of input features.


Different techniques for Dimensionality reduction:

The most common approach to dimensionality reduction is principal component analysis, or PCA. Other methods exist as well.


4 advantages of performing dimensionality reduction:

  1. Reduces the time and storage space required

  2. Removes multicollinearity, which makes the parameters of the machine learning model easier to interpret

  3. Makes the data easier to visualize when reduced to very low dimensions such as 2D or 3D

  4. Avoids the curse of dimensionality
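As a rough illustration of PCA, the sketch below reduces a 5-column dataset to 2 columns using NumPy's SVD. The helper name `pca` and the synthetic data (5 features that are really mixtures of 2 hidden factors plus a little noise) are assumptions made for this example. In practice you would more likely reach for a library implementation.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its top principal components (hypothetical helper)."""
    # Center each feature so the components describe variance, not the mean
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Share of total variance captured by each kept component
    explained_ratio = (S ** 2 / np.sum(S ** 2))[:n_components]
    return X_centered @ components.T, components, explained_ratio


rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))   # 2 hidden factors
mixing = rng.normal(size=(2, 5))     # spread them over 5 observed features
X = latent @ mixing + 0.01 * rng.normal(size=(100, 5))

X_reduced, components, explained_ratio = pca(X, n_components=2)
print(X.shape, "->", X_reduced.shape)
print("variance kept:", explained_ratio.sum())
```

Because the 5 features here are almost entirely driven by 2 factors, 2 components keep nearly all of the variance. Real datasets are rarely this clean, so it is worth checking the explained variance before deciding how many components to keep.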


3. What is boosting?

Boosting is an ensemble method that improves a model by reducing its bias and variance, ultimately combining many weak learners into a strong learner.


The general idea is to build the model in stages, where each new learner learns from the mistakes of the previous one.

The example picture below is from a blog post by Jocelyn D'Souza, "A Quick Guide to Boosting in ML". The picture clearly shows that boosting is a sequential process that improves the model one step at a time.
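The sequential idea can be sketched in a few lines. The example below is one flavor of boosting (gradient boosting with squared error, using one-split "stumps" as the weak learners), not the only one, and it is not taken from the blog post mentioned above. The helper names, the sine-shaped data, the number of rounds, and the learning rate are all assumptions for illustration.

```python
import numpy as np


def fit_stump(x, residual):
    """Weak learner: the single threshold split that best fits the residual."""
    best = None
    # Skip the largest value so the right-hand side is never empty
    for threshold in np.unique(x)[:-1]:
        left = residual[x <= threshold]
        right = residual[x > threshold]
        error = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if best is None or error < best[0]:
            best = (error, threshold, left.mean(), right.mean())
    return best[1:]  # (threshold, left_value, right_value)


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)


def boost(x, y, n_rounds=50, learning_rate=0.1):
    """Each round fits a stump to what the current model still gets wrong."""
    prediction = np.full_like(y, y.mean())  # start from a constant guess
    stumps, history = [], []
    for _ in range(n_rounds):
        residual = y - prediction            # mistakes of the previous learners
        stump = fit_stump(x, residual)
        prediction = prediction + learning_rate * predict_stump(stump, x)
        stumps.append(stump)
        history.append(float(np.mean((y - prediction) ** 2)))
    return stumps, history


rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 100)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=x.size)

stumps, history = boost(x, y)
print("training MSE after round 1: ", history[0])
print("training MSE after round 50:", history[-1])
```

Each stump on its own is a very weak model, but the training error shrinks round after round as the stumps are added together. Note that this only tracks training error; on real data you would also watch a validation set, since boosting for too many rounds can overfit.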



4. What are random forests? Why is Naive Bayes better?

Random forests are an ensemble learning method for classification, regression, and other tasks that operate by constructing a multitude of decision trees at training time.


The below picture shows multiple decision trees for a classification model that predicts either Class A or Class B. In the case of the classification model, the output of the random forest is the class selected by most trees. For regression tasks, the mean or average prediction of the individual trees is returned.
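The aggregation step described above is simple enough to sketch with the standard library alone. This covers only how the trees' outputs are combined, not how the trees are trained, and the function names and sample votes are made up for illustration.

```python
from collections import Counter
from statistics import mean


def forest_classify(tree_votes):
    # Classification: the class predicted by the most trees wins
    return Counter(tree_votes).most_common(1)[0][0]


def forest_regress(tree_predictions):
    # Regression: average the individual tree predictions
    return mean(tree_predictions)


# Pretend outputs from five trees for a single input row
class_votes = ["A", "B", "A", "A", "B"]
value_predictions = [2.0, 3.0, 4.0]

print(forest_classify(class_votes))       # A
print(forest_regress(value_predictions))  # 3.0
```

With an even number of trees a tie is possible. In this sketch a tie goes to whichever class was seen first, so real implementations usually define the tie-breaking rule explicitly or average class probabilities instead.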



Why is Naive Bayes better?

Naive Bayes is easy to train, and both its process and its results are easy to understand. A random forest, by contrast, can seem like a black box. Therefore, Naive Bayes may be better in terms of implementation and understanding. However, in terms of performance, a random forest is usually stronger because it is an ensemble technique.


5. How do you handle nulls?

Handling nulls is one of the most common tasks in data cleaning. There are several ways to handle nulls, and a few of them are mentioned below:

  1. Omit the rows with null values

  2. Replace nulls with a measure like the mean, median, or mode, or with a placeholder value such as NA

  3. Predict the null values based on other variables. For example, if a row has a null value for weight but has a value for height, you can replace the null with the average weight for the given height.

  4. Leave the null values as they are if you are using an ML model that automatically deals with null values.
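The first three options can be sketched with pandas as follows. The tiny height/weight table is made up for illustration, and the straight-line fit in option 3 is just one simple, assumed way to "predict" weight from height.

```python
import numpy as np
import pandas as pd

# Toy data: one row is missing its weight (values are invented)
df = pd.DataFrame({
    "height": [150.0, 160.0, 170.0, 180.0, 200.0],
    "weight": [50.0, 60.0, np.nan, 80.0, 100.0],
})

# Option 1: omit the rows with null values
dropped = df.dropna()

# Option 2: replace nulls with a summary measure (here, the column mean)
mean_filled = df.fillna({"weight": df["weight"].mean()})

# Option 3: predict the null from another variable.
# Fit weight = slope * height + intercept on the complete rows only.
known = df.dropna()
slope, intercept = np.polyfit(
    known["height"].to_numpy(), known["weight"].to_numpy(), deg=1
)
predicted = df.copy()
missing = predicted["weight"].isna()
predicted.loc[missing, "weight"] = (
    slope * predicted.loc[missing, "height"] + intercept
)

print(len(df), "rows ->", len(dropped), "rows after dropping")
print("mean-filled weight:", mean_filled.loc[2, "weight"])
print("predicted weight:  ", predicted.loc[2, "weight"])
```

The mean fill ignores the person's height, while the prediction uses it, so the two approaches give different values for the same missing cell. Which one is appropriate depends on how much data is missing and why.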

The questions don't end here. The next set of questions will be posted next week, so feel free to like, share, and subscribe! Thank you for reading the post, and please let me know your comments and feedback. Until next week, Happy Learning!!


Top viewed articles:

SQL Interview Series: Part 1, Part 2, Part 3, Part 4 & Part 5

Python Interview Series: Part 1, Part 2, Part 3 & Part 4
