Over the years, I have interviewed for various Data Science roles across organizations. This is a collection of the questions that my friends and I were asked.

Statistics/Probability:
You toss a coin 100 times. What is the probability of 40 heads?
You toss a coin 100 times. You observe 40 heads. What can you say about the coin?
You have a stream of data coming in. You need to sample it randomly and keep only k samples. What sampling strategy keeps your sample unbiased?
How do you compute the distribution of the sum of two Poisson random variables?
What is the normal distribution? Why is it so special?
Explain the Central Limit Theorem.
X, Y, and Z are three uniform random variables. What is the probability that max(X, Y) > Z?
Three numbers A, B, and C are drawn from a uniform distribution. What is the probability that they will be in order, i.e., A <= B <= C or A >= B >= C?
You play a game of (six-sided) dice where if you get a 6, you win. What is the average number of games you need to play before you win?
Link to answers for the questions in the section above.
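For the streaming question above, one commonly given answer is reservoir sampling. The following is a minimal sketch of that idea, not the only valid strategy; the function name `reservoir_sample` and its `rng` parameter are illustrative choices, not something from the original interviews.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of up to k items from an iterable of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) is kept with probability k / (i + 1),
            # overwriting a uniformly chosen slot.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir

# Example: keep 5 items from a stream of one million integers.
sample = reservoir_sample(range(1_000_000), k=5, rng=random.Random(0))
```

Only k items are held in memory at any time, and each element of the stream ends up in the final sample with equal probability.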
Linear Algebra
What are eigenvectors? How do you compute them? What is SVD? What’s the relation between eigenvectors and SVD?
What is PCA? Why do we use the covariance matrix for the decomposition? Why not anything else? What is the intuition behind it?
What happens to the closed-form solution of a linear regression in case of multicollinearity? How does regularization help in such a case?
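The multicollinearity question can be illustrated with a tiny two-feature example. This is a sketch in plain Python under the assumption of perfectly collinear columns; the helper names (`gram_matrix`, `det2`, `ridge_solve`) and the data are made up for illustration.

```python
def gram_matrix(X):
    """Return X^T X for a list-of-rows matrix X with exactly two columns."""
    a = sum(r[0] * r[0] for r in X)
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X)
    return [[a, b], [b, d]]

def det2(M):
    """Determinant of a 2x2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def ridge_solve(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y for two features via the 2x2 inverse."""
    G = gram_matrix(X)
    G[0][0] += lam
    G[1][1] += lam
    det = det2(G)
    if det == 0:
        raise ZeroDivisionError("X^T X is singular: no unique least-squares solution")
    c0 = sum(r[0] * t for r, t in zip(X, y))
    c1 = sum(r[1] * t for r, t in zip(X, y))
    w0 = (G[1][1] * c0 - G[0][1] * c1) / det
    w1 = (G[0][0] * c1 - G[1][0] * c0) / det
    return [w0, w1]

# The second column is exactly twice the first: perfect multicollinearity.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
y = [5.0, 10.0, 15.0]

singular = det2(gram_matrix(X)) == 0   # True: X^T X cannot be inverted
w_ridge = ridge_solve(X, y, lam=1.0)   # a small penalty makes the system solvable
```

With `lam=0` the solver raises because X^T X has zero determinant. With `lam=1.0` the system has a unique solution, roughly `[0.986, 1.972]`, which still reproduces `y` closely.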
Machine Learning/Deep Learning
Explain Naive Bayes. When does it work well? Is it a generative model or a discriminative model? What are some of the underlying assumptions it makes?
What is the difference between generative and discriminative models?
What is the loss function of Non-negative Matrix Factorization? Is it convex?
Explain the k-means algorithm. What are its parameters? Will it always converge? What do you need to take care of when you use k-means?
What is the time complexity of k-means clustering? What about hierarchical clustering?
What is multicollinearity? Why is it an issue? How do you solve it?
Does linear regression have a closed-form solution? Why is gradient descent used for linear regression if a closed form exists?
What are Lasso and Ridge regression?
Why do we use L1 and L2 regularizations commonly? Why not L0 or L0.5, or anything else?
When will you use Mean Squared Error vs Mean Absolute Error?
What should be removed first in time series decomposition: seasonality or trend?
What is log-loss? How is gradient descent done on log-loss?
What is an ROC curve? What are the axes? What does the area under the curve represent?
What are factorization machines? When are they used? What alternatives can be used for similar problems?
How can we optimize an algorithm for the rank order of the predictions in a recommender system?
How do you fit a model to a real-life distribution if your training data comes from a different distribution?
Explain the bias-variance tradeoff. How do you detect bias and variance?
How do you fix high bias? High variance?
Do decision trees have high bias or high variance? How do you address it?
Explain random forest. How does it solve the high-variance problem?
What is bootstrapping?
You observe low training error but high test error. What will you do?
What is backpropagation?
What are activation functions? Why do we need them?
Why are ReLU activations widely used? Why can’t we use sigmoid or tanh instead of ReLU?
When I have millions of classes in output, softmax would be very costly. How can we optimize that?
How do you choose the number of hidden layers and the number of neurons in each hidden layer of a DNN?
How do you control for variance in Neural Networks?
Why is a bias term added to each neuron?
In node2vec, what does the embedding represent: topological similarity or nearness?
How is fastText different from word2vec?
How do you search efficiently for the nearest embeddings when you have millions of data points?
How do you get sentence meanings from word embeddings, considering the position of words in the sentence?
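As one way to approach the log-loss question in this section, here is a hedged sketch of batch gradient descent on log-loss for a one-feature logistic model. The toy data, the learning rate, and the function names are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(w, b, xs, ys):
    """Mean binary cross-entropy for the model p = sigmoid(w * x + b)."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(xs)

def gradient_step(w, b, xs, ys, lr):
    """One batch step. For log-loss, the gradient w.r.t. the logit is (p - y)."""
    n = len(xs)
    errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
    grad_w = sum(e * x for e, x in zip(errors, xs)) / n
    grad_b = sum(errors) / n
    return w - lr * grad_w, b - lr * grad_b

# Toy, deliberately non-separable data so the weights stay finite.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]

w, b = 0.0, 0.0
initial_loss = log_loss(w, b, xs, ys)   # ln(2) when w = b = 0
for _ in range(200):
    w, b = gradient_step(w, b, xs, ys, lr=0.5)
final_loss = log_loss(w, b, xs, ys)
```

The key point for the interview answer is that the gradient reduces to the prediction error `(p - y)` times the input, which is why the update looks so similar to the one for linear regression.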
General DS related:
How are A/B tests done? How do you evaluate if a test is successful?
What is experimental design?
Why do we need a hold-out dataset? What happens if you train a model on the full dataset?
Explain multi-armed bandits (MAB). How are they run? What is Thompson sampling? What is the epsilon-greedy method?
How do you handle missing values and outliers?
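The two bandit strategies named in this section can be compared with a small simulation. This is a sketch assuming Bernoulli rewards and a uniform Beta(1, 1) prior; the arm rates, `epsilon`, and function names are hypothetical.

```python
import random

def epsilon_greedy(successes, failures, rng, epsilon=0.1):
    """Explore a random arm with probability epsilon, else exploit the best observed mean."""
    n_arms = len(successes)
    if rng.random() < epsilon:
        return rng.randrange(n_arms)
    means = [s / (s + f) if s + f > 0 else 0.0 for s, f in zip(successes, failures)]
    return max(range(n_arms), key=lambda a: means[a])

def thompson(successes, failures, rng):
    """Draw a rate for each arm from its Beta posterior and play the largest draw."""
    draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=lambda a: draws[a])

def run_bandit(policy, true_rates, rounds, seed=0):
    """Simulate Bernoulli arms and return how often each arm was pulled."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(rounds):
        arm = policy(successes, failures, rng)
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]

true_rates = [0.2, 0.5, 0.8]  # hypothetical conversion rates per arm
pulls_thompson = run_bandit(thompson, true_rates, rounds=3000)
pulls_eps_greedy = run_bandit(epsilon_greedy, true_rates, rounds=3000)
```

Both policies shift traffic toward the better arms as evidence accumulates. Epsilon-greedy keeps exploring at a fixed rate, while Thompson sampling explores less as its posteriors narrow.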
Coding/Engineering/Data structures
Check if a linked list has a loop in it.
Write code for any sorting algorithm you know. What are the best, average, and worst-case time complexities?
How do you identify if a binary tree is a subtree of another tree?
Given a binary tree, how do we print the full path from a leaf to the root node?
Using multiple stacks, how do you get the functionality of a queue?
You have n sorted arrays. Merge them into one sorted array.
You have an unsorted array. Given the index of an element, get the first element after it that’s larger than this element. Do it in constant time.
Given an array, find the max sum of a consecutive subarray that does not contain elements from a blacklisted array.
Given a linked list of strings, how do we identify if it is a palindrome?
Spark: What is the difference between map and flatMap?
Spark: How do partitions work?
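For the linked-list loop question in this section, a frequently cited approach is Floyd's two-pointer ("tortoise and hare") method. The sketch below assumes a simple singly linked `Node` class, which is a stand-in defined here for illustration.

```python
class Node:
    """Minimal singly linked list node (hypothetical, for illustration)."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def has_loop(head):
    """Return True if following .next pointers from head ever revisits a node."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next        # moves one step
        fast = fast.next.next   # moves two steps
        if slow is fast:        # the pointers can only meet inside a loop
            return True
    return False

# Build 1 -> 2 -> 3, check it, then point 3 back to 2 to create a loop.
a, b, c = Node(1), Node(2), Node(3)
a.next, b.next = b, c
no_loop = has_loop(a)     # False
c.next = b
with_loop = has_loop(a)   # True
```

This runs in linear time and uses constant extra memory, unlike the alternative of storing every visited node in a set.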