The Machine Learning Interview Checklist

11 minute read

Published: January 14, 2023

Cover your bases.

The Machine Learning Interview Checklist

Generally speaking, interviews for Ph.D. positions, internships in machine learning are generally structured into three parts:

mathematics,
programming, and
machine learning. This is sometimes accompanied by a presentation of a paper or your work. In the following, I collected the most frequent topics you will potentially encounter.

Mathematics

It is not all you need for a machine learning interview, but reading and comprehending each concept in the freely available Mathematics for Machine Learning book is what can lay the foundations on the math and fundamental machine learning side. Regarding mathematical concepts, you need to be fluent at least in:

Linear algebra
Probability
Analysis
Optimization

Linear algebra

Since neural networks consist of matrices and tensors, it is essential to be aware of the corresponding operations—knowledge of matrices is a must, tensors are a nice-to-have, but you need to have an intuition that tensor can be thought of “generalized matrices into higher dimensions”. Matrix knowledge includes how linear equation systems and matrices correspond, what are the computational aspects of matrix algebra (cost) and how matrices can express linear transformations—unfortunately, bonus points cannot be collected for knowing the eponymous film series.

Linear equations can be thought as (intersecting) hyperplanes in a vector space, so characterization of a vector space is essential. When we connect matrices—what we do since they can describe the equation system—to vector spaces, we also want to measure angles and distances, so being aware of norms, inner products is also a must.

When we talk about matrices, decompositions such as Singular Value Decomposition (SVD) and Principal Component Analysis (PCA) cannot be neglected. Here the key is to understand what these decompositions mean, and also what we can do with them (low-rank approximations and even unsupervised representation learning in simple cases).

This figure from the MML book can be very helpful to comprehend all things matrix philogeny: Matrix philogeny (source: Mathematics for Machine Learning book)

Checklist

Probability

Understanding Bayes’s theorem is the bread and butter for several machine learning algorithms. Besides helping with the Monty Hall Problem, it is a fundament of a broad range of methods aiming to infer unmeasured quantities. However, not everything is Bayesian estimation: Maximum Likelihood and Maximum A Posterior methods are also frequently used. Factorization, independence and the latter’s relation to (un)correlatedness are further essential concepts, since assumptions on distributions generally are about them. To juggle with distributions, we need to distinguish them by names such as marginal, conditional, or joint; to manipulate them (read: to use Bayes’s theorem), we need the Sum and teh Product rules. Be also aware of the special role of the Gaussian, and exponential families (to get conjugate priors for Bayes estimation, leading to closed-form solutions) can also become handy. Staying with the Gaussian, its prominence in the Central Limit Theorem is the basis for using the Gaussian assumption on distributions. When we want to transform probability densities, then the change of variables formula will be our tool. There are properties (sufficient statistics) that can describe distributions concisely, the mean and variance are such (they are sufficient to describe a Gaussian) and they have interesting properties. It is also worth knowing that they are instantiations of (central) moments, which can be thought as a general family of descriptors for probability distributions. A pinch of information theory, i.e., quantities related to the information content of a random variable, are used in several fields of machine learning. (Differential) Entropy is the most important concept, but cross entropy and mutual information are also essential.

Checklist

I would also recommend some online courses, particularly the Bayesian Statistics Specialization on Coursera - with flashcards for the first two courses available on my blog; here for the first, and here for the second course. Additionally, there are related aspects in the Probabilistic Graphical Methods Specialization on the same website, with flashcards by yours truly for PGM1, PGM2, and PGM3.

Analysis

Analysis is mostly a tool for optimization in the context of machine learning - if you happen to be a theorist, it can be much more, but let’s stick to the fundamentals. Knowing what the derivative is, what it means (also for vector valued functions) is essential to understand optimization methods and the Taylor approximation. So the Jacobian and the Hessian should be a trusted acquaintance. The former is also required for the change of variables formula mentioned above.

Checklist

Optimization

Since machine learning evolves around mostly gradient-based optimizers, Stochastic Gradient Descent (SGD) makes the top of the list. Taking a step back, you should be aware about the family of first-order methods, and why they are popular (computationally low cost). But beware of the caveats: local optima, setting the step size and co. So as a follow-up, ponder why second-order methods should (they are aware of the curvature) and should not (calculating the Hessian is costly) be used. When we need to incorporate prior knowledge/constraints, then a Lagrange-multiplier will come handy. It is also good to know why we love convex optimization (i.e., to wonder about the good old days when problems were convex, as ML problems will not be convex in most cases). It might also be useful to know about the Karush-Kuhn-Tucker (KKT) conditions, which collect necessary conditions for the solutions of constrained optimization problems, including problems with inequality constraints in nonlinear programs (a synonym for optimization problem). To transition towards ML-related topics, here you should know optimizers such as ADAM, Nesterov momentum, or RMSProp - the key is they use an averaging procederure (‘momentum’) to incorporate previous update(s). As a final twist, knowing conceptually how modern ML frameworks implement gradient calculation (automatic differentiation) also belongs to the good-to-know facts.

The hilariously-titled All the Math You Missed (But Need to Know for Graduate School) also looks promising, but as far as I can tell from skimming it, it is for the next level.

Machine learning

Fundamentals

The categories of machine learnin (supervised, unsupervised, and reinforcement learning are the three main categories, but self-supervised and semi-supervised learning also belongs) are a must.

For supervised methods, classification and regression are the categories you need to be aware of, including what loss functions (cross entropy vs mean squared error) are used. Additionally, the Support Vector Machine (SVM) with its hinge loss also often comes up. Flavors of regression (linear, polynomial) are also prevalent.

For unsupervised, PCA from linear algebra is a trusted friend, but it needs to be accompanied by k-means and Gaussian Mixture Models (GMMs), including how the latter two relate to each other (GMM is “soft” k-means).

Reinforcement learning: model-free and model-based, offline and online RL are useful categories to keep in mind. To my best knowledge, these will mostly come up during an interview if you want to work in the field of reinforcement learning.

Nitty-gritty details of what coulf (and will) go wrong during training and how to fight them also comprises the essential toolbox of a machine learning engineer/researcher, including data preparation, architecture design. This is to find a trade-off between underfitting and overfitting.

Checklist

Specifics

Of course, they are the field-specific knowledge like ResNets, Transformers, CNNs for computer vision. Here generally what is expected to provide a high-level, intuitive understanding of modern/state-of-the-art methods, but it is generally not required to comb through arXiv digest daily.

Programming

Know machine learning frameworks (PyTorch and TensorFlow are the big two, but JAX is on the rise as far as I can tell), and you can shine if you can compare them. If you use any of those, then you should prepare to state why you use it.

Additionally, there are Easter eggs where you can show your commitment—even better if you can showcase it on your GitHub profile. I am talking about the dreaded triad of

documentation,
unit testing, and
Continuous Integration As an extra, you can add in knowledge about container-based frameworks such as Singularity or Docker - most cluster infrastructures are based on those.

For Python, realpython.com is my go-to resource, you can learn about all these concepts there.

Checklist

Your two cents

Let it be research or programming, be honest and show your commitment and enthusiasm. If you have fields, you are interested in, say so. If you have a relevant project (plan), definitely talk about it.

Remember, think with the head of the interviewer.

Share on

Twitter Facebook LinkedIn

Rotating Features For Object Discovery

4 minute read

Published: December 12, 2023

Structure is a useful but underleveraged inductive bias for representation learning.

Higgins et al. - Towards a Definition of Disentangled Representations

8 minute read

Published: June 30, 2022

Disentanglement is a concept rooted in geometric deep learning.

Where is the nature of the relationship expressed in causal models?

1 minute read

Published: June 28, 2022

Graphs don’t tell about the nature of dependence, only about its (non-)existence.

AMMI 3 Notes: Geometric priors I

15 minute read

Published: June 20, 2022

In the previous post, we dived deep into abstract algebra to motivate why Geometric Deep Learning is an interesting topic. Now we begin the journey to show that it is also useful in practice. In summary, we know that symmetries constrain our hypothesis class, making learning simpler—indeed, they can make learning a tractable problem. How does this happen?

Patrik Reizinger

The Machine Learning Interview Checklist

The Machine Learning Interview Checklist

Mathematics

Linear algebra

Checklist

Probability

Checklist

Analysis

Checklist

Optimization

Machine learning

Fundamentals

Checklist

Specifics

Programming

Checklist

Your two cents

Share on

You May Also Enjoy

Rotating Features For Object Discovery

Higgins et al. - Towards a Definition of Disentangled Representations

Where is the nature of the relationship expressed in causal models?

AMMI 3 Notes: Geometric priors I