The Ultimate Guide to Acing Machine Learning Interviews for Data Scientists and Machine Learning Engineers

machine learning Feb 14, 2023

Preparing for machine learning interviews is hard. You can memorize 200 questions and answers and still encounter questions in the interview that you are not prepared for.

That’s why this article takes a different approach. Instead of going over dozens of individual questions, we’ll cover the 4 categories of questions you will encounter in a machine learning interview, which will give you a comprehensive understanding you can use to ace interviews!

Please note that mastering these 4 categories may not be enough. Generic coding questions (algorithms and data structures) and system design questions (designing a non-machine learning system) also appear in interviews.

Machine Learning Basics

These are conceptual questions designed to get a general idea of your basic machine learning knowledge. They can cover anything ranging from processing data to choosing models, handling details of training models, and evaluation.

You should expect these questions either at the beginning or end of an interview.

How to Answer

For these questions, you want to be concise and organized. You can use something like the following outline:

  1. Give a concise definition in 2 to 3 sentences.
  2. Give one or two examples to convince the interviewer that you have both theoretical knowledge and experience.
  3. If necessary, provide some common solutions to the problem.

How to Prepare

These three steps will help you prepare to answer machine learning basics questions:

  1. Brush Up on the Basics

    Thinking through and summarizing information is a great way to learn. Here are some resources to help:

    One way to test your understanding is to try explaining concepts to a non-technical person. If you can do this, you truly understand what you are talking about!

  2. Collect Questions

    There are several places you can find sample interview questions:

  3. Organize the Questions

    Once you have questions, you need to organize them.

    Don’t skip this step! We recommend organizing questions by workflow as this will help you to more clearly see the problems that arise at each step and know what questions are likely to be asked.

Machine Learning Coding

These questions occur during the onsite and ask you to implement a machine learning algorithm from scratch with any language you prefer.

Because there are so many machine learning algorithms, that may seem like a very daunting task, but don’t worry! Only a limited number of algorithms actually appear in interviews. The most common ones (according to this helpful blog post) are:

Supervised Learning:

  • Linear Regression
  • Logistic Regression
  • K-nearest Neighbors
  • Decision Tree

Unsupervised Learning:

  • K-means Clustering
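To give a sense of the scale of these tasks, here is a minimal sketch of K-means clustering in plain Python. It is one possible from-scratch version, not a reference implementation: the function names, the random initialization from the data points, the stopping rule (stop when assignments no longer change), and the default parameters are all assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_clusters(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]


def update_centroids(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for c, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == c]
        if not members:
            # Assumption: an empty cluster simply keeps its previous centroid.
            new_centroids.append(old)
            continue
        new_centroids.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(len(old)))
        )
    return new_centroids


def k_means(points, k, max_iters=100, seed=0):
    """Main function: initialize, then alternate assignment and update steps."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iters):
        new_labels = assign_clusters(points, centroids)
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        centroids = update_centroids(points, labels, centroids)
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = k_means(points, k=2)
print(labels)
```

On this toy data the two groups are far apart, so the first three points end up in one cluster and the last three in the other. Which cluster gets index 0 depends on the random initialization.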

How to Answer

We recommend using a similar process for these questions as you would for answering generic coding questions. Those steps include:

  1. Briefly explain how the algorithm works to the interviewer.
  2. When implementing your solution, move from the main function to the helper functions. The main function handles the input data and returns the results. The helper functions should handle small tasks such as initializing parameters or computing gradients.
  3. Explain your code step by step to the interviewer. It’s your choice either to explain while writing code or to finish most of the coding before summarizing your solution.
  4. Keep your implementation bug-free and readable.
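The main-function-plus-helpers layout from step 2 might look like the following sketch of linear regression trained with batch gradient descent on mean squared error. The function names, learning rate, and iteration count are illustrative assumptions, and an interviewer may expect a different variant (for example, the closed-form solution).

```python
def initialize_parameters(n_features):
    """Helper: start with zero weights and zero bias."""
    return [0.0] * n_features, 0.0


def predict(X, weights, bias):
    """Helper: compute w . x + b for every row of X."""
    return [sum(w * x for w, x in zip(weights, row)) + bias for row in X]


def compute_gradients(X, y, weights, bias):
    """Helper: gradients of mean squared error with respect to weights and bias."""
    n = len(X)
    errors = [pred - target for pred, target in zip(predict(X, weights, bias), y)]
    grad_w = [
        2 / n * sum(e * row[j] for e, row in zip(errors, X))
        for j in range(len(weights))
    ]
    grad_b = 2 / n * sum(errors)
    return grad_w, grad_b


def fit_linear_regression(X, y, learning_rate=0.05, n_iters=2000):
    """Main function: takes the input data and returns the fitted parameters."""
    weights, bias = initialize_parameters(len(X[0]))
    for _ in range(n_iters):
        grad_w, grad_b = compute_gradients(X, y, weights, bias)
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias


# Toy data generated from y = 2x + 1.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 5.0, 7.0]
weights, bias = fit_linear_regression(X, y)
print(weights, bias)
```

Writing the main function first lets you describe the overall training loop to the interviewer before filling in the details of each helper.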

How to Prepare

Five algorithms isn’t a lot, but memorizing the code is still unrealistic. Instead, you should focus on understanding and internalizing the algorithms.

To familiarize yourself with the algorithms, make sure you understand each step clearly. Andrew Ng’s machine learning class is great for this.

After that, practice is essential. Writing code in Python in a Jupyter notebook is highly recommended because it makes debugging and testing easy. Here are some useful practice steps:

  1. When implementing an algorithm for the first time, write everything as one function without worrying about best coding practices.
  2. Focus on having a working solution without using any third-party libraries.
  3. Work on breaking your code down into functions based on the algorithm steps.
  4. Work out the space and time complexity of your implementation in big O notation. This is often asked as a follow-up question in interviews.
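As an example of where these practice steps might lead, here is a small sketch of K-nearest neighbors classification written without third-party libraries, with the complexity noted in comments. The function names, the choice of Euclidean distance, majority voting, and the default k are assumptions for illustration.

```python
from collections import Counter


def euclidean_distance(a, b):
    """Distance between two points with d features: O(d) time."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors.

    With n training points of d features each:
      time  O(n * d + n log n) per query (distances, then a full sort)
      space O(n) for the list of distances
    """
    distances = [
        (euclidean_distance(row, query), label)
        for row, label in zip(train_X, train_y)
    ]
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, (0.5, 0.5)))
print(knn_predict(train_X, train_y, (5.5, 5.5)))
```

A natural follow-up to discuss is how to avoid the full sort: selecting the k smallest distances with a heap (for example, `heapq.nsmallest`) brings the selection step down to O(n log k).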

Applied Machine Learning Problems

These questions are the most difficult. They typically pose an open-ended problem and ask you to develop an applicable machine learning solution.

These questions are the highest weighted, and you should expect the interviewer to challenge you on your solution and dig into the details to assess your ability.

Questions can be generic, such as “How do you detect spam emails?”, or domain-specific, like “How would you design a recommendation system?”. The more experience you have, the more likely you are to receive domain-specific questions.

How to Answer

To start, you should clarify things like the goal, the available data, and any constraints.

Then walk through your overall ideas with the interviewer. Following a structure will help to keep things clear. Here’s one helpful structure you can use:

Data

  • Clean the data and deal with outliers

Feature Engineering

  • Brainstorm the features needed for the task
  • Engineer new features if necessary

Model Selection and Engineering

  • Select 1 to 2 models that are suitable for the problem
  • Discuss the pros and cons of the models

Training, Model Tuning, and Evaluation

  • Develop metrics for evaluation
  • Design training, validation, and evaluation strategies
  • Discuss methodologies that improve the performance
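To make the evaluation part of this structure concrete, here is a small sketch for a binary task such as spam detection: a shuffled train/validation/test split, plus precision, recall, and F1 computed from scratch. The function names, split fractions, and the toy labels are assumptions chosen for illustration; the right metrics and splitting strategy depend on the problem you clarified at the start.

```python
import random


def train_val_test_split(examples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the examples and cut them into train, validation, and test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive class (e.g. spam = 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


train, val, test = train_val_test_split(range(10))
print(len(train), len(val), len(test))

# Hypothetical labels: 1 = spam, 0 = not spam.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 0]
print(precision_recall_f1(y_true, y_pred))
```

For a problem like spam detection, where one class is usually much rarer than the other, precision and recall tend to say more than plain accuracy, which is a useful point to raise when discussing metrics.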

You will be asked follow-up questions, which can be overwhelming, but return to the structure each time and complete your design. Show that you can stay on track and lead the discussion.

How to Prepare

You will need to prepare differently for generic versus domain-specific questions.

For generic problems, Kaggle is an excellent resource.

Try working on the projects yourself and then compare your solutions to others, especially the Exploratory Data Analysis (EDA), data processing, feature selection, and model selection.

After you’ve done a few of the projects, you should have a good sense of solving that type of problem.

Domain-specific problems are tricky because they do require real work experience to answer effectively. However, if you don’t have that experience, the best way to prepare is to read research papers, which you can find by searching keywords in Google Scholar.

When reading, focus on the data format, feature engineering, model architectures, and results/findings.

Project-Based Machine Learning Questions

These questions can be either technical or non-technical depending on your interviewer.

The discussion will begin with the interviewer asking you about a past machine learning project. They will then either dive into the technical details (Ex: What is the size of the data? How did you select features?) or ask about things like business impact and leadership (Ex: Did you work with other teams? Did you lead any of the process?).

How to Answer

The most important thing to remember with these questions is to always interact with the interviewer. It needs to be a conversation.

Besides that, as usual, you want to keep it structured. Here are some steps for describing your project:

  1. Summarize your project in 1 to 2 sentences. Then outline the impact. It’s better to quantify it with numbers than to use subjective words.
  2. Highlight 2 to 3 challenges of the project.
  3. Share one interesting finding with the interviewer.
  4. If the interviewer is more interested in your leadership and influence, you can also talk about 1 to 2 non-technical contributions you made.

Be sure to check in with the interviewer after each part to see if they have questions or want you to move in a particular direction.

How to Prepare

You can prepare for these questions in 3 steps:

  1. Summarize Your Project

    Make sure that you can summarize the goal and impact of your project in concise and simple words. Don’t worry about including every detail, but focus on the challenges you faced and the quantitative results you achieved.

  2. Think through Technical Details

    Many times you will not need to dive into the technical details, but it pays to be prepared in case your interviewer is interested in them. Make sure you’re able to answer questions about the data processing, the models, modeling details, and model evaluation.

  3. Practice Out Loud

    Practice! Practice! Practice! It is the only way to ensure that you can easily and effectively describe your project in an engaging way.

If you found this guide helpful and want more, check out the longer version of this post.
