Assessing the Feasibility of a Machine Learning Model

Questions to answer before any code is written

Jared Rand
Towards Data Science

Before jumping right into your new modeling task, consider performing a feasibility study. A feasibility study of a predictive model will answer key questions that can help you and the business decide if the modeling task is likely to succeed.

Photo by krakenimages on Unsplash

Feasibility Studies for Machine Learning

A feasibility study is an assessment of the practicality of a proposed project or system.

- Wikipedia

Feasibility studies are common across industries and disciplines. They are an important project planning tool that can help you identify points of failure in a project before any money or time gets invested.

I would argue that feasibility studies are particularly useful for machine learning projects because ML projects are generally experimental in nature. They can fail for many reasons, some of which can be identified upfront with a feasibility study.

Questions to Ask During a Feasibility Study

I like to use the following template when doing a feasibility study for a predictive model.

  1. Training Data — Does training data need to be collected? If so, how much time and money will it cost?
  2. Predictive Features — According to domain experts, what factors are likely to predict the target variable? Is that data accessible to you?
  3. Data Sources — What data sources will you need to gain access to? If internal, do you have support from data engineers? If external, how much will vendor data cost?
  4. Production — What is the level of effort to develop, deploy, and maintain your model in production?

Training Data

Photo by Jonathan Borba on Unsplash

Let’s assume your modeling task is supervised (because if it’s not, the conclusion of your feasibility study will almost always be “unlikely to succeed” 😅).

If you don’t have training data, you’ll want to create a plan early on for how you’re going to acquire it and how much it will cost to do so.

You’ll need to make a budget request at some point, and you’ll be glad you have your feasibility study in hand. It’s the perfect document for building a business case and justifying budget requests.

Predictive Features

Photo by Bluewater Globe on Unsplash

It’s not a good idea to throw the kitchen sink at your model. But how can you narrow down possible predictive features upfront in a feasibility study?

Ideally, you will have access to subject matter experts who can suggest scenarios or risk factors that typically lead to the outcome you’re trying to predict.

You can also study a small sample of your data to deepen your understanding of what drives the target outcome. You might have to do some research online or (gasp) talk to customers in order to understand the context around your data points. I find the learnings from doing so are always worth the time invested.

Once you have a qualitative understanding of what drives the target outcome, you can more clearly articulate what data you need and why. This helps build a business case when the time comes to make data requests.

Data Sources

Photo by Christopher Burns on Unsplash

After the previous step, you should have a good grasp of what data you need to build a successful model.

In this step, you need to figure out where the data for each predictive feature will come from. I think of data access as a hierarchy, ordered from most to least preferable:

  1. Internal data that you already have access to is best.
  2. Internal data that requires work from DevOps or data engineering to make available to you is next best.
  3. Data that is not available internally may be available externally from clients, partners, or the government for free.
  4. Vendors or data aggregators may have the data you need. This kind of data is rarely cheap, tends to require long contract negotiations plus internal integration and ingestion time, and often suffers from data quality issues that stay hidden during the sales process.
  5. Web scraping may be worth considering, but it has many challenges. Keep in mind that scraping is against the terms of use of many large sites. So even if you build an elaborate siege tower to get past their walls of defense, you may not be legally allowed to use the data you pillage.
  6. If you’ve made it this far, assume the data is not available. Assess the impact this lack of data will have on the project overall. Is the project likely to succeed without this data?
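
To make the hierarchy concrete, here is a minimal Python sketch of how I might record it in a feasibility study. The tier names, the best_tier helper, and the example features are all hypothetical, made up purely for illustration. The idea is to list every candidate source for each feature, keep the most preferable tier, and flag anything that falls through to "not available."

```python
from enum import IntEnum


class SourceTier(IntEnum):
    """Data access tiers, lower is better (names are illustrative assumptions)."""
    INTERNAL_READY = 1
    INTERNAL_NEEDS_ENGINEERING = 2
    EXTERNAL_FREE = 3
    VENDOR = 4
    WEB_SCRAPING = 5
    UNAVAILABLE = 6


def best_tier(candidate_tiers):
    """Return the most preferable tier, or UNAVAILABLE if there are no candidates."""
    return min(candidate_tiers, default=SourceTier.UNAVAILABLE)


# Hypothetical features and the sources identified for each one.
feature_sources = {
    "customer_tenure": [SourceTier.INTERNAL_READY],
    "support_ticket_count": [SourceTier.INTERNAL_NEEDS_ENGINEERING, SourceTier.VENDOR],
    "household_income": [SourceTier.VENDOR],
    "competitor_pricing": [],  # no source found at any tier
}

plan = {feature: best_tier(tiers) for feature, tiers in feature_sources.items()}
missing = [feature for feature, tier in plan.items() if tier is SourceTier.UNAVAILABLE]

# Print the plan from most to least accessible.
for feature, tier in sorted(plan.items(), key=lambda item: item[1]):
    print(f"{tier.value}. {tier.name:<28} {feature}")

print("Assess project impact of missing features:", missing)
```

Anything that lands in the missing list is where the last question in the hierarchy applies: is the project still likely to succeed without it?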

Production

Photo by Chris Murray on Unsplash

Level of Effort

Estimating level of effort (LOE) can be hard for data science projects because they usually contain experimental or iterative phases. The overall project timeline can shrink or expand based on the outcome of experiments and explorations.

So I prefer to break down ML projects into fairly granular steps, each with its own LOE attached. This way, steps with little uncertainty (such as engineering a particular feature) can get a reliable LOE, while steps with large uncertainty (such as iterating on a model prototype after feedback from business users) can get an LOE range.
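
As a rough illustration, the breakdown can be kept as a simple list of steps, each with a low and high estimate. This is only a sketch: the Step class, the total_loe helper, and the step names and day counts are hypothetical. A step with little uncertainty gets the same low and high value, and the totals roll up into an overall range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    low_days: int
    high_days: int  # equal to low_days when the estimate is a single number


def total_loe(steps):
    """Sum the low and high estimates to get an overall LOE range in days."""
    low = sum(step.low_days for step in steps)
    high = sum(step.high_days for step in steps)
    return low, high


# Hypothetical project breakdown; the numbers are made up for illustration.
steps = [
    Step("Engineer a particular feature", 3, 3),
    Step("Build baseline model", 5, 8),
    Step("Iterate on prototype after business feedback", 5, 20),
]

low, high = total_loe(steps)
print(f"Total LOE: {low} to {high} days")

# The step with the widest range is where most of the timeline risk sits.
riskiest = max(steps, key=lambda step: step.high_days - step.low_days)
print(f"Most uncertain step: {riskiest.name}")
```

Summing the lows and highs gives a best-case and worst-case bound, not a forecast. It is still a useful way to show which steps drive the uncertainty.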

I find that this approach helps product managers feel more confident about integrating your model into the product roadmap.

Quick Wins

When breaking down the project into steps, I also like to build in some quick wins early on. A project plan that delivers incremental value is much easier for the business to swallow than one that only delivers value at the end.

Model Lifecycle

Be sure to consider the entire model lifecycle. Even if your team isn’t directly responsible for all aspects of deployment and maintenance, you need to be confident that the project won’t hit any roadblocks there. Remember, the purpose of the feasibility study is to assess the likelihood of success. If there’s no path to production, even if you are able to develop a model, you should probably kill the project and move on.

Headcount and Timeline

Your detailed LOE estimate is a great tool for making headcount requests. You can show, say, a timeline with 3 heads vs. one with 4 heads. And if you’re a team of one who needs to build a data science team from scratch, you could show a timeline for that, too.
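
Here is a back-of-the-envelope sketch of that comparison in Python. It rests on a big simplifying assumption that I want to call out: work divides evenly across people, with no onboarding or coordination overhead, which is rarely true in practice. The calendar_weeks helper and the 60 to 120 person-day range are hypothetical.

```python
import math


def calendar_weeks(person_days, heads, days_per_week=5):
    """Naive timeline: assumes work splits evenly across heads with no overhead."""
    if heads < 1:
        raise ValueError("heads must be at least 1")
    return math.ceil(person_days / (heads * days_per_week))


# Hypothetical overall LOE range in person-days, taken from a step-by-step estimate.
loe_low, loe_high = 60, 120

for heads in (3, 4):
    weeks_low = calendar_weeks(loe_low, heads)
    weeks_high = calendar_weeks(loe_high, heads)
    print(f"{heads} heads: {weeks_low} to {weeks_high} weeks")
```

Treat the output as a conversation starter for the headcount request, not a commitment. Real timelines rarely shrink in proportion to the number of people added.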

Conclusion

A feasibility study for your data science project is a great way to do the following:

  1. Catch points of failure early on before any code is written.
  2. Build a business case and make resource requests for data and headcount.
  3. Increase the chances of getting your project on the product roadmap.
