Rapid Prototyping in Data Analysis Projects

December 13, 2021

Data science projects, as the name implies, have a scientific component. Their main task is to build a model that predicts how a target variable depends on the available data. For example, a model might predict the price of a product, the likelihood of a defective part, or the likelihood of a purchase from inputs such as purchase history or the quality of raw materials.

Prototyping reduces risk and verifies that processes are ready for the use of a machine learning model. This holds true for successful projects, where we obtain a high-quality predictive or recommendation model, as well as for unsuccessful ones, where it is hard to find dependencies between the input data and the target variable. It is especially valuable for catching issues early, such as data unavailability, that would otherwise surface only in later stages of the project.

Let's consider an example from one of our projects. After successfully building a predictive model at a plant and moving to the pilot operation stage, we found that the data used for prediction was not available at the moment it was needed. The cause was the architecture of the storage and the data extraction processes, and modifying them would have significantly increased the project's cost. Had we detected the problem earlier, we could have addressed it at lower cost while the system was being built.
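A check of this kind can be sketched in a few lines at the prototype stage. The example below is only an illustration: the feature names, the delays, and the function unavailable_features are hypothetical and do not come from the project described above. It assumes that for each input we know roughly how long after a production event the value lands in storage.

```python
from datetime import datetime, timedelta

# Hypothetical delays: how long after a production event each input
# becomes available in storage. Real values would come from the data owners.
FEATURE_DELAYS = {
    "furnace_temperature": timedelta(minutes=1),
    "raw_material_grade": timedelta(minutes=5),
    "lab_analysis_result": timedelta(hours=6),
}


def unavailable_features(event_time, prediction_time, delays):
    """Return the features that are not yet in storage at prediction time."""
    return sorted(
        name
        for name, delay in delays.items()
        if event_time + delay > prediction_time
    )


event_time = datetime(2021, 12, 13, 8, 0)
# Assumption: the prediction must be made 30 minutes after the event.
prediction_time = event_time + timedelta(minutes=30)

missing = unavailable_features(event_time, prediction_time, FEATURE_DELAYS)
print("Not available at prediction time:", missing)
```

Any feature reported here either has to be dropped from the prototype or needs a change in the extraction process, and that is much cheaper to learn before the pilot stage.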

The complete list of goals for prototyping therefore includes:
  • checking the feasibility of the task;
  • checking data readiness;
  • compiling a task list for a large project;
  • checking the readiness of business processes;
  • assessing the impact of model implementation;
  • understanding when the model is applicable and when it is not.

When creating a prototype, it is important to pay attention to several key aspects:

Firstly, it is necessary not only to create a functional prototype of the model but also to explain why it works in a certain way. This includes determining the importance of features for the model as a whole and for each specific case.
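As a minimal sketch of the two levels of explanation, consider a linear model, where the contribution of each feature can be read off directly. The weights, feature names, and data rows below are hypothetical; for non-linear models a dedicated attribution method would be needed, which is outside the scope of this sketch.

```python
# Hypothetical linear model: prediction = INTERCEPT + sum(weight * value).
WEIGHTS = {"temperature": 0.5, "pressure": -2.0, "humidity": 0.1}
INTERCEPT = 10.0

# Hypothetical observations used only for illustration.
ROWS = [
    {"temperature": 100.0, "pressure": 3.0, "humidity": 40.0},
    {"temperature": 120.0, "pressure": 5.0, "humidity": 40.0},
    {"temperature": 110.0, "pressure": 4.0, "humidity": 70.0},
]


def feature_means(rows):
    """Average value of each feature over the data set."""
    return {name: sum(r[name] for r in rows) / len(rows) for name in WEIGHTS}


def case_contributions(row, means):
    """Per-case view: how each feature shifts this prediction from the average case."""
    return {name: WEIGHTS[name] * (row[name] - means[name]) for name in WEIGHTS}


def global_importance(rows):
    """Model-wide view: mean absolute contribution of each feature."""
    means = feature_means(rows)
    totals = {name: 0.0 for name in WEIGHTS}
    for row in rows:
        for name, value in case_contributions(row, means).items():
            totals[name] += abs(value)
    return {name: total / len(rows) for name, total in totals.items()}


means = feature_means(ROWS)
print("Importance for the model as a whole:", global_importance(ROWS))
print("Contributions for the first case:", case_contributions(ROWS[0], means))
```

The global figures show which features matter for the model overall, while the per-case figures explain a single prediction to the business user who has to act on it.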

Secondly, there should always be at least a minimal but functional visualization. This allows for an interactive and business-friendly understanding of the model's operation.

The third important aspect is related to the limitations of the prototype. It is necessary to determine which specific 'cases' the model will cover so that it can be easily assessed and compared with real data. For example, the prototype might be limited to predictions within a restricted temperature range or to sales of a specific type of service.
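One simple way to make such a limitation explicit is to encode it next to the model, so that out-of-scope inputs are refused rather than silently predicted. The sketch below assumes a hypothetical validated temperature range and uses a placeholder toy_model in place of the real prototype.

```python
# Hypothetical scope: the prototype is only validated for this temperature range.
VALID_TEMPERATURE_RANGE = (150.0, 250.0)


def is_in_scope(temperature, valid_range=VALID_TEMPERATURE_RANGE):
    """True when the input lies inside the range the prototype was validated on."""
    low, high = valid_range
    return low <= temperature <= high


def predict_in_scope(temperature, model):
    """Return a prediction, or None when the input is outside the validated scope."""
    if not is_in_scope(temperature):
        return None
    return model(temperature)


def toy_model(temperature):
    # Placeholder standing in for the real prototype model.
    return 0.02 * temperature


for value in (200.0, 300.0):
    print(value, "->", predict_in_scope(value, toy_model))
```

Returning None for out-of-scope inputs keeps the comparison with real data honest: the prototype is assessed only on the cases it claims to cover.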

At the prototype stage, it is more important to have a model that works clearly for specific items than a model with low accuracy across the entire assortment.

Conclusion

Rapid prototyping allows for the assessment of the potential impact of a solution and identifies potential challenges in advance, helping to improve the system and achieve successful results.

Building a prototype of the model is not difficult if the business task it addresses is clearly formulated and only 'good' data that can be explained, analyzed, and understood are considered. It is also important to determine the system's scope of application, conditions, and indicators that will be taken into account.

Based on Softline Digital's experience, rapid prototyping can take from a few days to a week. The model is built using the necessary data. Once the prototype is complete, a simple visualization is added so that the customer can see the data, the forecasts, and how the graphs are built.

Usually, real data is used to test the system's performance, which makes it possible to evaluate the model's accuracy, the forecasts it makes, and the decisions based on them.
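A minimal sketch of such an evaluation compares the model's forecasts with a naive baseline on the same actual values. The numbers below are made up purely for illustration and are not results from any project; mean absolute error is used here as one possible accuracy measure.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Illustrative values only, not real measurements.
actual = [100.0, 110.0, 105.0, 120.0]
model_forecast = [98.0, 112.0, 104.0, 117.0]

# Naive baseline (assumption): always predict a fixed historical average.
HISTORICAL_AVERAGE = 105.0
baseline_forecast = [HISTORICAL_AVERAGE] * len(actual)

model_mae = mean_absolute_error(actual, model_forecast)
baseline_mae = mean_absolute_error(actual, baseline_forecast)

print("Model MAE:", model_mae)
print("Baseline MAE:", baseline_mae)
print("Model beats baseline:", model_mae < baseline_mae)
```

Comparing against a baseline matters because an error figure on its own says little about whether the model adds value over what the business already does.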

If the system does not work perfectly right away, rapid prototyping provides an opportunity to understand and make the necessary improvements. A prototype representing a working model demonstrates all the capabilities of the system and allows conclusions to be drawn about its effectiveness.

The prototype helps to understand what is genuinely important and which data influences the correctness of calculations and forecasts, to check the adequacy of the model, and to see how the system can be improved. Thus, rapid prototyping allows for a more conscious approach to implementing machine learning and improves the chances of a successful result.

Nikolay Knyazev
Machine Learning Architect at Softline Digital