Predictive modeling is the process of creating a mathematical model that forecasts or predicts the likelihood of future events or values based on available data and the known relationships within it. It is used in many fields, including business, finance, marketing, economics, and science.
Predictive modeling requires data processing and analysis, selection and configuration of appropriate machine learning algorithms and methods, training the model on historical data, and evaluating its performance on new data. The goal is to develop a model that provides accurate and reliable predictions based on the available data.
Predictive models can be applied to various tasks such as forecasting sales, product demand, financial performance, market prices, customer behavior, and others. They can be used for decision-making, planning, business process optimization, and strategic advice.
The predictive modeling process has evolved significantly over time, primarily due to the digital revolution. One of the key enabling factors of the digital revolution is the ability to process and store large amounts of data. With the advent of powerful computing systems and data storage technologies, it has become possible to work with massive datasets that were previously inaccessible or challenging to process.
In the past, due to technological limitations and the limited availability of data processing, each step in the modeling process required a different program or tool, each performing specific functions. Here are some of them, each illustrated by a short code sketch after the list:
- Data preparation programs: At the beginning of the modeling process, data pre-processing was required, including removing outliers, filling in missing values, and scaling and transforming variables. For these purposes, various programs were used and are still used today, such as Microsoft Excel, Python with the Pandas and NumPy libraries, or specialized data preprocessing tools.
- Programs for feature selection and feature engineering: Creating new features or selecting the most significant ones from the original data also required specialized programs. These could be machine learning tools such as scikit-learn or TensorFlow, or dedicated feature engineering tools such as Featuretools or tsfresh, although more often features were created manually from formulas or code. Feature generation remains a key aspect of machine learning and attracts particular attention from data analysts: numerous scientific publications are dedicated to it, and developing software solutions for feature generation is an active subject of research because it strongly affects the quality and effectiveness of the resulting models.
- Programs for working with variables: Statistical analysis packages such as SPSS and SAS were used to transform variable values, reduce variance, and measure correlations between variables. These programs provided transformations such as standardization, normalization, and binning, which helped reduce variance, smooth out outliers, and prepare the data for analysis. They also offered several measures of inter-variable correlation, including the Pearson and Spearman coefficients.
- Model building programs: Various machine learning algorithms, such as logistic regression, random forest, and gradient boosting, were used to create predictive models. Users typically relied on specialized programs like SPSS Statistics, Loginom, or Mathcad, or on libraries and languages such as scikit-learn, XGBoost, TensorFlow, or R to train and tune models. The files produced in the previous stages were fed into these programs.
- Programs for evaluating and comparing models: After training the models, it was necessary to evaluate their quality and compare the results using metrics such as mean squared error, the coefficient of determination (R-squared), accuracy, and recall. Users typically programmed the calculation of these metrics themselves or used specialized programs to analyze models and compare results.
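For illustration, here is a minimal sketch of the data-preparation step using Pandas and NumPy; the column names, values, and thresholds are hypothetical and chosen only to show the typical operations (missing-value imputation, outlier clipping, scaling):

```python
import numpy as np
import pandas as pd

# Hypothetical raw dataset with a gap and an extreme value
df = pd.DataFrame({
    "income": [42_000, 38_500, np.nan, 1_250_000, 51_300],
    "age": [34, 29, 41, np.nan, 52],
})

# Fill missing values with the column medians
df = df.fillna(df.median(numeric_only=True))

# Clip extreme outliers to the 1st/99th percentiles of the column
low, high = df["income"].quantile([0.01, 0.99])
df["income"] = df["income"].clip(low, high)

# Min-max scale all features to the [0, 1] range
df_scaled = (df - df.min()) / (df.max() - df.min())
print(df_scaled)
```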
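A sketch of the feature engineering and feature selection step, combining hand-written formula features with scikit-learn's univariate selection; the synthetic dataset and feature names are only an example:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data standing in for an already prepared dataset
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X = pd.DataFrame(X, columns=[f"f{i}" for i in range(8)])

# Manually engineered features from formulas, as was common practice
X["f0_x_f1"] = X["f0"] * X["f1"]   # interaction term
X["f2_sq"] = X["f2"] ** 2          # polynomial term

# Keep the 5 most informative features by ANOVA F-score
selector = SelectKBest(score_func=f_classif, k=5)
X_selected = selector.fit_transform(X, y)
print(X.columns[selector.get_support()].tolist())
```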
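A sketch of the variable transformation and correlation measurement that statistical packages used to provide, reproduced here with Pandas and SciPy; the two columns are hypothetical:

```python
import pandas as pd
from scipy import stats

df = pd.DataFrame({
    "income": [42_000, 38_500, 47_200, 61_000, 51_300, 39_900],
    "spend":  [1_200,  900,    1_500,  2_100,  1_700,  1_000],
})

# Standardization (z-scores) and equal-width binning
df["income_z"] = (df["income"] - df["income"].mean()) / df["income"].std()
df["income_bin"] = pd.cut(df["income"], bins=3, labels=["low", "mid", "high"])

# Pearson and Spearman correlation between two variables
pearson_r, _ = stats.pearsonr(df["income"], df["spend"])
spearman_r, _ = stats.spearmanr(df["income"], df["spend"])
print(round(pearson_r, 3), round(spearman_r, 3))
```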
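A sketch of the model-building step with scikit-learn, training the three algorithm families mentioned above on an illustrative dataset with illustrative hyperparameters:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1_000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)               # train on historical data
    print(name, model.score(X_test, y_test))  # accuracy on held-out data
```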
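Finally, a sketch of the evaluation step, computing the metrics mentioned above with scikit-learn instead of programming them by hand; the true and predicted values are illustrative:

```python
from sklearn.metrics import (
    accuracy_score, mean_squared_error, r2_score, recall_score
)

# Hypothetical regression predictions
y_true_reg = [3.0, 2.5, 4.1, 5.0]
y_pred_reg = [2.8, 2.7, 3.9, 5.3]
print("MSE:", mean_squared_error(y_true_reg, y_pred_reg))
print("R^2:", r2_score(y_true_reg, y_pred_reg))

# Hypothetical classification predictions
y_true_cls = [1, 0, 1, 1, 0, 1]
y_pred_cls = [1, 0, 0, 1, 0, 1]
print("Accuracy:", accuracy_score(y_true_cls, y_pred_cls))
print("Recall:", recall_score(y_true_cls, y_pred_cls))
```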
Thus, the user had to work with several programs and transfer data between them. The modeling process was therefore fragmented, complex, and time-consuming: it required extra effort to set up environments and exchange data between programs, as well as specialized programming knowledge and skills in several software environments. In addition, these processes demanded significant computational resources, often consuming all available capacity and leaving no room for other tasks to run in parallel.

However, with the development of automated machine learning (AutoML), significant changes have taken place. Modern AutoML platforms and tools integrate the various steps of the modeling process into a single system. They offer user-friendly interfaces and intuitive workflows that allow users to complete all required modeling steps without a separate program for each one.
For example, vendors of desktop statistical packages have started adding AutoML modules to their analytics systems (Alteryx, KNIME, etc.); major cloud providers offer their computing power and hosting for ML development (Amazon, Yandex, Google, etc.); and dedicated cloud AutoML solutions have emerged (Vertex AI, ANTAVIRA, etc.). Each vendor has its own vision and its own definition of AutoML.
Today, users can carry out predictive modeling in the cloud with a single AutoML platform or tool chosen according to their vision of predictive modeling and their budget. This reduces the time and effort previously spent setting up and transferring data between different programs and removes the need to rely on their own computational resources. With all modeling steps integrated in one tool, users can work more efficiently, experiment with different settings, and obtain predictive modeling results in a more convenient and integrated way.
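The "all steps in one tool" idea can be illustrated with an open-source AutoML library such as TPOT; this is only a sketch of the general concept, not the workflow of any of the commercial platforms named above, and the dataset and search settings are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

X, y = make_classification(n_samples=1_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# A single object searches over preprocessing, feature selection,
# model choice, and hyperparameters, replacing several separate tools.
automl = TPOTClassifier(generations=5, population_size=20,
                        random_state=0, verbosity=2)
automl.fit(X_train, y_train)
print(automl.score(X_test, y_test))
automl.export("best_pipeline.py")  # saves the winning pipeline as scikit-learn code
```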

One such development is the ANTAVIRA platform, created by a team of data analysts and programmers who had previously done predictive modeling with all of the tools listed above. The platform implements its own approach to automated machine learning which, according to the developers, provides the functionality needed to automate routine tasks, reduce manual work, and improve the quality of model building.

AutoML in the ANTAVIRA platform is achieved by combining all the stages of modeling, for any number of targets, into a single chain configured through a “single window”. The technology also makes it possible to run an unlimited number of computations simultaneously, with the same or different settings, for the desired number of targets and without sample size restrictions. In addition, the developers have paid attention to the economic side, helping users reduce their own costs.