Predictive modeling is the practice of building mathematical models that estimate the likelihood of future events or outcomes from data about past events. Predictive modeling has a long history and has evolved over many centuries; no single date or period can be pinpointed as its birth, since its fundamental principles and algorithms developed gradually over time. Even in ancient times, people tried to anticipate future events using a variety of intuitive methods.
The earliest known examples of predictive modeling come from ancient civilizations such as Egypt and Babylon, where astronomers used observations of the stars and planets to forecast the weather and determine the seasons. They noticed that successive observations of the movements of celestial bodies made it possible to predict future changes and to construct calendars. Various cultures also had methods for predicting the future based on dreams, omens, or prophecies.
The rise of science and statistics in the 18th and 19th centuries had a significant impact on the development of predictive modeling. During this period, key methods and principles were developed that formed the basis for modern predictive models.
One important milestone of this period was the emergence of statistical analysis. Scientists began to collect and systematically analyze data in order to study various phenomena and processes, which led to the development of methods for statistical data processing and statistical inference.
At the turn of the 18th and 19th centuries, the German mathematician Carl Friedrich Gauss made a significant contribution to the development of statistics by introducing methods for data analysis and interpretation. His work included statistical techniques such as the method of least squares, which was used to approximate data and make predictions.
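To make the idea concrete, here is a minimal sketch of least-squares line fitting via the normal equations; NumPy is assumed as the computing environment, and the numbers are invented purely for illustration:

```python
import numpy as np

# Fit a straight line y ≈ a*x + b by ordinary least squares.
# The data points are invented for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Design matrix with a column of ones for the intercept.
X = np.column_stack([x, np.ones_like(x)])

# Normal equations: (X^T X) beta = X^T y
a, b = np.linalg.solve(X.T @ X, X.T @ y)
print(f"slope={a:.3f}, intercept={b:.3f}")

# Using the fitted line to predict a value at a new point.
print("predicted y at x=6:", a * 6 + b)
```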
In the 19th century, statistics became even more advanced and widely accepted. A key contribution to the development of predictive modeling was made by the English scientist and statistician Francis Galton. He studied the relationships between variables such as height and heredity and introduced the concept of correlation to measure the degree of association between two variables. Analyzing the relationship between the heights of fathers and the heights of their sons, Galton proposed the method of regression, which allowed statisticians to analyze the relationship between dependent and independent variables and to use that relationship to predict future values. This became the basis of regression analysis, one of the key tools in predictive modeling.
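Galton's finding, that sons of unusually tall or short fathers tend to be closer to the average, can be reproduced on simulated data; the heights and the true slope of 0.5 below are invented assumptions, not Galton's actual measurements:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate fathers' heights (cm) and sons' heights that are
# correlated with, but not fully determined by, their fathers'.
fathers = rng.normal(175, 7, size=1000)
sons = 175 + 0.5 * (fathers - 175) + rng.normal(0, 6, size=1000)

# A least-squares slope below 1 is exactly the "regression toward
# the mean" that Galton observed in his height data.
slope, intercept = np.polyfit(fathers, sons, deg=1)
print(f"regression slope: {slope:.2f}")  # close to 0.5, well below 1
```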
Classification and correlation are important statistical techniques that also developed in the 19th century.
An important contribution to the development of classification was made by the Belgian mathematician and statistician Adolphe Quetelet in the middle of the 19th century. He applied statistical methods to the study of social phenomena and introduced the concept of the “average man” (l’homme moyen), although he did not develop a formal classification system. Other scientists and statisticians subsequently contributed to the development of classification.
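Quetelet's “average man” loosely anticipates one of the simplest classification ideas still in use: summarize each group by its mean and assign a new observation to the nearest group mean. The sketch below is an illustrative anachronism with invented measurements, not a method Quetelet himself formalized:

```python
import numpy as np

# Nearest-mean classification: each class is summarized by its
# average observation, and a new point gets the label of the
# closest class mean. The data are invented for illustration.
class_a = np.array([[1.0, 1.2], [0.8, 1.0], [1.1, 0.9]])
class_b = np.array([[3.0, 3.1], [2.8, 3.3], [3.2, 2.9]])
means = {"A": class_a.mean(axis=0), "B": class_b.mean(axis=0)}

def classify(point):
    # Pick the label whose class mean is nearest in Euclidean distance.
    return min(means, key=lambda label: np.linalg.norm(point - means[label]))

print(classify(np.array([1.0, 1.1])))  # -> A
print(classify(np.array([2.9, 3.0])))  # -> B
```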
When it comes to correlation, it is worth recalling Karl Pearson, an outstanding mathematician and Galton’s biographer, with whom Galton worked closely for many years. Pearson developed the mathematical apparatus for computing correlations, which led to the Pearson correlation coefficient that is widely used today.
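The coefficient can be computed directly from its definition, r = cov(x, y) / (std(x) · std(y)); a small sketch with invented data:

```python
import numpy as np

# Pearson's r from its definition, then cross-checked against
# NumPy's built-in implementation. Data invented for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.9])

r = np.sum((x - x.mean()) * (y - y.mean())) / np.sqrt(
    np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2)
)
print(f"r = {r:.4f}")
print(np.corrcoef(x, y)[0, 1])  # should match
```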
Furthermore, at the end of the 19th and the beginning of the 20th century, the mathematical apparatus of probability theory, an important tool for modeling random phenomena and forecasting, was actively developed. Major contributions are associated with Russian mathematicians such as Pafnuty Chebyshev, Alexander Lyapunov, and Andrey Markov, who solved a number of general problems in probability theory. Chebyshev proved the law of large numbers in a very general form and formulated a central limit theorem for sums of independent random variables. Lyapunov introduced the method of characteristic functions into the study of limit theorems. Markov significantly advanced the work of his predecessors by developing the mathematical theory of Markov chains, which became a fundamental tool for modeling random processes and is widely used in many fields, including statistics and artificial intelligence.
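A Markov chain is fully specified by a matrix of transition probabilities between states. The sketch below uses an invented two-state “weather” chain to show how repeated application of the transition matrix converges to a stationary distribution, which can serve as a long-run forecast:

```python
import numpy as np

# Transition matrix P for a two-state chain (sunny, rainy), where
# P[i, j] is the probability of moving from state i to state j.
# The probabilities are invented for illustration.
P = np.array([[0.9, 0.1],   # sunny -> sunny, rainy
              [0.5, 0.5]])  # rainy -> sunny, rainy

# Start from a sunny day and propagate the state distribution.
dist = np.array([1.0, 0.0])
for _ in range(50):
    dist = dist @ P

# The chain forgets its starting state and settles into its
# stationary distribution.
print(dist)  # approximately [0.833, 0.167]
```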
Despite the active work of the scientists, mathematicians, and statisticians who laid the foundation for predictive modeling in the 18th and 19th centuries, the technical side of building predictive models remained at a relatively low level. Calculations had to be carried out by hand, using mathematical tables and formulas, and they could be quite complex and laborious.
Mathematical tables, such as tables of logarithms and of trigonometric functions, were widely available and used to perform various mathematical operations. Mathematicians looked up function values, carried out arithmetic operations, and solved equations with the help of these tables.
In addition to tables, various manual methods were used to perform the necessary calculations. Scientists worked with pencil, pen, and paper, and to plot graphs or represent data visually they used rulers, compasses, and other drawing and measuring instruments.
Calculating machines, such as various types of arithmometers, were also developed and used to ease mathematical work. These mechanical devices automated some arithmetic operations and reduced the time spent on manual computation.
Thus, in the 18th and 19th centuries, predictive modeling relied largely on manual computation, mathematical tables, and calculating devices, which made the process laborious and painstaking.
In the second half of the 20th century, however, the digital revolution began: information technologies developed rapidly and penetrated all areas of activity, leading to revolutionary changes in how data are processed and used. This period is characterized by a sharp increase in the power of computing devices, improvements in their performance, and their growing accessibility to a wide audience.
As a result of the digital revolution, computing devices became accessible to a broad range of users. Previously, they had been prohibitively expensive and were used mainly for scientific and military purposes, but with advances in technology and the mass production of computers and personal devices, their cost fell and far more people could afford them. This contributed to the spread of computers in many fields, including predictive modeling.
Accordingly, various programs and tools began to emerge, each performing a specific function such as data preprocessing, feature selection, or model selection. Using them required researchers and analysts to have extensive knowledge and skills across a variety of software environments.
Technological progress does not stand still, however, and with the development of Automated Machine Learning (AutoML), integrated platforms and tools have emerged that consolidate all modeling stages into a unified workflow. This simplifies and automates the modeling process, minimizing the need for specialized programs and knowledge. Researchers can now focus on the core aspects of modeling, while AutoML platforms provide automated data processing, selection of the best models, and process optimization.
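As a deliberately simplified illustration of what such platforms automate internally, the sketch below tries a few candidate models, scores each with cross-validation, and keeps the best. scikit-learn is assumed as the environment, and the dataset and candidate models are arbitrary choices, not part of any particular AutoML product:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# A toy version of the model-selection loop an AutoML system runs:
# evaluate each candidate pipeline by cross-validation, keep the best.
X, y = load_breast_cancer(return_X_y=True)

candidates = {
    "logistic_regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(scores)
print("best model:", best)
```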
Thus, predictive modeling has evolved over a long period of time, and its history is closely connected with the development of science, statistics, computer technology, and machine learning methods.