The progress made over the last 20 years in the field of statistics has made it possible to develop far more efficient predictive algorithms, especially in terms of accuracy. What are the possible applications in the field of cost estimation and modeling? While traditional analytical models based on the manufacturing processes of a product or service are still widely used in our Cartesian culture, statistical models are gradually gaining ground thanks to their formidable efficiency. But rather than opposing each other, the two approaches enrich and complement one another.
Traditional costing models
As a reminder, there are currently three main methods used to estimate the cost of a product:
The analogical method
This method estimates the cost of a new product by analogy with similar products produced or purchased in the past. It is unreliable, but can be used in the very early phases (opportunity study), when the characteristics of the product or service are not yet known. We will not detail this basic type of estimate in this article.
The analytical method
It estimates the cost of a product by modeling the industrial production process. The method is based on the cost structure of the product, estimating each intermediate element from the materials and components involved, the process costs (machine and labor), and the related overhead costs. This method has several advantages:
 It makes it possible to estimate an optimized, theoretical production cost by modeling a virtual factory based on best-in-class ratios (labor cost, OEE, scrap rates, etc.).
 It makes it possible to set an ambitious cost target and to identify the "Best Landed Cost" for a given product.
 It also makes it possible to pinpoint concretely the sources of supplier underperformance (which process step, which cost item, which indicator, etc.) and to engage suppliers in a continuous improvement process to capture productivity gains.
 This method is therefore particularly useful in the downstream phases of the life cycle (production, continuous improvement, product redesign, etc.).
However, the analytical method has some disadvantages or constraints to its implementation:
 It requires a good understanding of the manufacturing processes involved, as well as of key parameters (OEE, scrap rates, cycle times, etc.). Such information is not always easy to collect and consolidate with suppliers.
 The determination of the "Best Landed Cost" requires feeding these tools with benchmark data on production parameters, and keeping these benchmarks up to date.
 While standard processes can be modeled fairly quickly (injection, extrusion, casting, cutting, stamping, surface treatment, etc.), costing a complex product is often tedious. It requires specialized expertise that only a few people in the company master.
 As a result, costing teams quickly become bottlenecks, with processing delays incompatible with agile development and time-to-market constraints.
 Finally, while these models are genuinely relevant for setting cost targets, they often lack accuracy, because they do not take into account contingencies or certain external factors (balance of power, market effects, etc.), especially since many suppliers have a very low level of maturity in controlling their industrial cost price.
Existing software solutions on the market address some of these problems, in particular by offering integrated benchmarks for several manufacturing processes, with reference data per country. Some vendors have also developed CAD-file-reading interfaces that automate the proposal of manufacturing processes (virtual factory). However, this kind of software remains heavy and time-consuming to set up, and is used only by a few experts.
The parametric method
This method estimates the cost of a product or service by statistical modeling. It uses histories of similar products or services to define equations or statistical laws that model how cost evolves as a function of certain parameters known as "cost drivers". These models are mostly based on linear, multilinear, polynomial, or logarithmic regressions. They have several advantages:
 They make it possible to estimate the cost of a new product or service from simple characteristics known to the company (weight, size, volumes, country of production, key elements of the specification, etc.), without necessarily knowing the details of the manufacturing process or external benchmarks. It is therefore a very quick and simple method to implement.
 Moreover, because it is based on the observation of products or services actually manufactured or purchased in the past, the estimated cost is potentially more consistent and accurate than a "theoretical" analytical model, provided a sufficient history of good quality exists.
 These statistical methods are particularly useful in the early phases of the life cycle (opportunity, feasibility, detailed design, etc.), because they make it possible to take the right decisions quickly for an optimized design, and thus to secure margins while accelerating time-to-market.
 Further downstream, they also make it possible to quickly analyze the consistency, or inconsistencies, of current prices through dispersion analyses against the predictive model. They thus reveal aberrant products or services, for example those whose cost is abnormally high relative to the model's prediction. This yields optimization leads for buyers (renegotiation, change of supplier) or for R&D (redesign).
On the other hand, these methods have several limitations:
 Traditional statistical models (based on regressions) struggle to take qualitative parameters into account (short of splitting the database into smaller subsets).
 They do not handle missing data properly, and therefore require very clean databases.
 They handle "breaks" or threshold effects poorly. For example, the price can behave linearly over a certain range, then radically differently beyond a certain threshold (size, weight, volume, etc.), because the manufacturing process changes.
 All these elements directly affect the accuracy of traditional parametric models and therefore limit their use.
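The threshold effect described above can be illustrated with a minimal sketch. The data below is purely hypothetical (a fictitious part whose process switches from injection moulding to casting at 2 kg); it simply shows how a single regression line, the traditional parametric approach, averages the two regimes and fits neither well:

```python
import numpy as np

# Hypothetical cost history (illustrative data, not from the article):
# below 2 kg the part is injection-moulded, above 2 kg it is cast,
# so the cost/weight relationship "breaks" at that threshold.
weight = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
cost   = np.array([2.1, 3.0, 3.9, 5.0, 9.8, 11.2, 12.4, 13.9])

# One linear regression over the whole range straddles the break.
slope, intercept = np.polyfit(weight, cost, 1)
pred = slope * weight + intercept
rmse_global = np.sqrt(np.mean((cost - pred) ** 2))

# Fitting one line per regime is far more accurate -- but it
# presupposes that the analyst already knows where the break is.
sq_err = 0.0
for mask in (weight <= 2.0, weight > 2.0):
    s, i = np.polyfit(weight[mask], cost[mask], 1)
    sq_err += np.sum((cost[mask] - (s * weight[mask] + i)) ** 2)
rmse_split = np.sqrt(sq_err / len(weight))

print(f"global fit RMSE:     {rmse_global:.2f}")
print(f"per-regime fit RMSE: {rmse_split:.2f}")
```

The per-regime fit is much tighter, but only because the threshold was supplied by hand; detecting such breaks automatically is precisely what the tree-based methods below do.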
Artificial Intelligence paves the way for a fourth model of cost modeling
The advances made in algorithms and machine learning in recent years largely overcome the disadvantages of traditional parametric methods, and extend both their performance and their field of application.
Among these recent statistical methods, the random forest algorithm, formalized by Leo Breiman and Adele Cutler (Breiman, L., "Random Forests", Machine Learning, 45(1), 5-32, 2001), is a non-parametric approach that learns an ensemble of decision trees, each trained on slightly different subsets of the data generated by bootstrap techniques.
1/ What are the advantages?
The main advantages of this artificial intelligence algorithm are:
 Ability to model a very large number of parameters ("cost drivers"), in particular qualitative or "symbolic" parameters
 Ability to process databases where the number of variables largely exceeds the number of observations
 Ability to automatically identify and weight the most important parameters, i.e. the cost drivers with the greatest impact on the product's cost
 Ability to manage missing values / incomplete databases
 Robustness to outliers
 Ability to identify behavioral breaks in variables
 Interpretability of the trees (explanations based on similar products)
 Accuracy improved by 30 to 40% compared with traditional methods
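A minimal sketch of such a model, using scikit-learn's random forest on simulated data: the cost drivers (weight and production country), the country factors, and the 2 kg process break are all illustrative assumptions, chosen only to show that the forest handles a qualitative driver and ranks driver importance automatically.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 200

# Hypothetical cost drivers: weight (numeric) and production country
# (qualitative). The price formula below is simulated, including a
# process "break" at 2 kg that defeats a single linear regression.
weight = rng.uniform(0.5, 4.0, n)
country = rng.choice(["FR", "PL", "CN"], n)
factor = {"FR": 1.3, "PL": 1.0, "CN": 0.8}
base = np.where(weight <= 2.0, 2.0 * weight + 1.0, 2.7 * weight + 3.0)
price = base * np.vectorize(factor.get)(country) + rng.normal(0, 0.2, n)

# One-hot encode the qualitative driver, then train the forest.
names = ["weight", "FR", "PL", "CN"]
X = np.column_stack(
    [weight] + [(country == c).astype(float) for c in ("FR", "PL", "CN")]
)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, price)

# Feature importances rank the cost drivers automatically.
for name, imp in zip(names, model.feature_importances_):
    print(f"{name:6s} importance: {imp:.2f}")
```

No manual threshold or database segmentation was needed: the trees discover the break and absorb the qualitative parameter, which is exactly what the regression-based parametric models above cannot do.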
2/ What are the applications?
These algorithms have many applications, notably in medicine, insurance, and marketing targeting (with uplift methods).
Applying random forests to cost estimation overcomes many of the disadvantages of traditional parametric approaches, and opens new opportunities for companies concerned with efficiency and competitiveness.
An accurate cost estimate is now possible even with a limited number of observations (a few dozen), limiting the resources needed to collect and consolidate the data. Furthermore, the price of complex systems can be modeled from easily accessible functional cost drivers, making costing particularly simple and fast. For an equipment manufacturer, for example, we were able to model the cost of an air conditioning system almost exclusively from functional or environmental parameters such as the volume to be air-conditioned, the number of openings, the time required to reach the target temperature, and so on.
For these reasons, some companies have begun to use random forests in the early phases of the product life cycle, in order to:
 Gain productivity in their costing activities (saving time and resources that they can refocus on costing technological innovations)
 Respond more quickly to their clients' tenders, and above all use this time saving to better optimize their proposals
 Secure and optimize their margin on new business
It is not surprising that the first adopters were sectors with intensive costing and product development activities (automotive, capital goods, consumer goods, etc.).
The second step was to use these algorithms to perform price consistency analyses, identifying products with large discrepancies between the actual price and the estimated price. The explanatory properties of random forests (classification alongside similar products) make it possible to build arguments for supplier negotiations, and thus to generate purchasing savings.
Finally, once the model is properly calibrated, it becomes a cost control tool for validating that the price offered by a supplier is fair, which shortens the bargaining process.
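The price consistency analysis described above reduces, mechanically, to comparing actual prices with the model's predictions and flagging large positive deviations. A hedged sketch, where the prices, predictions, and the 20% threshold are all illustrative assumptions:

```python
import numpy as np

# Hypothetical purchase prices vs. the cost model's predictions
# (illustrative numbers, not real data).
actual_price = np.array([10.2, 15.1, 7.9, 22.0, 9.5])
predicted    = np.array([10.0, 14.8, 8.1, 16.5, 9.4])

# Relative deviation of the actual price from the model.
deviation = (actual_price - predicted) / predicted

# Flag anything priced more than 20% above the model's estimate
# (the threshold is an assumption; each company would tune its own).
threshold = 0.20
flagged = np.where(deviation > threshold)[0]

for i in flagged:
    print(f"item {i}: actual {actual_price[i]:.1f} vs predicted "
          f"{predicted[i]:.1f} ({deviation[i]:+.0%}) -> renegotiation lead")
```

Items below the model's prediction can be read the other way: they may indicate a supplier's best practice worth generalizing, rather than an anomaly to correct.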
3/ What are the opportunities?
The opportunities offered by random forests in cost estimation and optimization are therefore enormous, and far from fully exploited. Beyond cost optimization, the algorithm's self-learning on the data of companies and their suppliers makes it possible to envisage intelligent contributions such as automatically preparing negotiations (objectives, levers, arguments, etc.), proposing optimized designs or redesigns, recommending the most suitable purchasing strategies, and anticipating supplier behavior.
In conclusion, these approaches are complementary in their use:

Analytical model
 Advantages: theoretical optimized cost based on the real (or simulated) manufacturing process; ambitious cost targets ("Best Landed Cost"); concrete identification of supplier underperformance.
 Limitations: requires process expertise and up-to-date benchmarks; tedious and expert-dependent for complex products; limited accuracy (contingencies and external factors not modeled).

Statistical parametric model
 Advantages: quick and simple to implement; based on actual historical data; well suited to the early life-cycle phases.
 Limitations: poor handling of qualitative parameters, missing data, and threshold effects.

Non-parametric statistical model ("random forests")
 Advantages: handles many cost drivers, including qualitative ones; robust to outliers and missing data; automatic weighting of cost drivers; markedly better accuracy.
 Limitations: requires a history of comparable products, even a limited one.
Conclusions
In conclusion, it would be futile to set the analytical and statistical methods of cost estimation against each other. They complement each other in their use and purpose. The statistical method, more consistent because it is based on observed real data, provides a fast and accurate evaluation to support the right decisions in product design or redesign processes. Simple to implement, it can model many families of products and services non-intrusively, without requiring advanced technological expertise. The analytical method produces a cost estimate that precisely reflects the reality (or the simulation) of a manufacturing process. While more tedious to implement, it defines cost targets to be reached, with explanatory factors based on observed industrial parameters and benchmarks. In this sense, it is better suited to quantifying technological breakthroughs and to driving suppliers' industrial progress plans toward the target. It is also more relevant for costing technological innovations for which the company has no history.
Nevertheless, self-learning algorithms and deep learning open new horizons and fields of application for statistical models, notably through the sharing of information between companies, or between companies and their suppliers.