Algorithms and Artificial Intelligence: New Horizons for Cost Estimation and Modeling
Insight 27 Sep. 2017

The advances made over the past 20 years in the field of statistics have made it possible to develop predictive algorithms that are much more effective, particularly in terms of precision. What are the possible applications in the field of cost estimation and modeling?

While traditional analytical models based on the manufacturing processes of the product or service are still widely used in our Cartesian society, statistical models are gradually establishing themselves thanks to their formidable efficiency. Rather than opposing each other, however, the two methods enrich and complement each other.

Traditional Costing Models

As a reminder, there are currently 3 main methods used to estimate the cost of a product:

1 / The Analogy Method

This method estimates the cost of a new product by comparison with similar products produced or purchased in the past. It is unreliable, but it can be used in the very early phases (feasibility study), when the characteristics of the project or service are not yet known. We will not dwell on this basic type of estimate in this article.

2 / The Analytical Method

This method estimates the cost of a product by modeling the industrial production process. It is based on the cost structure of the product and estimates each intermediate element from the materials and components involved, the process costs (machine and labor), and the additional structural costs.
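
To make this cost structure concrete, here is a minimal Python sketch of an analytical roll-up. It is an illustration under simplifying assumptions, not the formula used by any particular costing tool: the `Operation` fields, the `analytical_cost` function and every figure are hypothetical, and overall equipment effectiveness (OEE, the English counterpart of the French TRS) and the scrap rate are applied in the simplest possible way.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    """One process step of a hypothetical virtual factory."""
    cycle_time_s: float    # ideal cycle time per part, in seconds
    machine_rate_h: float  # machine cost per hour
    labor_rate_h: float    # labor cost per hour
    oee: float             # overall equipment effectiveness, 0 < oee <= 1
    scrap_rate: float      # share of parts scrapped at this step, 0 <= scrap_rate < 1


def analytical_cost(material_cost, operations, overhead_rate):
    """Roll up material, process and structural costs for one good part."""
    cost = material_cost
    for op in operations:
        # Effective time per part: the ideal cycle time degraded by the OEE.
        hours = op.cycle_time_s / 3600 / op.oee
        cost += hours * (op.machine_rate_h + op.labor_rate_h)
        # Good parts carry the cost of the parts scrapped at this step.
        cost /= 1 - op.scrap_rate
    # Structural costs are modeled as a flat percentage (an assumption).
    return cost * (1 + overhead_rate)


# Hypothetical single-step part: 2.00 of material, 36 s cycle time.
molding = Operation(cycle_time_s=36, machine_rate_h=60, labor_rate_h=40,
                    oee=0.8, scrap_rate=0.0)
unit_cost = analytical_cost(2.0, [molding], overhead_rate=0.10)
print(round(unit_cost, 3))
```

Changing `oee` or `scrap_rate` to benchmark values shows how far a supplier's cost sits from the theoretical optimum, and on which parameter the gap arises.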

This method has several advantages:

  • It makes it possible to estimate an optimized, theoretical production cost by modeling a virtual factory on the basis of the best ratios (labor cost, overall equipment effectiveness (OEE), scrap rate, etc.).
  • It thus makes it possible to set an ambitious cost target and identify the “Best Landed Cost” for a given product.
  • It also makes it possible to identify concretely the sources of supplier non-performance (which stage of the process, which cost item, which indicator, etc.) and to initiate a continuous improvement process with them in order to capture productivity gains.
  • This method is therefore particularly useful in the downstream phases of the life cycle (production, continuous improvement, product redesign, etc.).

However, the analytical method has some drawbacks or obstacles to implementation:

  • It requires a good understanding of the manufacturing processes involved as well as of key parameters (OEE, scrap rate, cycle time, etc.). This information is not always easy to collect from suppliers and to capitalize on.
  • Determining the “Best Landed Cost” requires supplying these tools with benchmark data on production parameters, and keeping these benchmarks up to date.
  • While standard processes can be modeled more or less quickly (injection, extrusion, foundry, cutting, stamping, surface treatment, etc.), costing a complex product is often tedious. It requires advanced expertise that only a few people in the company have mastered.
  • As a result, costing units quickly experience bottlenecks, with processing times incompatible with agile development and “time to market” constraints.
  • Finally, while these models are genuinely relevant for setting cost targets, they often lack precision because they do not take into account contingencies or certain external factors (balance of power, market effects, etc.), all the more so as many suppliers have a very low level of maturity in controlling their industrial cost price (PRI).

The software solutions on the market address some of these problems by offering, in particular, integrated repositories for several manufacturing processes with benchmark data by country. Some vendors have also developed interfaces that read CAD files, which makes it possible to propose manufacturing processes automatically (virtual factory). However, this software remains cumbersome, takes a long time to configure, and is used by only a few experts.

3 / The Parametric Method

This method estimates the cost of a product or service by statistical modeling. It uses historical data on similar products or services to define equations or statistical laws that model how the cost varies as a function of certain parameters known as “cost drivers”.
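
As an illustration, the sketch below fits one such statistical law, a power law of the form cost = a × driver^b, by ordinary least squares on log-transformed data. The choice of a single cost driver (weight), the function names and the data are all assumptions made for the example; the history is synthetic and generated from a known law so that the fit can be checked.

```python
import math


def fit_power_law(drivers, costs):
    """Fit cost = a * driver**b by least squares on log-log data."""
    xs = [math.log(d) for d in drivers]
    ys = [math.log(c) for c in costs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)   # intercept in log-log space = log(a)
    return a, b


def estimate_cost(a, b, driver):
    """Apply the fitted law to a new product."""
    return a * driver ** b


# Synthetic history (hypothetical): weight in kg, price following 5 * w**0.8.
weights = [1.0, 2.0, 4.0, 8.0, 16.0]
prices = [5.0 * w ** 0.8 for w in weights]

a, b = fit_power_law(weights, prices)
print(round(a, 3), round(b, 3), round(estimate_cost(a, b, 3.0), 2))
```

With real purchase histories the points scatter around the fitted curve, and the size of that scatter is what determines how much confidence the estimate deserves.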

Most of the time, these models are based on linear, multiple linear, polynomial or logarithmic regressions. These estimation methods have several advantages:

  • They make it possible to estimate the cost of a new product / service on the basis of simple characteristics known to the company (weight, size, volumes, country of production, key elements of the specification, etc.) without necessarily knowing the details of the manufacturing process or external benchmarks. It is therefore a very quick and easy method to implement.
  • Moreover, because it is based on the observation of products and services actually manufactured or purchased in the past, the estimated cost is potentially more consistent and precise than that of a “theoretical” analytical model, provided of course that enough good-quality historical data is available.
  • These statistical methods are particularly useful in the upstream phases of the life cycle (opportunity, feasibility, detailed design, etc.) because they make it possible to quickly make the right decisions for an optimized design, and therefore to secure the margin while accelerating “time to market”.
  • Further downstream, they also allow rapid analysis of the consistency or inconsistency of current prices, thanks to dispersion analyses against the predictive model. They thus reveal aberrant products or services, for example those with an abnormally high cost with regard to the predictive model. This provides avenues for optimization for buyers (renegotiation, change of supplier) or for R&D (redesign).
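
A dispersion analysis of this kind can be sketched in a few lines of Python. The example below flags items whose relative gap to the model's estimate is abnormal compared with the rest of the portfolio. The robust z-score based on the median absolute deviation (MAD), the threshold of 3 and the price data are illustrative assumptions, not a method described in this article.

```python
from statistics import median


def flag_price_outliers(actual, predicted, threshold=3.0):
    """Return the indices of items whose gap to the model is abnormal.

    The gap is measured relative to the predicted price, then compared
    with the typical dispersion of gaps using a MAD-based robust z-score.
    """
    gaps = [(a - p) / p for a, p in zip(actual, predicted)]
    center = median(gaps)
    mad = median(abs(g - center) for g in gaps)
    if mad == 0:
        # No measurable dispersion: nothing can be ranked as abnormal.
        return []
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    return [i for i, g in enumerate(gaps) if abs(g - center) / scale > threshold]


# Hypothetical portfolio: the model predicts 10.0 for every item.
actual_prices = [10.2, 9.9, 10.1, 15.0, 9.8]
predicted_prices = [10.0, 10.0, 10.0, 10.0, 10.0]

suspects = flag_price_outliers(actual_prices, predicted_prices)
print(suspects)
```

The flagged items are candidates for renegotiation or redesign; whether a gap is truly abnormal remains a judgment for the buyer or the engineer.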

However, these methods have several limitations:

  • Traditional statistical models (based on regressions) struggle to take qualitative parameters into account (except by splitting the database into smaller subsets).
  • They handle missing data poorly and therefore require very clean databases.
  • They handle “breaks” or threshold effects poorly. For example, the price can behave linearly over a certain range, then behave radically differently beyond a certain threshold (size, weight, volume, etc.) because the manufacturing process changes.
  • All these elements directly affect the precision of traditional parametric models and therefore their use.

Artificial Intelligence, the Path to a Fourth Method of Cost Modeling

The progress made in recent years in the field of algorithms and machine learning largely overcomes the drawbacks of traditional parametric methods and improves their performance and their field of application.

Among recent statistical methods, the “random forests” algorithm, formally proposed in 2001 by Leo Breiman and Adèle Cutler (Breiman, L., Random Forests. Machine Learning 45, 5-32 (2001)), is a nonparametric statistical method that trains multiple decision trees on slightly different subsets of the data, generated by “bootstrap” techniques, and then aggregates their predictions.
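
The principle can be sketched in pure Python with a deliberately stripped-down stand-in for a random forest: an ensemble of one-split regression trees (“stumps”), each trained on a bootstrap sample, whose predictions are averaged. This is not Breiman's full algorithm (real forests grow deep trees and also draw a random subset of variables at each split, which is meaningless here with a single cost driver), and the function names and data are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree: returns (threshold, left_mean, right_mean)."""
    best = None
    for t in sorted(set(xs))[1:]:  # every distinct value but the smallest
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # all x identical in this sample: predict the mean
        m = sum(ys) / len(ys)
        return (float("inf"), m, m)
    return best[1:]


def fit_bagged_stumps(xs, ys, n_trees=50, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict(forest, x):
    """Average the predictions of all stumps."""
    return sum(ml if x < t else mr for t, ml, mr in forest) / len(forest)


# Hypothetical data with a break: the process, and the price, change above 10 kg.
weights = list(range(1, 21))
prices = [100.0 if w <= 10 else 250.0 for w in weights]

forest = fit_bagged_stumps(weights, prices)
print(predict(forest, 3), predict(forest, 20))
```

Because each tree splits the data at a threshold rather than fitting a line through it, the ensemble follows the price jump that a single regression line would smooth over.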

1) What are the Advantages?

The main advantages of this artificial intelligence algorithm are as follows:

  • Ability to model a very large number of parameters (“cost drivers”) and in particular qualitative or “symbolic” parameters
  • Ability to process databases for which the number of variables greatly exceeds the number of observations
  • Ability to automatically identify and weight the most important parameters, and therefore the “cost drivers” that have the greatest impact on the cost of the product
  • Ability to manage missing values / incomplete databases
  • Robustness to outliers
  • Ability to identify breaks in the behavior of variables
  • Interpretability of the tree
  • Accuracy increased by 30 to 40% compared to traditional methods

2) What are the Applications?

There are many applications of these algorithms, particularly in the medical and insurance fields, or even in the targeting of marketing campaigns (with uplift methods).

The application of random forests in the field of cost estimation solves many of the drawbacks of traditional parametric approaches and thus opens up new opportunities for companies wishing to gain in efficiency and competitiveness.

In fact, a precise cost estimate is now possible even with a limited number of observations (a few dozen), which limits the resources needed to collect and capitalize on the data. Moreover, the price of complex systems can be modeled from easily accessible functional “cost drivers”, which makes costing particularly simple and fast. For an equipment manufacturer, for example, we were able to model the cost of an air conditioning system almost exclusively from functional or environmental parameters such as the volume to be air conditioned, the number of openings, the number of people, the time required to reach the target temperature, etc.

This is why some companies have started to use random forests in the upstream phases of the product life cycle, in particular to:

  • Gain productivity in their costing activities (saving time and resources that they can focus on costing technological innovations)
  • Respond more quickly to their clients’ calls for tenders and, above all, use the time saved to better optimize their proposals
  • Secure and optimize their margins on new business

It is not surprising that the first users were sectors with strong costing and product development activities (automotive, capital goods, consumer goods, etc.).

The second step was then to use these algorithms to analyze price consistency by identifying products with large differences between the actual price and the estimated price. The explanatory properties of random forests (classification with similar products) provide arguments to put to suppliers during negotiations and thus generate savings in purchases.

Finally, once the model is perfectly calibrated, it becomes a “cost control” tool to validate the fair price offered by the supplier. Negotiation processes are made easier.

3) What are the Opportunities?

The opportunities offered by random forests in the area of cost estimation and optimization are therefore enormous and far from having been fully exploited. Beyond cost optimization, self-learning of the algorithm on the data of companies and their suppliers makes it possible to consider intelligent contributions such as the automatic preparation of negotiations (objectives, arguments, levers, etc.), the proposal of optimized designs or redesigns, the recommendation of the most appropriate purchasing strategies anticipating supplier behavior, etc.

Two Complementary Approaches in Their Use

Analytical Model

  • Explanatory and operations-centric model
  • “Best Landed Cost” estimate and target price definition
  • Allows the optimization of production prices and the management of supplier progress plans
  • Difficulty of access to process repositories and maintenance over time
  • Intrusive approach vis-à-vis suppliers
  • Expert model that is hard to roll out widely
  • Long time needed to set up and complete costings
  • Precision?
Software Examples
  • Siemens PLM
  • Apriori
  • Facton


Statistical Parametric Model

  • Ease and speed of use
  • Consistency of the estimated price, and precision (under certain conditions)
  • Non-intrusive approach to suppliers
  • Product and service application
  • Very relevant in the upstream phases of the life cycle and for consistency analyses
  • Requires a minimum of historical data of adequate quality
  • Model not “explanatory” enough to drive supplier progress plans
  • Less relevant model for defining target prices and “Best Landed Cost”
  • Difficulty in modeling qualitative parameters
Software Examples
  • Seer
  • EstimFEC

Non-Parametric Statistical Model

“Random forests”

  • Ease and speed of use
  • Consistency of the estimated price, and accuracy increased by 30% compared to parametric models (under certain conditions)
  • Non-intrusive approach to suppliers
  • Product and service application
  • Very relevant in the upstream phases of the life cycle
  • Also relevant in the downstream phases for the analysis of price consistency and the identification of opportunities thanks to the explanatory properties of forests
  • Integrates a large number of cost drivers, including qualitative ones
  • Detects technological breaks
  • Prioritizes cost drivers
  • Handles missing values and can work with a limited sample
  • Less relevant model for defining target prices and “Best Landed Cost”
Software Examples
  • easyKost


It would be futile to pit analytical and statistical methods of cost estimation against each other. They complement each other in their use and purpose.

The statistical method, which is more consistent because it is based on the observation of real data, makes it possible to obtain a rapid and precise assessment in order to make the right decisions in the product design or redesign processes. Simple to implement, it allows a large number of families of products and services to be modeled in a non-intrusive manner and without the need to acquire advanced technological expertise.

The analytical method makes it possible to obtain an estimate precisely reflecting the reality (or the simulation) of a manufacturing process. More tedious to implement, it does, however, make it possible to define cost targets to be reached with explanatory factors based on the observed industrial parameters and benchmarks. In this sense, it is more appropriate for quantifying technological breakthroughs and leading supplier industrial progress plans in order to bring them to the target. It is also more relevant for quantifying technological innovations for which the company does not have a history.

Nevertheless, self-learning algorithms and deep learning open up new horizons and fields of application for statistical models, in particular through the sharing of information between companies or between companies and their suppliers.