Posted by: Dawid Kopczyk, *03 Mar 2020*

Insurance companies in the European Union use **Internal Models** to assess their risk and calculate how much money they should hold to remain solvent in case of a crisis. The main output of capital modelling, the **Solvency Capital Requirement** (SCR), is calculated by running a huge Monte Carlo simulation.

In this blog post, you will learn two things:

- How actuaries use **proxy models** in capital modelling and why they are connected to the **Monte Carlo** method
- How **machine learning** algorithms **boost the performance** of capital modelling and allow insurance companies to **save money and time**

### Proxy Models

A proxy model aims to **emulate the outputs of a more complex model** in a simpler and resource-saving manner. In the actuarial world, the complex models are *cash-flow projection models*.

#### Cash-flow Models

The cash-flow model takes each policyholder's information (such as age, health condition, or the type of insurance the policyholder bought) as input and outputs balance sheet items (such as profit, loss, or shareholder return). These figures are calculated based on current **best estimate assumptions** of mortality, longevity, and the interest rates used to discount cash-flows.
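The interface of such a model can be sketched in a few lines of Python. This is a hypothetical, heavily simplified illustration: `Policyholder`, `BalanceSheetItems`, and the one-year projection inside `cash_flow_model` are all made-up stand-ins for real multi-decade cash-flow projections.

```python
from dataclasses import dataclass

@dataclass
class Policyholder:
    age: int
    health_score: float
    product: str          # e.g. "term_life", "annuity"
    sum_assured: float

@dataclass
class BalanceSheetItems:
    profit: float
    loss: float
    shareholder_return: float

def cash_flow_model(p: Policyholder, mortality_rate: float,
                    discount_rate: float) -> BalanceSheetItems:
    # Toy one-year projection: expected payout discounted one year,
    # compared against premium income.
    expected_payout = p.sum_assured * mortality_rate / (1 + discount_rate)
    premium = 0.012 * p.sum_assured
    profit = max(premium - expected_payout, 0.0)
    loss = max(expected_payout - premium, 0.0)
    return BalanceSheetItems(profit, loss, shareholder_return=profit * 0.1)

result = cash_flow_model(Policyholder(40, 0.9, "term_life", 100_000.0),
                         mortality_rate=0.002, discount_rate=0.01)
print(result)
```

A real model would project cash-flows over decades under the best estimate assumptions mentioned above; the point here is only the shape of the mapping from policyholder data to balance sheet items.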

*The inputs and outputs of cash-flow models in an insurance company.*

The inner workings of cash-flow models are quite complex, and actuaries spend a lot of time building and executing them. The complexity of a cash-flow model depends on:

- types of insurance products a company sells,
- asset-liability interactions,
- and the granularity of the model inputs.

#### Monte Carlo

The key figure an insurance company needs to calculate is the amount of money it should hold to remain solvent over the next year – the **Solvency Capital Requirement** (SCR). How this figure is calculated is defined by financial regulators, but for most European insurers it requires executing the cash-flow models many times under various scenarios of economic and non-economic assumptions. In other words, the Monte Carlo method is used – actuaries produce **thousands of balance sheets calculated on different scenarios**, which gives them an **empirical distribution** of the insurer's losses.

*Monte Carlo simulation to determine the empirical distribution of the balance sheet. Each scenario goes through the cash-flow model, which produces a balance sheet.*

Having the empirical distribution of losses, we can assess how much money the insurer should hold to stay safe if an extreme scenario were to happen in the next year.
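As a rough sketch: under Solvency II the SCR corresponds to a 99.5% one-year value-at-risk, so once the simulated losses are available, the SCR can be read off as the 99.5th percentile of the empirical loss distribution. The `cash_flow_model_loss` function below is a made-up stand-in for a real cash-flow model run.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the cash-flow model: in practice each scenario
# would be run through the full projection model to obtain a balance sheet.
def cash_flow_model_loss(scenario):
    equity_shock, mortality_shock = scenario
    return 100 * equity_shock + 40 * mortality_shock ** 2

# Monte Carlo: draw economic / non-economic scenarios and collect losses.
scenarios = rng.normal(size=(100_000, 2))
losses = np.array([cash_flow_model_loss(s) for s in scenarios])

# SCR as the 99.5th percentile (1-in-200-year loss) of the loss distribution.
scr = np.percentile(losses, 99.5)
print(f"SCR (99.5th percentile loss): {scr:.1f}")
```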

What if a single cash-flow model execution takes 1 hour and we need to perform 1 million runs to get a proper Monte Carlo estimate of the loss distribution? Unfortunately, the time required to accomplish the task **(1 million hours ≈ 114 years)** is not available to anyone. One could use cloud computing to run the models in parallel, but that can be very expensive. However, there is a better option – **we can use machine learning algorithms to deal with this problem!**
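The back-of-the-envelope arithmetic behind that figure:

```python
runs = 1_000_000        # scenarios needed for a stable loss distribution
hours_per_run = 1       # one cash-flow model execution
total_hours = runs * hours_per_run
years = total_hours / (24 * 365.25)
print(f"{years:.0f} years of sequential compute")  # roughly 114 years
```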

### Current Approaches

The easiest and most naive algorithm is **linear regression**. The proxy model approximating a target value $y$ is then expressed by the polynomial:

$$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$

where the coefficients $\beta_0, \dots, \beta_n$ are determined by fitting the above equation to all training scenarios $x$. However, this method **turns out to be insufficient** – the assumption that a complex cash-flow model can be approximated with a linear equation is somewhat naive.

To mimic non-linearity, one can add **interaction terms**, for example of the form $x_i x_j$, $x_i^2$, or $x_i^2 x_j$; however, the number of such terms grows quickly, leading to high estimator variance, numerical problems, and instability. To deal with this, actuarial departments have processes to select the most important interactions using expert judgment or automatic feature selection algorithms (such as stepwise selection).
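A minimal sketch of this expansion with scikit-learn, using synthetic data in place of real cash-flow model outputs – `PolynomialFeatures` generates exactly these monomial and interaction terms:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(42)

# Toy stand-in for (scenario, loss) pairs produced by a cash-flow model.
X = rng.normal(size=(500, 3))                        # 3 risk drivers
y = X[:, 0] + 0.5 * X[:, 1] * X[:, 2] + X[:, 0]**2   # non-linear target

# Expand the risk drivers into monomials and interaction terms
# (x_i, x_i^2, x_i * x_j) and fit ordinary least squares on them.
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)
model = LinearRegression().fit(X_poly, y)

print(X_poly.shape[1], "features after expansion")  # 9 for degree 2, 3 drivers
```

Already at degree 2 the feature count triples; with 14 risk drivers and higher degrees the blow-up is far worse, which is why a selection step is needed.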

*The stepwise feature selection algorithm iteratively eliminates unnecessary terms and then selects the best model with the set of most important features.*
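A simplified backward stepwise selection can be sketched as follows (synthetic data; real actuarial pipelines typically use more refined criteria such as AIC/BIC or expert overrides):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = 2 * X[:, 0] - X[:, 2] + 0.1 * rng.normal(size=300)  # only 2 informative terms

def backward_stepwise(X, y, min_features=1):
    """Greedily drop the feature whose removal hurts CV R^2 the least,
    keeping the best-scoring feature subset seen along the way."""
    features = list(range(X.shape[1]))
    best = (cross_val_score(LinearRegression(), X, y, cv=4).mean(), features[:])
    while len(features) > min_features:
        scores = []
        for f in features:
            subset = [g for g in features if g != f]
            score = cross_val_score(LinearRegression(), X[:, subset], y, cv=4).mean()
            scores.append((score, f))
        best_score, drop = max(scores)
        features.remove(drop)
        if best_score >= best[0]:
            best = (best_score, features[:])
    return best[1]

selected = backward_stepwise(X, y)
print("selected features:", sorted(selected))
```

On this toy data the procedure retains the two informative drivers while pruning the noise columns.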

Nevertheless, the interaction terms might be insufficient or wrongly selected. Moreover, the underlying proxy model is still linear in its coefficients, which is often the reason why out-of-sample scores such as $R^2$ or MSE are unsatisfactory and the proxy model cannot fully mimic the behavior of a cash-flow model.

Instead of linear regression, more sophisticated machine learning algorithms such as Random Forests or Neural Networks can be used to directly capture the non-linearity and complexity of cash-flow models. These algorithms have been implemented and fitted on data from one of the largest reinsurance companies. The results are summarized in the next section.

*More sophisticated machine learning algorithms, such as random forests (a) or neural networks (b), can be used in the proxy modelling problem.*
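For illustration, here is a sketch fitting both model families on synthetic data shaped like the problem (14 risk drivers, a made-up non-linear target standing in for real cash-flow outputs; the hyperparameters are illustrative, not the ones used in the case study below):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(3000, 14))                       # 14 risk drivers
y = X[:, 0] * X[:, 1] + np.sin(X[:, 2]) + X[:, 3]**2  # non-linear target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=500, random_state=0)

# Both models learn the non-linearities directly, with no hand-crafted
# interaction terms.
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
nn = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                  random_state=0).fit(X_tr, y_tr)

print(f"RF out-of-sample R^2: {rf.score(X_te, y_te):.3f}")
print(f"NN out-of-sample R^2: {nn.score(X_te, y_te):.3f}")
```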

### Case Study

#### Algorithms

Four machine learning algorithms have been implemented:

- Linear Regression with Backward Stepwise Selection (BSS)
- Lasso
- Random Forest
- Neural Networks

The **Linear BSS algorithm is the benchmark representing** the current proxy modelling approach used in insurance companies. The metrics used for comparison between models are simply $R^2$ and MSE calculated on out-of-sample data; we have not observed different results around the 99.5th percentile.

#### Data

The data produced by the cash-flow model of a large reinsurance company consists of 3000 risk scenarios over 14 risks, with the best estimate of liabilities (BEL) as the target variable. The cash-flow model describes traditional insurance products and is characterized by complex behavior and interactions between risks.

*The cash-flow model was evaluated 3000 times to produce the target values.*

The data has been randomly split into a training set of size 2500 and an out-of-sample set of size 500 so that we can compute out-of-sample $R^2$ and MSE. For the Linear BSS and Lasso estimators, interaction terms up to a fixed maximum monomial degree have been added to incorporate non-linear effects, which substantially increases the number of features. For the Random Forest and Neural Networks, we use the original 14 risk drivers.
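The split and the two metrics can be reproduced in a few lines (synthetic data standing in for the reinsurance dataset):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
X = rng.normal(size=(3000, 14))                      # 3000 scenarios, 14 risks
y = X @ rng.normal(size=14) + 0.2 * rng.normal(size=3000)

# Random 2500 / 500 split, mirroring the training / out-of-sample design.
X_tr, X_oos, y_tr, y_oos = train_test_split(
    X, y, train_size=2500, test_size=500, random_state=0)

model = LinearRegression().fit(X_tr, y_tr)
pred = model.predict(X_oos)
print(f"out-of-sample R^2: {r2_score(y_oos, pred):.3f}")
print(f"out-of-sample MSE: {mean_squared_error(y_oos, pred):.3f}")
```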

#### Results

The table below summarizes the results of the exercise.

| Estimator | Training R² | Out-of-sample R² | Training MSE | Out-of-sample MSE |
|-----------|-------------|------------------|--------------|-------------------|
| Linear BSS | 85.5% | 83.9% | 0.14 | 0.16 |
| Lasso | 86.7% | 87.7% | 0.13 | 0.12 |
| Random forest | 99.8% | 98.2% | 0.00 | 0.02 |
| Neural networks | 99.5% | 98.6% | 0.01 | 0.01 |

Generally, the training and validation scores are close to each other, which indicates that the models are **not overfitted**. Analyzing the out-of-sample scores, we see that **linear models fail to fully mimic the complexity of cash-flow models**, whereas **random forests and artificial neural networks fit almost perfectly**, with over 98% out-of-sample $R^2$. This means these proxy models will generalize well across scenarios, producing a distribution of best estimate liabilities very close to what the cash-flow models would produce if we had 114 years to execute them.

#### Hyperparameters Tuning

For the Lasso, Random Forest, and Neural Networks, we perform a grid search hyperparameter optimization considering all combinations of predefined values for the most important hyperparameters, and we select the best combination based on 4-fold cross-validated $R^2$.
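A condensed sketch of such a search for the random forest, assuming a small made-up grid (the real search would cover more values and the other model families too). `GridSearchCV` with `cv=4` scores each combination by 4-fold cross-validated $R^2$, the default scoring for regressors:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 14))
y = X[:, 0] * X[:, 1] + X[:, 2]**2   # synthetic non-linear target

# Exhaustive search over all combinations of a predefined grid.
grid = {"n_estimators": [50, 150],
        "max_depth": [None, 8],
        "max_features": ["sqrt", 1.0]}
search = GridSearchCV(RandomForestRegressor(random_state=0), grid, cv=4)
search.fit(X, y)

print("best hyperparameters:", search.best_params_)
print(f"best 4-fold CV R^2: {search.best_score_:.3f}")
```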

*Hyperoptimization results for the given hyperparameters in Lasso, random forest, and neural networks. The best hyperparameters were then used in the main training procedure.*

### Conclusions

Traditionally, the proxy modelling problem is solved with a polynomial that mimics the complexities of a cash-flow model. We propose more sophisticated machine learning models, such as Random Forests and Neural Networks, to boost proxy model performance. On the above example, conducted on data from a large reinsurance company, we showed that **machine learning is very useful** in this kind of problem.

In terms of **explainability**, actuarial departments can use Explainable AI techniques to remove the *black-box-ness* of these models and meet regulatory requirements.

Using data science and artificial intelligence in insurance companies is an interesting and fruitful direction that can contribute to huge savings and an increase in the quality of risk management. If you are interested in how our Actuarial Data Science Platform can help you implement AI in your actuarial department, leave us a message.