# Simulation and Optimization¶

We’ll go through the entire simulation and optimization process in this tutorial. The general procedure on optimizing the portfolio is as follows:

1. Determine the asset classes (and thus the dataset used)

2. Simulate the returns

3. Run an optimization program on the simulated cube (tensor)

More details on each section will be covered.

## Getting Started¶

To start off, install the required packages by running the following commands in your command prompt.

Feel free to name my_env with whatever you like.

conda config --prepend channels conda-forge
conda config --append channels bashtage
conda config --append channels danielbok

conda create -y -n my_env python=3.7 muarch copulae allopy pandas

# you may need to init if you're using conda for the first time
conda init --all

conda activate my_env


Remember to activate the environment.

## Simulation¶

For the simuluation The simulation follows the following procedure:

1. Load the returns dataset of the asset classes you want to simulate

2. For each asset class, specify the AR-GARCH model

3. Fit the AR-GARCH models with the log-returns data

4. Fit the standardized residuals of the AR-GARCH models to a Student Copula

5. Overwrite the correlation matrix of the Student Copula with the log-returns correlation

6. Simulate future returns using the AR-GARCH models with distributions “inversed” from the copula

7. Truncate then calibrate the first 2 moments (returns and vol) of the cube

### Exploring the Data¶

Let’s start by loading and exploring the sample policy index dataset. These are the monthly returns for the 7 assets we will simulate. The data ranges from 01 Jan 1985 to 01 Oct 2017.

[1]:

import numpy as np

log_returns = (returns + 1).apply(np.log)
_, num_assets = log_returns.shape


[1]:

CASH NB EILB DMEQ EMEQ PE RE
DATE
1985-01-01 0.004109 0.008796 0.028741 0.058068 -0.014185 0.067508 -0.000949
1985-01-02 0.024052 -0.002144 -0.052592 0.008257 -0.041216 0.007606 -0.018895
1985-01-03 -0.030244 0.006433 -0.000043 -0.013792 0.017271 0.028742 -0.014230
1985-01-04 0.002630 0.005479 0.020288 -0.083561 -0.005833 -0.016684 -0.022006
1985-01-05 0.007554 0.042372 0.046989 0.052969 -0.008937 0.071804 0.035114
[2]:

log_returns.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 394 entries, 1985-01-01 to 2017-01-10
Data columns (total 7 columns):
CASH    394 non-null float64
NB      394 non-null float64
EILB    394 non-null float64
DMEQ    394 non-null float64
EMEQ    394 non-null float64
PE      394 non-null float64
RE      394 non-null float64
dtypes: float64(7)
memory usage: 24.6 KB


### Forming the Models¶

By right, we are supposed to form a distinct AR-GARCH model for each asset class. However, we’ll take some short cuts by assuming every model is an AR(1)-GARCH(1,1)-Skew_T model with the exception of CASH. This exception is meant to serve as an example of how to change each model separately.

[3]:

from muarch import MUArch, UArch

# The settings defined will set the default for all models.
arch_models = MUArch(num_assets,
mean='ARX',
lags=1,
vol='GARCH',
p=1,
q=1,
power=2.0,
dist='skewt',
scale=100)

# You can specify each model separately. We'll use CASH as an example
# CASH is the model in the MUArch object
arch_models[0] = UArch(mean='CONSTANT',
lags=0,
vol='CONSTANT',
dist='skewt',
scale=100)


### Fitting the Model¶

We can call the fit function and subsequently check the quality of each fit through the .summary() method.

[4]:

arch_models.fit(log_returns)
arch_models.summary()

[4]:


## CASH

Dep. Variable: R-squared: y -0 Constant Mean -0 Constant Variance -734.606 Standardized Skew Student's t 1477.21 Maximum Likelihood 1493.12 394 Sun, Jan 12 2020 390 11:54:38 4
coef std err t P>|t| 95.0% Conf. Int. -4.6322e-03 8.148e-02 -5.685e-02 0.955 [ -0.164, 0.155]
coef std err t P>|t| 95.0% Conf. Int. 2.6251 0.274 9.575 1.019e-21 [ 2.088, 3.162]
coef std err t P>|t| 95.0% Conf. Int. 5.4964 1.387 3.964 7.368e-05 [ 2.779, 8.214] -6.7665e-04 6.797e-02 -9.955e-03 0.992 [ -0.134, 0.133]

Covariance estimator: robust

## NB

Dep. Variable: R-squared: y 0.009 AR 0.006 GARCH -651.331 Standardized Skew Student's t 1316.66 Maximum Likelihood 1344.48 393 Sun, Jan 12 2020 386 11:54:38 7
coef std err t P>|t| 95.0% Conf. Int. 0.1950 7.099e-02 2.746 6.029e-03 [5.582e-02, 0.334] 0.0896 5.778e-02 1.550 0.121 [-2.368e-02, 0.203]
coef std err t P>|t| 95.0% Conf. Int. 0.0415 8.911e-02 0.465 0.642 [ -0.133, 0.216] 0.0000 4.415e-02 0.000 1.000 [-8.653e-02,8.653e-02] 0.9723 0.100 9.698 3.080e-22 [ 0.776, 1.169]
coef std err t P>|t| 95.0% Conf. Int. 8.2346 3.400 2.422 1.543e-02 [ 1.571, 14.898] -0.0813 0.105 -0.774 0.439 [ -0.287, 0.125]

Covariance estimator: robust

## EILB

Dep. Variable: R-squared: y 0.002 AR -0.001 GARCH -1024.23 Standardized Skew Student's t 2062.46 Maximum Likelihood 2090.27 393 Sun, Jan 12 2020 386 11:54:39 7
coef std err t P>|t| 95.0% Conf. Int. 0.7238 0.177 4.081 4.477e-05 [ 0.376, 1.071] 0.0202 5.402e-02 0.375 0.708 [-8.565e-02, 0.126]
coef std err t P>|t| 95.0% Conf. Int. 7.2805 1.601 4.549 5.402e-06 [ 4.143, 10.418] 0.1777 0.109 1.634 0.102 [-3.549e-02, 0.391] 0.1816 0.153 1.187 0.235 [ -0.118, 0.481]
coef std err t P>|t| 95.0% Conf. Int. 8.8789 3.717 2.389 1.691e-02 [ 1.594, 16.164] -0.0437 6.861e-02 -0.636 0.524 [ -0.178,9.080e-02]

Covariance estimator: robust

## DMEQ

Dep. Variable: R-squared: y -0.004 AR -0.006 GARCH -1142.18 Standardized Skew Student's t 2298.36 Maximum Likelihood 2326.18 393 Sun, Jan 12 2020 386 11:54:39 7
coef std err t P>|t| 95.0% Conf. Int. 0.3412 0.219 1.558 0.119 [-8.804e-02, 0.770] -0.0145 5.746e-02 -0.252 0.801 [ -0.127,9.814e-02]
coef std err t P>|t| 95.0% Conf. Int. 1.3369 0.568 2.354 1.860e-02 [ 0.224, 2.450] 0.0762 3.183e-02 2.393 1.672e-02 [1.378e-02, 0.139] 0.8594 4.086e-02 21.032 3.314e-98 [ 0.779, 0.939]
coef std err t P>|t| 95.0% Conf. Int. 9.6054 4.727 2.032 4.215e-02 [ 0.341, 18.870] -0.2163 7.351e-02 -2.943 3.255e-03 [ -0.360,-7.224e-02]

Covariance estimator: robust

## EMEQ

Dep. Variable: R-squared: y 0.019 AR 0.016 GARCH -1309.97 Standardized Skew Student's t 2633.94 Maximum Likelihood 2661.75 393 Sun, Jan 12 2020 386 11:54:39 7
coef std err t P>|t| 95.0% Conf. Int. 0.9325 0.357 2.609 9.080e-03 [ 0.232, 1.633] 0.1289 4.625e-02 2.787 5.317e-03 [3.826e-02, 0.220]
coef std err t P>|t| 95.0% Conf. Int. 1.6863 1.348 1.251 0.211 [ -0.956, 4.329] 0.0322 1.742e-02 1.846 6.490e-02 [-1.985e-03,6.629e-02] 0.9367 3.036e-02 30.856 4.717e-209 [ 0.877, 0.996]
coef std err t P>|t| 95.0% Conf. Int. 6.0618 1.870 3.242 1.186e-03 [ 2.397, 9.726] -0.2297 7.166e-02 -3.205 1.349e-03 [ -0.370,-8.925e-02]

Covariance estimator: robust

## PE

Dep. Variable: R-squared: y 0.028 AR 0.026 GARCH -1180.08 Standardized Skew Student's t 2374.16 Maximum Likelihood 2401.98 393 Sun, Jan 12 2020 386 11:54:39 7
coef std err t P>|t| 95.0% Conf. Int. 0.4321 0.246 1.760 7.845e-02 [-4.916e-02, 0.913] 0.1118 5.112e-02 2.187 2.872e-02 [1.162e-02, 0.212]
coef std err t P>|t| 95.0% Conf. Int. 0.9024 0.581 1.552 0.121 [ -0.237, 2.042] 0.0562 1.982e-02 2.834 4.595e-03 [1.733e-02,9.502e-02] 0.9098 2.973e-02 30.603 1.105e-205 [ 0.852, 0.968]
coef std err t P>|t| 95.0% Conf. Int. 8.5291 3.344 2.550 1.076e-02 [ 1.974, 15.084] -0.2765 8.300e-02 -3.332 8.626e-04 [ -0.439, -0.114]

Covariance estimator: robust

## RE

Dep. Variable: R-squared: y 0.01 AR 0.007 GARCH -891.715 Standardized Skew Student's t 1797.43 Maximum Likelihood 1825.25 393 Sun, Jan 12 2020 386 11:54:39 7
coef std err t P>|t| 95.0% Conf. Int. 0.1584 0.127 1.251 0.211 [-8.968e-02, 0.407] 0.0813 5.794e-02 1.404 0.160 [-3.221e-02, 0.195]
coef std err t P>|t| 95.0% Conf. Int. 1.0374 0.783 1.325 0.185 [ -0.497, 2.572] 0.0937 4.863e-02 1.926 5.409e-02 [-1.645e-03, 0.189] 0.7221 0.164 4.391 1.129e-05 [ 0.400, 1.044]
coef std err t P>|t| 95.0% Conf. Int. 15.0314 8.833 1.702 8.880e-02 [ -2.281, 32.344] -6.9447e-03 8.766e-02 -7.922e-02 0.937 [ -0.179, 0.165]

Covariance estimator: robust

### Fitting the Copula¶

The next thing we need to do is to fit the copula with the standardized residuals of the arch models’ fits. We can also check the fit of the copula with the .summary() method.

[5]:

from copulae import TCopula

residuals = arch_models.residuals()  # defaults to standardized residuals
cop = TCopula(num_assets)
cop.fit(residuals)
cop.summary()

[5]:


## Student Copula Summary

Student Copula with 7 dimensions

### Parameters

Degree of Freedom23.07912545920342
Correlation Matrix
 1 0.132323 -0.383796 0.088988 0.095665 0.069652 0.233252 0.132323 1 0.399767 -0.019967 -0.072441 -0.04885 0.178103 -0.383796 0.399767 1 0.014854 0.038 0.070871 0.015642 0.088988 -0.019967 0.014854 1 0.506254 0.699188 0.467597 0.095665 -0.072441 0.038 0.506254 1 0.579042 0.317568 0.069652 -0.04885 0.070871 0.699188 0.579042 1 0.442605 0.233252 0.178103 0.015642 0.467597 0.317568 0.442605 1

### Fit Summary

Fit Summary
Log Likelihood-388.78612323077846
Variance EstimateNot Implemented Yet
MethodMaximum pseudo-likelihood
Data Points393

Optimization SetupResults
bounds[(0.0, inf), (-1.0, 1.0), (-1.0, 1.0), (-1.0, 1.0), (-1.0, 1.0), (-1.0, 1.0), (-1.0, 1.0), (-1.0, 1.0), (-1.0, 1.0), (-1.0, 1.0), (-1.0, 1.0), (-1.0, 1.0), (-1.0, 1.0), (-1.0, 1.0), (-1.0, 1.0), (-1.0, 1.0), (-1.0, 1.0), (-1.0, 1.0), (-1.0, 1.0), (-1.0, 1.0), (-1.0, 1.0), (-1.0, 1.0)]x[ 2.30791255e+01 1.32323371e-01 -3.83796066e-01 8.89879525e-02 9.56646977e-02 6.96515171e-02 2.33251735e-01 3.99767177e-01 -1.99674055e-02 -7.24414193e-02 -4.88504786e-02 1.78102666e-01 1.48544670e-02 3.79998706e-02 7.08714854e-02 1.56419634e-02 5.06253610e-01 6.99188239e-01 4.67596725e-01 5.79042392e-01 3.17568429e-01 4.42605381e-01]
options{'maxiter': 20000, 'ftol': 1e-06, 'iprint': 1, 'disp': False, 'eps': 1.5e-08}fun-388.78612323077846
methodSLSQPjac[-0.00053054 -0.03081292 0.02364686 0.04192392 -0.02807307 -0.03360583 0.00718501 -0.03806235 -0.05757859 0.0176783 0.03008912 -0.00658247 0.07304 -0.01542351 -0.05668426 0.00757912 -0.00902673 0.01783368 0.01313841 -0.02499595 -0.02411677 0.02380602]
NoneNonenit35
NoneNonenfev907
NoneNonenjev35
NoneNonestatus0
NoneNonemessageOptimization terminated successfully.
NoneNonesuccessTrue

All elliptical copula comes with a correlation matrix. In this case, we would like to overwrite the correlation matrix of the TCopula with the correlation matrix of the log returns.

[6]:

np.set_printoptions(linewidth=160)
cop.sigma

[6]:

array([[ 1.        ,  0.13232337, -0.38379607,  0.08898795,  0.0956647 ,  0.06965152,  0.23325173],
[ 0.13232337,  1.        ,  0.39976718, -0.01996741, -0.07244142, -0.04885048,  0.17810267],
[-0.38379607,  0.39976718,  1.        ,  0.01485447,  0.03799987,  0.07087149,  0.01564196],
[ 0.08898795, -0.01996741,  0.01485447,  1.        ,  0.50625361,  0.69918824,  0.46759673],
[ 0.0956647 , -0.07244142,  0.03799987,  0.50625361,  1.        ,  0.57904239,  0.31756843],
[ 0.06965152, -0.04885048,  0.07087149,  0.69918824,  0.57904239,  1.        ,  0.44260538],
[ 0.23325173,  0.17810267,  0.01564196,  0.46759673,  0.31756843,  0.44260538,  1.        ]])

[7]:

# this is how you overwrite the correlation matrix

cop[:] = log_returns.corr()
cop.sigma

[7]:

array([[ 1.        ,  0.13569432, -0.35933974,  0.10954112,  0.1281724 ,  0.09676939,  0.24841052],
[ 0.13569432,  1.        ,  0.34423657, -0.01144135, -0.11482461, -0.06096223,  0.17947296],
[-0.35933974,  0.34423657,  1.        ,  0.0869657 ,  0.05494202,  0.11039716,  0.03943829],
[ 0.10954112, -0.01144135,  0.0869657 ,  1.        ,  0.59790511,  0.76230971,  0.49290886],
[ 0.1281724 , -0.11482461,  0.05494202,  0.59790511,  1.        ,  0.67644088,  0.3457239 ],
[ 0.09676939, -0.06096223,  0.11039716,  0.76230971,  0.67644088,  1.        ,  0.43281785],
[ 0.24841052,  0.17947296,  0.03943829,  0.49290886,  0.3457239 ,  0.43281785,  1.        ]])


### Simulating Future returns¶

Now that we have the copula to generate a dependency structure for each of the univariate GARCH models, we can start simulating future returns. The simulated returns data will be a 3-dimensional cube with axis time, trials and asset class respectively.

For 10000 trials and 120 periods (10 years), this will usually take some time, so give it a while. For demonstration purposes, we’ll be simulating a (120, 1000, 7) cube.

Note that the order is very important in the subsequent optimization procedure, so don’t reshape it!

[8]:

simulated_log_returns = arch_models.simulate_mc(120, 1000, custom_dist=cop.random)
cube = np.exp(simulated_log_returns) - 1  # recover as returns, instead of log returns


### Removing Outliers¶

For each asset class, we define outliers as a data point that is more than 6 standard deviations away from the asset class’s mean. We replace these outliers with the asset class mean.

As an aside, there probably is a better approach for multi-variate outlier detection and replacement

[9]:

from muarch.calibrate import truncate_outliers

cube = truncate_outliers(cube, 6)


### Calibrating the Moments of the Cube¶

After removing outliers in the cube, the next step is to calibrate the first 2 moments of the cube according to the forward looking assumptions that you have. The listing below shows the function to calculate the annualized mean and volatility for the data cube.

[10]:

import pandas as pd

def show_moments(data):
t, n, a = data.shape

df = pd.DataFrame({
'Asset': returns.columns,
'Mean (%)': ((cube + 1).prod(0) ** 0.1).mean(0) - 1,
'Vol (%)': ((cube + 1).reshape(t // 12, 12, n, a).prod(1) - 1).std(0).mean(0)
})

df.iloc[:, 1:] = df.iloc[:, 1:].apply(lambda x: round(100 * x, 4))
return df

[11]:

# before calibration

show_moments(cube)

[11]:

Asset Mean (%) Vol (%)
0 CASH -0.0091 5.0712
1 NB 2.6050 4.2896
2 EILB 9.3393 11.9668
3 DMEQ 4.2744 14.8196
4 EMEQ 14.3037 30.2308
5 PE 6.5389 19.0356
6 RE 2.1290 8.2400

For expedience, the calibration code is shown below.

from scipy import optimize as opt

def calibrate_data(data,
mean = None,
sd = None,
time_unit=12):
data = data.copy()
y, n, num_assets = data.shape
y //= time_unit

def set_target(target_values, typ, default):
if target_values is None:
return default, [0 if typ == 'mean' else 1] * num_assets

best_guess = []
target_values = np.asarray(target_values)
assert num_assets == len(target_values) == len(
default), "vector length must be equal to number of assets in data cube"
for i, v in enumerate(target_values):
if v is None:
target_values[i] = default[i]
best_guess.append(0 if typ == 'mean' else 1)
else:
best_guess.append(target_values[i] - default[i] if typ == 'mean' else default[i] / target_values[i])
return target_values, best_guess

d = (data + 1).prod(0)
default_mean = (np.sign(d) * np.abs(d) ** (1 / y)).mean(0) - 1
default_vol = ((data + 1).reshape(y, time_unit, n, num_assets).prod(1) - 1).std(1).mean(0)

target_means, guess_mean = set_target(mean, 'mean', default_mean)
target_vols, guess_vol = set_target(sd, 'vol', default_vol)

sol = np.asarray([opt.root(
fun=_asset_moments,
x0=[gv, gm],
args=(data[..., i], tv, tm, time_unit)
).x for i, tv, tm, gv, gm in zip(range(num_assets), target_vols, target_means, guess_vol, guess_mean)])

for i in range(num_assets):
if sol[i, 0] < 0 or False:
# negative vol adjustments would alter the correlation between asset classes (flip signs)
# in such instances, we fall back to using the a simple affine transform where
# R' = (tv/cv) * r  # adjust vol first
# R' = (tm - mean(R'))  # adjust mean

tv = default_vol[i]
cv = sd[i] if sd[i] is not None else tv
data[..., i] *= tv / cv  # tv / cv

tm, cm = target_means[i], data[..., i].mean()
data[..., i] += (tm - cm)  # (tm - mean(R'))
else:
data[..., i] = data[..., i] * sol[i, 0] + sol[i, 1]
return data

def _asset_moments(x: np.ndarray, asset: np.ndarray, t_vol: float, t_mean: float, time_unit: int):
t, n = asset.shape  # time step (month), trials
y = t // time_unit

calibrated = asset * x[0] + x[1]

d = (calibrated + 1).prod(0)
mean = (np.sign(d) * np.abs(d) ** (1 / y)).mean() - t_mean - 1
vol = ((calibrated.reshape((y, time_unit, n)) + 1).prod(1) - 1).std(1).mean(0) - t_vol

return vol, mean

[12]:

# calibration

from muarch.calibrate import calibrate_data

target_mean = [0, 0.03, 0.06, 0.08, 0.11, 0.12, 0.05]
target_vol =  [0.055, 0.073, 0.14, 0.09, 0.22, 0.18, 0.1]

cube = calibrate_data(cube, target_mean, target_vol)

show_moments(cube)

[12]:

Asset Mean (%) Vol (%)
0 CASH 0.0 5.0818
1 NB 3.0 6.7131
2 EILB 6.0 12.8631
3 DMEQ 8.0 8.1705
4 EMEQ 11.0 20.1085
5 PE 12.0 16.1869
6 RE 5.0 9.1215

## Optimization¶

The data used for the optimization is the truncated and calibrated cube from the simulation step before. We have 2 optimizers, the BaseOptimizer and the PortfolioOptimizer.

The PortfolioOptimizer is built on top of the BaseOptimizer. So if you need to do common optimization operations, use the PortfolioOptimizer as it probably already has the routines. If you need to work from scratch on something custom, use the BaseOptimizer.

We’ll explore the BaseOptimizer first to understand how to code out a Maximize Expected Returns subject to CVaR and Volatility constraints.

\begin{split}\underset{w_{1, \dots, A}}{\max} \frac{1}{N}\sum^N_i \prod^T_i \sum^A_k(\textbf{C}_{i, j, k} w_k + 1) \\ s.t. \\ \begin{align*} \text{Vol}(\textbf{C}, \textbf{w}) &\leq 0.1 \\ \text{CVaR}(\textbf{C}, \textbf{w}) &\geq -0.33 \\ 0 \leq w_k \leq 1 \quad &\forall \quad k \in 1, \dots, A \end{align*}\end{split}

An important thing to note is that while we simulated monthly data, for optimization, we use quarterly data. Here’s a fast way to convert monthly to quarterly data.

[13]:

t, n, _ = cube.shape

q_cube = (cube + 1).reshape(t // 3, 3, n, num_assets).prod(1) - 1


Let’s start with the BaseOptimizer

[14]:

from allopy import BaseOptimizer

opt = BaseOptimizer(num_assets)

bounds = np.array([
(0, 0.05),
(0, 0.4),
(0, 0.1),
(0, 0.35),
(0, 0.2),
(0, 0.15),
(0, 0.15),
])

opt.set_bounds(bounds[:, 0], bounds[:, 1])
max_vol = 0.1
max_cvar = -0.33

def obj_fun(w):
# Maximize returns assuming there's rebalancing every quarter
return ((q_cube @ w) + 1).prod(0).mean() - 1

def vol_c_fun(w):
years = len(q_cube) // 4
annual_data = (q_cube @ w + 1).reshape(years, 4, n).prod(1) - 1
annual_vols = q_cube.std(0)
return annual_vols.mean() - max_vol

def cvar_c_fun(w):
# cvar usually calculated for 3 years
returns = ((q_cube[:3 * 12] @ w) + 1).prod(0) - 1
cutoff = np.percentile(returns, 5)  # get the 5%
cvar = returns[returns <= cutoff].mean()

return max_cvar - cvar

opt.set_max_objective(obj_fun)

weights = opt.optimize()

bopt = pd.DataFrame({
'Asset': returns.columns,
'Weights': np.round(weights * 100, 2)
})

bopt

[14]:

Asset Weights
0 CASH 0.0
1 NB 5.0
2 EILB 10.0
3 DMEQ 35.0
4 EMEQ 20.0
5 PE 15.0
6 RE 15.0

Using the PortfolioOptimizer.

[15]:

from allopy import PortfolioOptimizer, OptData

main_data = OptData(q_cube, time_unit='quarterly')
opt = PortfolioOptimizer(main_data, cvar_data=main_data[:12])
opt.set_bounds(bounds[:, 0], bounds[:, 1])
weights = opt.maximize_returns(max_vol=max_vol, max_cvar=max_cvar)

aopt = pd.DataFrame({
'Asset': returns.columns,
'Weights': np.round(weights * 100, 2)
})

aopt

[15]:

Asset Weights
0 CASH 0.00
1 NB 5.34
2 EILB 10.00
3 DMEQ 35.00
4 EMEQ 20.00
5 PE 15.00
6 RE 14.66