When I introduced the momentum system on this site I did not use any form of optimization. My parameters were based on my understanding of what the system was measuring, how the markets moved and changed behavior (including correlations between asset classes), and what the strengths and weaknesses of the system might be. From that understanding I considered how we might accommodate a variety of assets and market conditions, assuming we would rather accept lower returns with low risk than chase potentially higher returns with correspondingly higher risk. I claimed (without any proof or backup) that these parameters should be “robust”.
Somehow (I can’t imagine why 🙂 ) it was difficult for members to accept my unproven claim of robustness, so I was talked into trying to provide some proof, and we began this series of tests with help from Ernie and Herb, who have provided the tools for more rigorous automated back-testing.
As we have stated many times, optimizing the parameters used in investment strategies is a dangerous exercise and often leads to systems that disappoint going forward. This is why we spend so much time emphasizing the importance of robustness and playing down the need for absolute optimization: we need some room for the inevitable times when our “optimal” parameters are not optimal.
The classical way of testing any system is to use Out-of-Sample data, i.e. different assets and/or different time frames from the data set used to develop the system. This is particularly important if parameters have been optimized on a specific set of historical data.
Although we have tried to be relatively unbiased in that we have consistently used the Rutherford portfolio asset list as our diversified benchmark representation of global market asset classes, we have inevitably migrated towards an optimization of parameters for this specific portfolio over the period 2006-present for which historical data is available. In Part 3 of the study we attempted to simulate Out-of-Sample tests by randomly shuffling sections of the raw data to generate pseudo price profiles for the assets – and this provides a reasonable alternative to using real Out-of-Sample data when available data records are short. However, it is not ideal.
In this Part 6 of the study we move to a new portfolio of assets with records going back ~20 years. Since ETFs had not been “invented” 20 years ago, we use Mutual Fund data; many of these Funds are now essentially duplicated by “equivalent” ETFs.
The assets chosen for the study were:
VTSMX – Vanguard Total Stock Market Index
VTMGX – Vanguard Developed Markets Index
VEIEX – Vanguard Emerging Markets Index
VBLTX – Vanguard Long-Term Bond Index
VBIIX – Vanguard Intermediate-Term Bond Index
PYGFX – Payden Global Fixed Income
VGSIX – Vanguard REIT Index
FIRAX – Fidelity Advisor International Real Estate
PSPFX – US Global Investors Global Resources (Commodities)
VGPMX – Vanguard Precious Metals and Mining
VFISX – Vanguard Short-Term Treasury
These assets essentially represent all the major asset classes, as do the Rutherford assets. The primary difference is the inclusion of an intermediate-term US Bond Fund (VBIIX) rather than the inflation-protected ETF (TIP) included in the Rutherford portfolio. Our advantage here is that we have data back to at least 1994 for most (though not quite all) assets. We can therefore run a 20-year test that includes both the 2000 bear market following the tech bubble and the 2008 bear market following the financial crisis.
Let’s first split the 20-year time frame into two 10-year periods – January 1995 to December 2004 and January 2005 to Present (August 2015) – and look at the performance curves using the reference “Kipling” model with ROC1 = 60 trading days (87 calendar days), ROC2 = 100 trading days (145 calendar days) and Volatility = 14-day mean variance. Overall ranking is calculated based on a 30%/50%/20% weighting of these parameters. VFISX is used as the momentum ranking filter.
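For readers who want to see the mechanics, the ranking step can be sketched as below. This is a simplified reconstruction, not the actual Kipling workbook logic: the exact way the three factors are normalized and combined there is not shown in this post, so the raw-score blend (and the sign convention on volatility) is an assumption.

```python
import numpy as np

def rank_assets(prices, roc1=60, roc2=100, vol_window=14,
                weights=(0.3, 0.5, 0.2)):
    """Rank assets by a weighted blend of two rates of change and
    (inverse) volatility -- a sketch of the 30%/50%/20% scheme.
    `prices`: dict of ticker -> 1-D array of daily closes."""
    scores = {}
    for ticker, p in prices.items():
        r1 = p[-1] / p[-1 - roc1] - 1.0      # ~60-trading-day rate of change
        r2 = p[-1] / p[-1 - roc2] - 1.0      # ~100-trading-day rate of change
        daily = np.diff(p[-(vol_window + 1):]) / p[-(vol_window + 1):-1]
        vol = daily.std()                    # 14-day volatility of daily returns
        # Higher momentum is better; higher volatility is penalized
        scores[ticker] = weights[0] * r1 + weights[1] * r2 - weights[2] * vol
    return sorted(scores, key=scores.get, reverse=True)  # best first
```

In the full model, only assets ranking above the VFISX filter would then be eligible for purchase.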
The first period (Jan 1995 – Dec 2004) is totally Out-of-Sample:
In the above figure, VTSMX is plotted (red line) as a reference (it is not a suitable benchmark since it does not reflect a globally diversified portfolio). As can be seen, the momentum model does not (on average) beat the performance of VTSMX (the total US equity market) over this period. However, it does avoid the significant 43% drawdown experienced in the US equity markets over the two-year 2000–2002 period, primarily thanks to the application of the VFISX ranking filter. Although the momentum model does not necessarily generate a higher total return than VTSMX, its performance, considering volatility (~15%) and drawdown (~21%), is probably far more acceptable to most investors – even though there would almost certainly have been some unhappiness during the 1995–2000 technology bubble. [With the benefit of hindsight, performance could probably have been improved by including a technology index (Nasdaq 100) fund in the asset mix – but that is hindsight we didn’t have.]
If we now look at the 2005 – Present period we see the following performance:
Now we see that the “Kipling” model handily beats the return of VTSMX throughout the period, with volatility (16.7%) and drawdown (19%) similar to those observed in the earlier period – i.e. volatility and drawdown are consistent. As a result of the rank filtering we also avoid the 49% drawdown in US equities during the 2008 financial crisis.
However, we have to remember that this is the same time period as was used to “optimize” the parameters for the Rutherford portfolio – so we might expect performance for the Fund portfolio to be closer to optimal over this period. This also suggests that the “optimized” parameters are not optimal for the 1995–2004 period – which is exactly what we should expect, and why we look for robust parameters.
Looking at performance over the entire 20-year period we get the following picture:
The above plot may not look too impressive, particularly in the early years, and would probably be better plotted with returns on a logarithmic scale so as to compare percentage moves rather than emphasizing the exponential growth that results from compounding. Numerically, however, the Compound Annual Growth Rate (CAGR) for the first 10 years is still a respectable 9.95% and, for the last 10 years, 13.13%, with a 20-year value of 11.47% – so the picture is perhaps a little misleading.
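As a quick sanity check, chaining the two 10-year CAGRs quoted above should roughly reproduce the 20-year figure (a small gap is expected, since these numbers are averages over many test runs):

```python
def cagr(growth_multiple, years):
    """Compound annual growth rate from an overall growth multiple."""
    return growth_multiple ** (1.0 / years) - 1.0

# Chain the two quoted 10-year CAGRs into a single 20-year growth multiple
multiple_20y = (1 + 0.0995) ** 10 * (1 + 0.1313) ** 10

print(round(cagr(multiple_20y, 20) * 100, 2))  # prints 11.53
# ...close to the quoted 20-year value of 11.47%
```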
Average total portfolio return is 862.6% (11.47% CAGR) but, as in the Rutherford back-tests reported in Part 5 of the study, there is a wide standard deviation of 231.8%, resulting in a 95% probability of total returns lying anywhere between ~862.6 +/- 463.6% (2 SD), or ~400% and ~1,300% – quite a range, due to the effects of compounding over a 20-year period.
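The quoted range follows from treating the run-to-run variation as roughly normal – a working assumption rather than a demonstrated fact:

```python
mean_return = 862.6   # average total return across runs, %
sd = 231.8            # standard deviation across runs, %

# 2-standard-deviation band, i.e. roughly a 95% range under normality
low, high = mean_return - 2 * sd, mean_return + 2 * sd
print(low, high)  # roughly 399% and 1326%, i.e. the ~400% to ~1,300% quoted
```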
So let’s see what happens if we apply the tranching model to the analysis. The following figure shows average total returns (left hand axis, solid lines) and P(0.9) Returns, the value representing a minimum return with a 90% probability that this value will be exceeded (right hand axis, dashed lines). These returns are plotted as a function of maximum tranche look-back period (as described in Part 5 of the study):
As in Part 5 of the study, we see a reduction in average total return as we increase the tranche look-back period. However, we also see a range of look-back periods, between ~5 and 15 days, where P(0.9) Returns are not significantly affected by tranching (although expected average return may be lower as the look-back is lengthened). Separation periods of 1–3 trading days are to be preferred.
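P(0.9), as used here, is simply the 10th percentile of the distribution of total returns across runs – the level that 90% of runs exceed. Given a set of simulated outcomes it can be estimated directly; the numbers below are synthetic, for illustration only (real backtest returns are not normally distributed):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical total returns (%) from many backtest runs with
# different review-date sequences -- illustrative numbers only
total_returns = rng.normal(loc=862.6, scale=231.8, size=10_000)

# P(0.9): the return exceeded with 90% probability = the 10th percentile
p90_floor = np.percentile(total_returns, 10)
print(round(p90_floor, 1))  # near mean - 1.28 SD, i.e. roughly 565%
```

Under a normal assumption this lands near mean − 1.28 SD ≈ 565%, which is consistent with the no-tranche figure quoted elsewhere in this post.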
As before, we can compare the variance between standard “Kipling” tests – with no tranching – to a tranching model. Performance of the portfolio using an 11 tranche sequence with 1-day separation is shown below:
Comparing this with the no-tranche figure above, we see that the variation in total returns is far “tighter”, but that this comes at the expense of the average total return, which is reduced to 773.6% (from 862.6%). We therefore need to ask ourselves: “Are we more comfortable with a system that has a 90% probability of generating a minimum 637% return and an average 774% return, or do we prefer a 90% probability of at least 565% with an average 863% return?” The answer will obviously differ for each investor depending on their personal situation and tolerance for risk, and will determine whether the tranching model is viable for them and what parameters might be used. As Ernie has noted, using a tranching model is likely to increase trade frequency (and hence costs) a little, although tests indicate that this ranges from an average of ~2 trades per period with no tranching to a maximum of ~4 trades per period with many tranches. At the level of tranching we are looking at here (up to ~10 tranches) our studies suggest a maximum of ~3 trades per period.
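The variance-reduction effect of tranching can be illustrated with a toy simulation: treat each review-date sequence as one noisy draw of the total return, and a tranched portfolio as the average of several partially independent draws. This is a statistical caricature of the mechanism, not the actual tranche backtest – all the numbers below are made up, and real tranches overlap, so fully independent draws overstate the reduction:

```python
import numpy as np

rng = np.random.default_rng(42)
n_runs, n_tranches = 5_000, 11

# One draw per review-date sequence (no tranching); illustrative numbers
single = rng.normal(800.0, 230.0, size=n_runs)

# Tranched: average 11 draws per run, with a slight return penalty per
# tranche as observed in the tests. Independence is an assumption here.
tranched = rng.normal(770.0, 230.0, size=(n_runs, n_tranches)).mean(axis=1)

for name, runs in (("no tranche", single), ("11 tranches", tranched)):
    print(name, "mean:", round(runs.mean()),
          "P(0.9) floor:", round(np.percentile(runs, 10)))
```

Even this crude model reproduces the qualitative trade-off: a lower mean but a markedly higher P(0.9) floor for the tranched case.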
For the sake of completeness, the following show the results of the tranche tests over each 10-year period, for comparison with the no-tranche plots.
- The standard “Kipling” momentum ranking model, when applied to Out-of-Sample data using “optimized” parameters, provides acceptable performance with consistent profitability together with acceptable volatility and draw-downs. However, performance may not be optimal, because those parameters were not optimized for the Out-of-Sample period. This is always to be expected – the question remains: “Do the parameters used represent a robust set that generates acceptable results with minimum risk?”
- Uncertainty due to the “luck” of review date sequence can be reduced/minimized by using a tranching model in combination with the momentum ranking model. However, while this may reduce the probability of poor minimum returns it may result in a lower maximum and average expected return.
While writing this post I received a message from Lowell asking why the momentum model did not match the performance of VTSMX in the 1995-2000 period – especially since VTSMX is available for inclusion in the portfolio.
I think there are at least 2 reasons for this:
- The model (as tested) requires two assets ranking higher than VFISX to be included in the portfolio – therefore the maximum allocation that can be assigned to VTSMX is 50% and we cannot beat the performance of VTSMX unless another asset is performing better. This is why an Index Fund such as VTSMX is not a good benchmark for a portfolio designed for diversification. It is only appropriate if the objective is to beat/match that index – in which case the entry/exit rules and portfolio makeup would probably be significantly different e.g. the index might simply be broken down into component sectors and bonds and other asset classes might be ignored.
- We know that momentum, by its nature, is sensitive to look-back period in detecting trends – the shorter the look-back period, the faster it will be to pick up a trend and also to exit the trend. However, even in a bullish market, there will be pullbacks and we may encounter whipsaws if we are using an inappropriate look-back period. If we were to use Gary Antonacci’s 12 month look-back period we might see better performance over this 1995-2000 period (I don’t know because I haven’t tried it) and over 40 years this might be a better choice of parameter – but, then again, performance in the 2005-2015 period might be inferior.
Out of interest, after receiving Lowell’s question, I decided to go back to my original non-optimized parameter settings of ROC1 = 63 trading days (91 calendar days), ROC2 = 126 trading days (182 calendar days) and Volatility = 10 day mean variance. I also went back to my original weights of 50%/30%/20% respectively since I’d always thought that weighting the short-term momentum period higher would be preferable in terms of the intermediate-term time horizon that I was most concerned about.
The results are shown below:
Average total returns are 1077%, compared with 863% for the tests using “optimized” parameter values (12.56% CAGR vs 11.06% CAGR). Volatility and draw-down remain about the same.
If I wanted to reduce the variance due to check-date “luck” I might use a tranche model with a 10-trading-day look-back (two weeks’ data) and 1-day separation between tranches:
This reduces my average expected return to 880.5% but gives me a 90% probability of making at least 692%. However, this is where we have to be a little careful, since this is exactly the same P(0.9) return we might expect from the basic no-tranche Kipling model, which has a higher average expected return.
Finally, one additional observation. Some members may ask whether there is an advantage in selecting only the top-ranked asset rather than the top 2. I ran a quick test using 2-day separation tranches and compared the average and P(0.9) returns for different look-back periods:
As we might expect, average total returns are reduced by increasing diversity (2 assets vs 1). Note, however, that variance is also reduced: the P(0.9) returns are higher for the 2-asset scenario than for the 1-asset case.
Optimization is fine and can be a great learning experience, but don’t get too hung up on the optimized numbers. As we have seen in these studies, optimized parameters determined from limited data are not likely to be optimal under all market conditions – however, they might still be considered robust in that they should still generate acceptable, if not optimal, results.
I rest my case that my original un-optimized parameters are still robust – they are not optimal for the 2006–2015 period, but they still generate acceptable returns, and they perform better over the 20-year 1995–2015 period than the “optimized” parameters.
David and Ernie