Perhaps “perils” is too harsh a term to attach to the process of back-testing. A gentler synonym, and the scientific term, is “uncertainty.” Before identifying the uncertainties of back-testing, let me say at the outset that back-testing is the best we have when it comes to identifying whether an investing model is sufficiently robust to merit consideration. The majority of back-testing (BT) on this site centers around the Rutherford 10 portfolio. Since the Rutherford 10 is central to most tests, let me take a paragraph or two to explain the origin of the Rutherford portfolio.
The first “Rutherford post” on this blog dates back to October of 2014. If memory is close to accurate, the very beginning goes back into the era when this blog was hacked. Posts from February 14, 2008 through 2013 are no longer available due to the hosting service losing about 1000 of my posts. Be that as it may, the ETFs that make up the current Rutherford came from a mix of asset classes recommended by William J. Bernstein in his book, The Intelligent Asset Allocator: How to Build Your Portfolio to Maximize Returns and Minimize Risk. Bernstein’s writing resonated with me as I was serving on an endowment committee at the time and the endowment portfolio was managed using asset allocation principles. Many of us were also influenced by the academic papers (Determinants of Portfolio Performance) by Brinson et al. The Brinson papers are frequently misinterpreted so be careful how you read them. Summary: The Rutherford needed to be diversified across all asset classes.
In the summer of 1999, Ronald J. Surz and two other authors wrote an important paper, Investment Policy Explains All. In other words, pay attention to asset allocation. In the January/February 2000 issue of Financial Analysts Journal, Roger G. Ibbotson and Paul D. Kaplan wrote an extremely important paper, Does Asset Allocation Policy Explain 40, 90, or 100 Percent of Performance?. Asset allocation is more important than most investors think, and that is one reason why the Rutherford is so diversified.
In 2009, Mebane T. Faber and Eric W. Richardson published The Ivy Portfolio: How to Invest Like the Top Endowments and Avoid Bear Markets. Around this time ETFs were beginning to become mainstream. Vanguard was late to the game as iShares were by far the most popular ETFs. Faber’s book is one of the first to promote Vanguard ETFs as they were relatively new. Again, Faber and Richardson provided background advice for the construction of the Rutherford 10.
In the Faber book, one recommended portfolio used these ETFs. Note how closely they align with those currently used in the Rutherford. Faber’s 10 asset classes or ETFs are: VTI, VB, VEU, VWO, BND, TIP, VNQ, RWX, DBC, and GSG. Around 2009 to 2011 I began to work with portfolios using these ETFs for portfolio construction. What now operates as the Rutherford portfolio has a semi-long history of close to ten years.
And now to the main point of this post: a few uncertainties that surround all back-testing, not just the data posted here at ITA.
- Bid-Ask spreads impact live portfolio management whereas BTs work off of close-of-day prices.
- Individual judgment such as rounding to the nearest five or ten shares is not unusual. BTs do not round.
- The individual investor forgets to rebalance or is on vacation. The back-tester never takes a vacation.
- Any individual alteration in the makeup of the portfolio will impact the results. For example, does one use VEU or VEA for international equity exposure? I’ve added SHY and SPTL as options in addition to the bond ETF. This will likely make a difference in performance.
- When is the portfolio rebalanced or reviewed? As we see from the Haynes results, it makes quite a bit of difference. End-Of-Month (EOM) review shows improved performance over the 33-day rotation. Is there a statistically significant difference, or is something going on that we don’t fully understand?
- Are limit orders used by the manager? This may delay or even miss a transaction. Keep in mind that these uncertainties work both ways. Sometimes a portfolio is helped, other times the decision turns negative.
- What securities are used in the BT? Does the model work with out-of-sample testing? Since ETFs are relatively new, particularly when it comes to covering all the asset classes of interest to us, it is not easy to run long historical tests. For example, it is not possible to find investable mutual funds that cover all asset classes of interest beginning in 1965. One would like to see how the model performed in the recession of the 1970s, recovered from the 1987 October crash, and what happened during the tech bubble and bust period of the early 2000s.
- One variable frequently forgotten is what I call the multiplier effect. When working with multiple variables, as we are when the number of ETFs rises to ten, the uncertainty of the final results fans out like peacock feathers. Here are related concerns articulated by Ernie Stokely. Or check the back-testing graph in this article. I mention the multiplier effect because we tend to concentrate on one number when we see a back-test. That gives a false sense of security when we should be thinking about the errors from all the uncertainties built into the back-testing results.
- For most of our portfolios, cash is flowing in and out of the portfolio. Back-tests don’t account for this. Exactly when the cash movement happens can have a huge impact on the results.
- Problems with data. Databases are notorious for errors in dividend calculations.
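To put rough numbers on the first two items in the list above (bid-ask spreads and share rounding), here is a minimal Python sketch. It is not how any back-testing package works; the function names, the equal-weight purchase, the 0.1% spread, and the lot size of five shares are all assumptions chosen for illustration. It compares an idealized back-test purchase (fractional shares bought at the closing price) with a live purchase that rounds to whole lots and pays half the spread above the close.

```python
import math


def round_to_lot(shares, lot=5):
    """Round a share count to the nearest multiple of `lot` (halves round up)."""
    return int(math.floor(shares / lot + 0.5)) * lot


def backtest_vs_live(cash, close_prices, spread_pct=0.001, lot=5):
    """Compare an idealized equal-weight purchase with a 'live' one.

    cash         -- dollars to invest (hypothetical)
    close_prices -- dict of ticker -> closing price (hypothetical values)
    spread_pct   -- assumed full bid-ask spread as a fraction of price
    lot          -- assumed rounding habit (nearest 5 or 10 shares)
    """
    target = cash / len(close_prices)  # dollars per ETF in the back-test
    live_cost = 0.0    # what the live investor actually pays
    live_value = 0.0   # what those shares are worth at the close
    for price in close_prices.values():
        shares = round_to_lot(target / price, lot)
        ask = price * (1 + spread_pct / 2)  # buy half a spread above the close
        live_cost += shares * ask
        live_value += shares * price
    return {
        "spread_cost": live_cost - live_value,  # lost to the spread
        "allocation_gap": cash - live_value,    # dollars off target from rounding
    }


if __name__ == "__main__":
    prices = {"VTI": 100.0, "BND": 33.0}  # made-up prices, not market data
    print(backtest_vs_live(10_000, prices))
```

The gap on any single purchase is small, but it recurs at every review and across every ETF, which is one reason a live portfolio drifts away from the single number a back-test reports.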
Some of you might recall an investment model known as Dogs of the Dow. Michael O’Higgins wrote a book on this back around 1991. The Motley Fool still pumps this investing method, even though it was shot down by several academics. I know one math major who did a detailed study of the Dogs and found that in several years a stock included as one of the four Dogs qualified by less than one percent of the cutoff. Had this not been the case, the results would have been significantly different. O’Higgins wrote a second book, Beating the Dow with Bonds. You can still find my review of this book if you scroll down the few reviews still available. If you want to witness a case of data mining, this is it.
Readers will come up with their own uncertainties to add to this brief list. Although I have taken back-testing to task, it is still as good as it gets. Back-testing should not be cast into the trash bin. However, we need to be aware of the uncertainties when looking over results. The ITA models, be they Strategic Asset Allocation, Dual Momentum, or one of the several Tranche Momentum approaches, all require one to begin with high-quality securities and broad diversification.