The Importance of Investment Policy:
A Simple Answer To A Contentious Question
Ronald J. Surz, Managing Director, Roxbury Capital Management
Dale Stevens, Principal, Wurts & Associates
Mark Wimer, Senior Consultant, Ibbotson Associates
The long-held belief that investment policy accounts for more than 90% of performance results has recently been questioned by critics of the study that established this belief. As described in the famous Brinson, Hood, Beebower (BHB) article "Determinants of Portfolio Performance,"1 the study used data from 91 large pension plans over a 10-year period (1974-83) to determine that investment policy explains an average 93.6% of the variation in total plan return. The authors reached this conclusion by regressing quarterly actual returns against policy returns for each of the 91 funds and then calculating the simple average of the R2s from these regressions. Recent critics have argued that R2 is an inappropriate gauge of the importance of asset allocation because it measures the percent of volatility explained, not the percent of return. These critics further argue that the "real" number for the percent of return explained by policy is significantly lower than 93.6%.
The critics are both right and wrong. R2 is not the correct measurement for the question they want answered, so they're right. However, the correct measure shows that the percent of return explained by policy is well above 93.6%, so they're wrong.
R2, or more correctly 1-R2, actually measures a manager's conviction in his insights, since it reflects his deviation from passive implementation of his policy. The higher the R2, the closer the portfolio's asset allocation has been to policy over time, and hence the lower the conviction. The BHB result tells us only that the plans in their sample played it very close to the vest, that is, the high average R2 indicates that the average fund adhered very closely to its policy targets and used broad diversification within asset classes. It tells us nothing about the importance of asset allocation. Another set of funds invested with more conviction would definitely show a lower average R2, but we still wouldn't know how much of the return was explained by policy.
While the critics of the BHB study suggested, but failed to agree on, alternative measures to better estimate the importance of investment policy, they unanimously concluded that policy is relatively unimportant. In general, these critics have focused on the cross-sectional R2, which measures the tendency for policy to differentiate performance across funds. The distinction here is between cross-temporal R2, measured for each fund over time, and cross-sectional R2, measured across funds. In both cases, R2 is an incorrect measure because it relates to the variability of returns, rather than the magnitude of returns. We believe that the statement "investment policy explains x% of performance" pertains to the magnitude of return, not the variability of return.
To properly measure the effect of asset allocation on the magnitude of investment performance, we need only develop a very simple framework. Let's take the view that there are just two components to total return: policy, and everything else. We can then measure the percent of return explained by policy by simply figuring the ratio of the policy return to the total return. The numbers to approximate this calculation are available in the BHB study: The average policy return is 10.11%, and the average actual return is 9.01%. So our ratio is 10.11/9.01, or 112%.2 Investment policy explains 112% of the average fund's performance in the BHB study. It's really that simple.
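The arithmetic above can be sketched in a few lines of Python. The function name `percent_explained_by_policy` is our own label for illustration, not something taken from the BHB study; the two inputs are the average policy and actual returns quoted above.

```python
def percent_explained_by_policy(policy_return: float, actual_return: float) -> float:
    """Return the policy return as a percentage of the total (actual) return."""
    if actual_return == 0:
        # The ratio is undefined when the fund earned exactly nothing.
        raise ValueError("actual return must be nonzero")
    return 100.0 * policy_return / actual_return


# Averages quoted from the BHB study, in percent per year.
bhb_ratio = percent_explained_by_policy(10.11, 9.01)
print(f"Policy explains {bhb_ratio:.0f}% of the average fund's return")
```

A value above 100 means the fund earned less than its policy; a value below 100 means it earned more. The ratio becomes hard to interpret when the actual return is close to zero or negative, so this sketch is best read as applying to periods with clearly positive returns.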
But what does this say about the "everything else" in our simplified view of the world? It says that, on average, everything else subtracts value. Let's see why this makes sense. "Everything else" can be broken into three major components: sponsor effects, manager effects, and costs. Sponsor effects are primarily manager-of-manager effects, which various studies have shown can subtract value, especially if the sponsor is a hot dot chaser, that is, one who hires managers with strong historical performance records, then fires them after a period of underperformance. However, let's agree that sponsors probably don't add or subtract much on average because their primary objective is risk control, not strategic juggling. Similarly, manager effects, in aggregate, must be zero versus the market because in aggregate they are the market.3 This suggests that the major culprit in subtracting value is cost, primarily transaction costs, since on average the effects of sponsors and managers are neutral.
Just because the average impact of investment policy is near 100% doesn't mean that active management is worthless. Quite the contrary, there was a distribution of "% explained" across the funds in the BHB study, with roughly half exhibiting measures below 100% and half above 100%. In other words, half the managers were better than average. The important questions have always been, and will continue to be: Can these better-than-average managers be identified in advance, and can they deliver sufficient value to earn their fees? In the following, we describe two extensive new studies that attempt to answer these important questions.4 The studies were performed independently by Mark Wimer of Ibbotson Associates and Dale Stevens of Wurts & Associates, with the collaboration of Ronald Surz of Roxbury Capital Management. The Ibbotson study, based on mutual fund data, is referred to as "the mutual fund study," while the Stevens study, which used pension funds, is called "the pension study."
The important questions addressed in this article are:
In the following, we describe our methodology and our results for the questions of interest: fraction of return explained by policy, and possible keys to success. In each section, we begin with a general overview followed by the specifics of each study.
Methodology & Data
The mutual fund study uses 10 years of monthly returns for 94 U.S. balanced funds. The 94 funds represent all balanced funds in the Morningstar universe with at least 10 years of data ending March 31, 1998. The average assets under management in this group were $1.8 billion, while the median was $291 million. The largest fund had $24.8 billion, and the smallest fund had just $3 million in assets. The pension fund study uses five years of quarterly data for 53 funds through March 31, 1998. Average assets under management were $93 million, and the range was $4 million to $701 million.
Since the policy used by the individual mutual funds was not known, returns-based style analysis was run on each fund to determine the overall best-fit benchmark (the policy) for each fund for the entire 10-year period. The asset classes and benchmarks used in the style analysis are shown in Table 1a, along with the average fund exposure to each asset class. The average R2 value from the style analysis was 81.4%, giving us confidence that the benchmarks used in style analysis were a good fit.
By contrast, the policy of each pension fund was known, including all changes in policy that occurred during the five years under study, so actual policy allocations were used. Also, each fund in the pension study uses its own custom benchmarks. For example, some funds use the S&P 500 for U.S. equities, while others use the Russell 3000. The average allocations among general market sectors are as follows:
In both the mutual fund and the pension studies, actual returns of each fund were regressed against the returns of that fund's policy benchmark. The R2 value from this regression was subtracted from one (1-R2) to define the level of conviction of the manager. A manager with a low R2 has a high conviction because he or she is willing to make significant bets away from the benchmark, leading to a return series that is not well explained by the benchmark return. Conversely, a high R2 indicates a low conviction, in which the manager's actual asset allocation closely followed policy over time. This measure of conviction is used to analyze the results by conviction level, or willingness to be active, rather than to measure the importance of policy, as the BHB study did.
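The conviction measure described above can be sketched as follows. This is a minimal illustration, assuming a one-variable least-squares regression with an intercept, in which case R2 equals the squared correlation between the two return series. The function names and the return series are hypothetical and are not data from either study.

```python
from math import fsum


def r_squared(fund_returns, policy_returns):
    """R2 from regressing fund returns on policy returns (single regressor).

    With one regressor and an intercept, R2 is the squared Pearson
    correlation between the two series.
    """
    n = len(fund_returns)
    if n != len(policy_returns) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_f = fsum(fund_returns) / n
    mean_p = fsum(policy_returns) / n
    cov = fsum((f - mean_f) * (p - mean_p)
               for f, p in zip(fund_returns, policy_returns))
    var_f = fsum((f - mean_f) ** 2 for f in fund_returns)
    var_p = fsum((p - mean_p) ** 2 for p in policy_returns)
    if var_f == 0 or var_p == 0:
        raise ValueError("both series must vary")
    return cov * cov / (var_f * var_p)


def conviction(fund_returns, policy_returns):
    """Conviction as defined in the text: 1 - R2."""
    return 1.0 - r_squared(fund_returns, policy_returns)


# Hypothetical quarterly returns, in percent (illustration only).
policy = [2.0, -1.0, 3.5, 0.5, 4.0, -2.5]
closet_indexer = [p - 0.1 for p in policy]        # tracks policy minus a constant cost
active_fund = [3.0, 1.5, 0.5, -1.0, 6.0, -0.5]    # makes bets away from policy

print(f"closet indexer conviction: {conviction(closet_indexer, policy):.3f}")
print(f"active fund conviction:    {conviction(active_fund, policy):.3f}")
```

Note that the closet indexer shows essentially zero conviction even though it lags its policy every quarter: a constant shortfall does not change R2, which is why R2 speaks to variability rather than to the magnitude of return.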
Finally, the percent of fund return explained by policy return was calculated as the ratio of annualized policy return to fund return. The policy return in the mutual fund study had two basis points per month subtracted from it to approximate the cost of replicating the policy mix through indexed mutual funds (approximately 25 basis points annually). The risk-adjusted version of the policy ratio was calculated by dividing the Sharpe ratio of the policy by the Sharpe ratio of the fund. The Sharpe ratio is the excess return above Treasury bills divided by the standard deviation of excess returns; in other words, it's the return per unit of risk.
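A rough sketch of both versions of the policy ratio follows. Several details are assumptions on our part rather than statements of how the studies were run: returns are annualized geometrically, the two-basis-point monthly cost is also deducted before computing the policy Sharpe ratio, and the monthly return series are made up for illustration.

```python
from math import prod
from statistics import mean, stdev

COST_PER_MONTH = 0.0002  # two basis points per month, as in the mutual fund study


def annualized_return(monthly_returns):
    """Geometric annualized return from monthly returns in decimal form (assumed method)."""
    growth = prod(1.0 + r for r in monthly_returns)
    return growth ** (12.0 / len(monthly_returns)) - 1.0


def sharpe_ratio(monthly_returns, monthly_tbill):
    """Mean excess return over T-bills divided by the std. deviation of excess returns."""
    excess = [r - rf for r, rf in zip(monthly_returns, monthly_tbill)]
    return mean(excess) / stdev(excess)


def policy_ratios(fund, policy, tbill, cost=COST_PER_MONTH):
    """Return (unadjusted, risk-adjusted) ratios of policy to fund results."""
    net_policy = [p - cost for p in policy]  # policy net of indexing cost
    unadjusted = annualized_return(net_policy) / annualized_return(fund)
    risk_adjusted = sharpe_ratio(net_policy, tbill) / sharpe_ratio(fund, tbill)
    return unadjusted, risk_adjusted


# Hypothetical monthly returns in decimal form (illustration only).
policy = [0.012, -0.004, 0.020, 0.007, -0.010, 0.015]
tbill = [0.004] * len(policy)
lagging_fund = [p - 0.001 for p in policy]  # trails policy by 10 bp every month

raw, risk_adj = policy_ratios(lagging_fund, policy, tbill)
print(f"unadjusted: {raw:.0%}, risk-adjusted: {risk_adj:.0%}")
```

Because the risk-adjusted figure is a ratio of two Sharpe ratios computed at the same monthly frequency, any annualization factor cancels, so the sketch leaves the Sharpe ratios unannualized. Both ratios are undefined when the fund's return or Sharpe ratio is zero, and they are difficult to interpret when it is negative.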
What portion of return is explained by policy?
Many people think this is the question posed by the BHB study and answered by estimating an R2 value above 90%, but it is not. We, however, do answer this question using the data from that study and the new data from our pension fund and mutual fund studies. Table 2 shows the average percent of fund return explained by policy return for the original BHB study in 1986, an update in 1991, and the two data sets used in our studies. On average, policy accounts for a little more than all of total return in all cases except for the pension fund portion of this study, where the result is 99%. This means that, on average, the balanced mutual funds did not add value above their policies due to the combined deleterious effects of timing, selection, and management fees and expenses. The pension funds in this study slightly outperformed their policies on average, which could be due to security selection and/or timing. We examine these possibilities below. On a risk-adjusted basis, the overall mean and median results are even worse (i.e., greater subtraction of value), especially for the mutual funds.
As noted in the introduction, the fact that average results indicate that the average fund doesn't beat its policy does not mean that all funds underperform. At the top quartile of success for both studies, policy explains about 95% of performance results. The 75% of funds that are less successful have more than 95% of their returns attributable to policy, indicating that the BHB result applies to the more successful managers. Table 3 shows the range of percent of fund return explained by policy return. The wider range for mutual funds indicates that these funds made larger timing and selection bets against their policies than pension plans. The range is even broader on a risk-adjusted basis.
Does conviction matter?
A closer look at fund characteristics reveals that some funds are more likely to succeed than others, as indicated by a fraction of return explained of less than 100%. In the mutual fund study, we find that high conviction, defined as the bottom 20% of funds by R2, tends to succeed more often than lower conviction, although the range of results is much wider. The pension results do not confirm this finding; we believe this is due to the fact that pension plans, by virtue of their broad diversification, do not approach the level of conviction employed by some of the mutual funds.
The percent of return explained by policy was divided into quintiles by conviction group to determine the correlation between different levels of conviction and superior results. The top panel of Table 4 shows the results without adjusting for risk. Mutual funds with the highest level of conviction had the best results at median (93% of fund return explained by policy return), but also the highest variability of results. The bottom panel of Table 4 shows the results on a risk-adjusted basis. In this case, the median results for all of the mutual fund conviction groups were higher than 100% except for the highest mutual fund conviction group. This suggests that mutual funds willing to make big bets were the only group as a whole able to add value at median on a risk-adjusted basis. However, this group also displayed the widest range of results on both the good and bad sides of the median. The results for pension funds were mixed and do not indicate any particular tendencies relative to conviction.
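The grouping described above can be sketched as follows. This is only an illustration: it assumes quintiles are equal-count groups formed after sorting funds by conviction (1-R2), and the fund data are invented rather than drawn from Table 4.

```python
from statistics import median


def median_by_conviction_quintile(funds):
    """Median percent-explained for each conviction quintile.

    funds: list of (conviction, percent_explained) pairs.
    Returns five medians ordered from the lowest- to the
    highest-conviction quintile.
    """
    ranked = sorted(funds, key=lambda fund: fund[0])
    n = len(ranked)
    if n < 5:
        raise ValueError("need at least five funds to form quintiles")
    medians = []
    for q in range(5):
        lo = q * n // 5          # first index of this quintile
        hi = (q + 1) * n // 5    # one past the last index
        medians.append(median(pct for _, pct in ranked[lo:hi]))
    return medians


# Hypothetical funds: (conviction, percent of return explained by policy).
funds = [
    (0.02, 101), (0.03, 103), (0.05, 100), (0.06, 104), (0.08, 99),
    (0.10, 105), (0.15, 98), (0.20, 108), (0.35, 90), (0.50, 96),
]

for label, med in zip(["lowest", "low", "middle", "high", "highest"],
                      median_by_conviction_quintile(funds)):
    print(f"{label:>7} conviction: median {med:.0f}% explained by policy")
```

Reporting the median rather than the mean keeps a single extreme fund from dominating a small group, which matters most in the highest-conviction quintile, where the range of results is widest.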
Table 5, which displays the range of conviction (1-R2), shows that mutual funds have a much wider range of conviction than pension plans do. While 5% of the mutual funds had more than half of their volatility (53.1%) explained by non-policy effects, the most aggressive pension plans had only 33.8% of their variability attributable to factors other than policy.
Do timing and selection matter?
We also examined the propensity for certain strategies to deliver value added. To this end, we use the following definitions:
Here we find that pure timers and pure selectors tend to succeed more often than funds or managers pursuing the other two strategies. It is possible that this outperformance is a byproduct of the time periods used, which cover rising stock markets. Funds that did not rebalance would tend to be classified as timers and consequently would have benefited from the continuing superiority of stocks over bonds during this period.
The results are shown in Table 6. Some evidence of skill is exhibited by the timers and the selectors within the mutual fund group, where 95-98% of return is explained by policy at the median. The range of results is wide for both groups. This appearance of skill tends to vanish when results are risk-adjusted, as shown in the bottom panel of Table 6, where medians for all strategies are higher than 100%. Readers should note that the mutual fund and pension fund analyses use only an in-sample approach (i.e., identification and sampling performed in the same time period). An out-of-sample approach may have produced different results.
We find that, on average, policy explains approximately 100% of investment returns. If a manager succeeds in adding value, this can decrease to as low as 85% when risk is not incorporated, and even further to 75% on a risk-adjusted basis. On the other hand, if the manager fails to add value, policy can explain as much as 135% of return unadjusted for risk, or 165% risk-adjusted, with the difference between these percentages and 100% explained by manager value subtracted through timing, selection, and/or costs. In other words, if a manager neither adds nor subtracts value, policy explains 100% of performance. If managers add value, the fraction of return explained by policy decreases, with the balance explained by the amount of value added. If managers subtract value, policy explains more than 100% with the balance explained by the amount subtracted.
We also find evidence that certain approaches tended to add more value than others during the time periods of the studies. The mutual fund results show a tendency for high conviction funds to perform better. Further investigations into types of conviction suggest that those who purely time and those who purely select are more likely to succeed than those who attempt to both select and time, though adjusting for risk wipes out their advantage. We leave it to the reader to decide if these might be the better places to look for managers with skill in adding value.
Many thanks to David Rismann of David Rismann Consultants for his invaluable analytical assistance and contributions to the pension fund analysis and to Gale Morgan Adams of GMAssociates for her editorial assistance in the preparation of this article.
How to identify skill
We want to allocate among skillful managers to achieve the greatest upside potential above our MAR relative to the downside risk of falling below our MAR. First we want to find managers who have demonstrated an ability to beat their style-customized benchmarks on a risk-adjusted basis, as described in the previous section. Then we estimate how often each manager might be above our MAR and how far above. Table 1 demonstrates how this is done.