Netflix (NFLX) is a classic example of the value of the Sharpe ratio and a great investment story. By the end of 2002, the year NFLX went public, the company had over a million subscribers and its stock closed at a split-adjusted price of $0.71. At the end of 2019, it had 167 million subscribers and its stock closed at $323.31. Over just the last 10 years, its mean annual return was an impressive 35%. But how should we compare NFLX’s performance with other investments? Berkshire Hathaway (BRK-A), for example, earned only 10% per year during the same period. Was NFLX a better investment? Does the Sharpe ratio help us answer this question? More generally speaking, what is a good Sharpe ratio?
Comparing investments solely by their returns is like comparing the value of two stocks solely by their prices. Value and price aren’t the same. NFLX recently traded at a $525 price per share and BRK-A traded at $317,123. Does that mean BRK-A was more valuable than NFLX? BRK-A surely cost more per share, but price-only comparisons are not apples-to-apples. Stock prices are a function of the number of outstanding shares and other factors unrelated to value.
Instead, we compare stock prices by anchoring them to earnings (net income) or some other valuation measure. We do much the same when we measure business liquidity (e.g., quick ratio), use of debt (e.g., debt-to-equity ratio), and profitability (e.g., return on equity), to name a few. NFLX’s recent trailing price-to-earnings ratio was about 80; BRK-A’s was about 23. The ratio quantifies the price for each dollar of earnings: NFLX cost $80 for each dollar of earnings and BRK-A cost $23. NFLX was more expensive than BRK-A in earnings terms.
Likewise, to compare investment returns we must anchor them; we use a risk anchor. The most widely used risk anchor is the standard deviation, and the most widely used measure of return anchored to risk is the Sharpe ratio. The numerator has an adjusted return value and the denominator has the standard deviation of the adjusted returns. The ratio quantifies the return earned for each unit of risk. The Sharpe ratio is useful for comparing investment performance, but it has some problems, and there are better alternatives that were unavailable when William Sharpe proposed the ratio in 1966.
How Investors Use the Sharpe Ratio
Investors use the Sharpe ratio to compare:
- the same asset’s risk-adjusted returns across periods (e.g., last year compared to the year before),
- multiple assets for the same periods (e.g., NFLX compared to BRK-A for the same ten-year period), and
- single or multiple assets for projected returns and risk (e.g., projections for the three years starting next year).
In all comparisons, the largest ratio value indicates the best risk-adjusted performance. A single ratio (not a comparison) has little value except for its sign and magnitude. A positive (negative) ratio only means the investment’s return was greater (less) than the risk-free rate. A larger or smaller ratio value means the investment’s return was more or less different from the risk-free rate. It would be incorrect to say a ratio of a certain magnitude is good or bad because even the best investments have negative Sharpe ratios during market downturns. When the ratios are negative, the ratio whose value is closest to zero is superior.
Sharpe ratios are suitable for comparisons of individual investments or portfolios like exchange-traded funds (ETFs), and they are comparable across many different kinds of investments and indexes (e.g., ETFs, individual securities, mutual funds, and the Dow Jones Industrial Average).
What is a good Sharpe ratio?
Here are examples of mean annual Sharpe ratios based on mean “adjusted” total returns, which include reinvested dividends for the last ten years[i]:
Why “risk premium” instead of “average returns”?
Before looking at the numbers in the table, we explain the reason William Sharpe made the “adjustment” to total returns, choosing to use the “risk premium” instead of total returns. You can divide total returns into two parts. One part is the tiny return earned on investments that impose no risk. Examples include Treasury bill and savings account interest. The other part is the return an investor can earn only if they assume risk.
Dr. Sharpe’s idea was that markets compensate investors for the risk they assume. If investors assume no risk, then they earn only a tiny return and they endure no volatility. Volatility is the risk proxy, and the standard deviation measures it. Because risk-free investments impose no volatility, the risk-free part of total returns should be taken out of the equation. The remainder is the risky part of total returns to which the standard deviation applies. With this adjustment, the Sharpe ratio can measure how much return an investor earns for each unit of risk. If the risk-free part had been left in, some of the return wouldn’t have been related to the risk, so you couldn’t make that last statement.
Of all examples in the table, the S&P 500 Index earned the highest risk-adjusted return, and NFLX, a NASDAQ 100 Index component, earned the lowest even though the stock earned the highest risk premium of all. Indexes are not investible, but they serve as useful benchmarks. Among the investible examples, the highest risk-adjusted return belongs to the 60%-40% stock-bond portfolio. The portfolio’s ETF components, SPY and AGG, replicate domestic large-cap stocks and domestic investment-grade corporate/government bonds, respectively. The three SPY-AGG portfolios were rebalanced annually to maintain their prescribed allocations. Research indicates rebalancing improves portfolio performance compared to a buy-and-hold strategy. The 60%-40% portfolio even outperformed the tech-heavy NASDAQ 100 Index, though the portfolio’s risk premium was only a little over one-third of the Index’s.
Notice how the 60%-40% portfolio outperformed the comparison portfolios. The 80%-20% stock-bond portfolio earned a higher risk premium, but volatile returns undermined its risk-adjusted return. The 20%-80% stock-bond portfolio earned such a measly risk premium that its lower risk was insufficient to produce a competitive Sharpe ratio.
Sharpe Ratio Calculation
The ratio’s numerator is usually the arithmetic mean return – the sum of all returns divided by the number of returns – adjusted by subtracting the risk-free rate. The result of that difference is called the risk premium or excess return. The risk-free rate is usually a Treasury bill rate like the three- or six-month rate.
The standard deviation is the only value in the denominator; it is calculated from the risk premia. Whereas the numerator has the mean of returns (with a deduction for the risk-free rate), the standard deviation has the mean dispersion of returns from the mean of returns.
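As a rough illustration of the calculation just described, here is a minimal Python sketch. The function name, the return series, and the 2% risk-free rate are all hypothetical and are not the data behind the table. It also assumes the sample standard deviation (dividing by n - 1), which is one common convention among several.

```python
import statistics

def sharpe_ratio(returns, risk_free_rate):
    """Mean risk premium divided by the standard deviation of the risk premia."""
    # Risk premium (excess return) for each period
    premia = [r - risk_free_rate for r in returns]
    # statistics.stdev is the sample standard deviation (n - 1 in the divisor)
    return statistics.mean(premia) / statistics.stdev(premia)

# Hypothetical annual returns: one volatile asset, one steady asset
volatile = [0.60, -0.20, 0.45, -0.10, 0.50]
steady = [0.10, 0.08, 0.12, 0.09, 0.11]
risk_free = 0.02  # assumed Treasury bill rate

print("volatile:", round(sharpe_ratio(volatile, risk_free), 2))
print("steady:  ", round(sharpe_ratio(steady, risk_free), 2))
```

In this made-up example the volatile asset earns the larger mean return, yet the steady asset earns far more return per unit of risk, which is the kind of comparison the ratio is meant to support.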
For fund investors who don’t care to do the math, Morningstar.com provides free three-, five-, and ten-year Sharpe ratios for most mutual funds and ETFs in its “Risk” tab. For example, you can find SPY’s Sharpe ratios there.
Sharpe Ratio Limitations
The Sharpe ratio’s greatest value, other than its utility and validity, lies in its elegance. It offers a fairly simple way to draw comparisons, but users should be aware of its limitations.
Using the Ratio in Security Selection
Security selection is a process of comparing and ultimately choosing to buy and/or sell securities. The Sharpe ratio can make historical comparisons or comparisons of projections. These comparisons can be useful for historical analysis or as a framework for investment planning, but they are unreliable if used for security selection. On one hand, historical Sharpe ratio values are not reliable indicators of future values because all three inputs – returns, risk-free rate, and standard deviation – change. On the other hand, projections of future returns and, to a lesser extent standard deviations, are miserably unreliable. Put differently, we can see historical details, but they don’t reliably represent future details. Also, we can use valid historical data to make projections using sophisticated methods, but nobody has a reliable crystal ball. The best use of the Sharpe ratio is for historical comparisons.
The Standard Deviation and Risk Aversion
The standard deviation measures the mean dispersion of an entire distribution from the mean. For example, in preparing the table above, the ten years of NFLX historical returns consisted of 2,520 daily returns. That distribution includes a wide range of daily returns. The largest single-day loss was 34.90% and the largest single-day gain was 42.22%. But risk-averse investors care far more about losses than about gains. Using the entire distribution, gains included, to characterize the risk of loss is inefficient. To remedy this problem, some have suggested using the semivariance instead.
The semivariance measures only the negative observations or else the observations less than the mean. These are the observations that matter to risk-averse investors. In 1991, Frank Sortino and Robert van der Meer applied the semivariance idea to the Sharpe ratio and created what is widely known as the Sortino ratio. This ratio puts the semivariance (instead of the standard deviation) in the denominator, giving focus to what matters. The Sortino ratio’s numerator is the same risk premium used by the Sharpe ratio.
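To make the difference from the Sharpe ratio concrete, here is a minimal sketch of a Sortino-style calculation. It rests on a few assumptions not spelled out above: the denominator is the square root of the semivariance (often called downside deviation) so that it is in the same units as the returns, only premia below a target of zero count as downside, and the sum is divided by the full number of observations. The function names and data are hypothetical.

```python
import math
import statistics

def downside_deviation(premia, target=0.0):
    """Square root of the semivariance of observations below the target."""
    # Observations at or above the target contribute zero
    shortfalls = [min(p - target, 0.0) for p in premia]
    return math.sqrt(sum(s * s for s in shortfalls) / len(premia))

def sortino_ratio(returns, risk_free_rate):
    """Mean risk premium divided by the downside deviation of the premia."""
    premia = [r - risk_free_rate for r in returns]
    # Note: undefined (division by zero) when no premium falls below the target
    return statistics.mean(premia) / downside_deviation(premia)

# Hypothetical annual returns and risk-free rate
returns = [0.60, -0.20, 0.45, -0.10, 0.50]
print(round(sortino_ratio(returns, 0.02), 2))
```

Only the two losing years feed the denominator here, so large gains no longer count against the asset the way they do in the standard deviation.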
The Standard Deviation and Fat-Tailed Distributions
As you read about the standard deviation and distribution form here, please keep in mind the standard deviation is only a mean, an average. It’s a mean of the differences between each of the returns and the mean of returns. Means are convenient because a single number can represent an entire distribution. In this case, the standard deviation is a single number that represents all of the differences between the returns and the mean of the returns. But the biggest downside of a mean is it masks extreme values; extreme values get canceled out.
Use of the standard deviation assumes the data are normally distributed – you know, a normal distribution, a “bell curve.” A “distribution” is a convenient way to think about and visualize many repeated observations of the same thing. In the present case, the “same thing” is the 2,520 daily returns of the S&P 500 Total Return Index or the daily returns of NFLX. Most return observations assume values that cluster near the mean (arithmetic average) of all observations in the sample. As you get farther from the mean on both sides of it, the number of observations gradually diminishes.
What is a normal distribution?
A normal distribution (sometimes called a Gaussian distribution) is a special case of distributions. A normal distribution is a probability distribution that is symmetric around the mean, and it shows that data near the mean are more frequent than data further from the mean. Roughly 2/3rds of all observations fall within one standard deviation of the mean; about 95% fall within two standard deviations. A normal distribution appears as a bell curve.
The “bell curve” can be created by assigning observations to ranges (bins) and counting the number of observations in each bin. For example, among the 2,520 daily NFLX observations, 28 fall in a bin ranging from -0.05 to -0.07 while 11 fall in a bin ranging from -0.07 to -0.09.
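The binning step can be sketched in a few lines of Python. The bin width of 0.02 and the edges at odd hundredths (..., -0.09, -0.07, -0.05, ...) are inferred from the two bins mentioned above; the function name and the handful of sample returns are hypothetical.

```python
import math
from collections import Counter

def bin_counts(returns, width=0.02, origin=0.01):
    """Count observations per bin, keyed by each bin's lower edge.

    Bin edges sit at origin + k * width, so the defaults give edges at
    ..., -0.09, -0.07, -0.05, ... (an assumption based on the example bins).
    """
    counts = Counter()
    for r in returns:
        k = math.floor((r - origin) / width)      # index of the bin holding r
        counts[round(origin + k * width, 10)] += 1
    return dict(sorted(counts.items()))

# A few hypothetical daily returns
sample = [-0.06, -0.055, -0.08, 0.0, 0.004]
for lower_edge, n in bin_counts(sample).items():
    print(f"[{lower_edge:+.2f}, {lower_edge + 0.02:+.2f}): {n}")
```

Plotting the counts against the bins, tallest in the middle and shrinking toward both ends, produces the histogram that approximates a bell curve when the data are roughly normal.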
The problem with investment returns is they rarely assume the shape of a bell curve. Rather, more observations cluster near the mean, and more observations fall far from the mean than you’d expect from a normal distribution. This distribution form is called “fat-tailed” or “leptokurtic.” The figure titled “S&P 500 Index & Superimposed Normal Distribution” illustrates the typical distribution form of investment returns with a comparison to a normal distribution. The vertical axis is the number of observations in each bin, and the horizontal axis is the bins. The bins aren’t labeled because they don’t contribute to this discussion.
The peaked distribution represents the 2,520 daily returns of the S&P 500 Total Return Index during the last ten years. The solid-line “bell curve” represents 2,520 simulated returns with about the same mean and standard deviation as the S&P 500 Total Return Index distribution.
It’s easy to see how much more peaked the S&P 500 Index curve is; it’s not so easy to see the number of S&P 500 Index observations far from the mean. The outliers are the little blips protruding above the solid normal distribution line distant from the mean. Sixteen S&P 500 Total Return Index observations fall below the minimum return in the normal distribution. That is, 16 Index losses are worse than you’d expect if the losses assumed a normal form.
That’s a tiny percentage of all the observations and you might think it shouldn’t matter. But consider this: The simulated normal distribution’s greatest daily loss is 3.84%, but the greatest daily loss in the S&P 500 distribution equals 12.8%. That difference matters!
The standard deviation doesn’t detect the outliers because clustering near the mean (the high peak) cancels them out as any mean would.
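One common way to put a number on “fat-tailed” is excess kurtosis, which is about zero for a normal distribution and positive for a leptokurtic one. The sketch below uses the simple population-moment formula; the function name and the toy data are hypothetical and are not the Index returns discussed above.

```python
def excess_kurtosis(xs):
    """Fourth central moment over squared variance, minus 3 (the normal value)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n   # population variance
    m4 = sum((x - mean) ** 4 for x in xs) / n   # fourth central moment
    return m4 / m2 ** 2 - 3.0

# Toy data: 98 quiet days and two extreme days
quiet_with_outliers = [0.0] * 98 + [-10.0, 10.0]
# Toy data: every observation equally far from the mean, no tails at all
no_tails = [-1.0, 1.0] * 50

print(excess_kurtosis(quiet_with_outliers))  # large and positive: fat tails
print(excess_kurtosis(no_tails))             # negative: thinner than normal
```

A high peak combined with a few extreme observations is exactly the pattern that drives this statistic up.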
Here’s the problem on the ground: The market might go through a long period of low volatility when returns don’t change much from day to day. Low-volatility returns are represented by the clustering of observations near the mean that creates the high peak. Then suddenly the market suffers extraordinary losses. It’s like it comes from nowhere. The market’s single-day 22% loss in October 1987 is a great example of this, and there are many others. Some people called that market collapse a “black swan” because it shouldn’t have happened. According to one author, if you assumed a normal distribution, that 22% loss would have occurred in only one of 4.03 × 10^181 trading days. That’s one day out of 403 followed by 179 zeroes. Earth has not experienced that many trading days. Of course, this probability estimate is ridiculous. It’s also ridiculous that the standard deviation of 1987-1988 S&P 500 Index returns including the single-day 22% loss was less than the standard deviation of returns for the year ending in September 2020! The standard deviation, because it’s an average, masks extreme events, so it can understate risk.
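To see why a normal assumption makes such a loss look impossible, here is a minimal sketch of the tail-probability arithmetic. The daily mean of zero and daily standard deviation of 1% are illustrative assumptions, so the result will not reproduce the figure quoted above, which depends on inputs that author chose.

```python
import math

def normal_tail_probability(threshold, mean, stdev):
    """P(return <= threshold) if returns were normally distributed."""
    z = (threshold - mean) / stdev
    # Standard normal CDF written with the complementary error function,
    # which stays accurate far out in the tail
    return 0.5 * math.erfc(-z / math.sqrt(2))

# Assumed daily mean of 0% and daily standard deviation of 1%
p = normal_tail_probability(-0.22, 0.0, 0.01)
print(f"z-score: {-0.22 / 0.01:.0f}")
print(f"probability of a 22% one-day loss: {p:.3e}")
```

Under these assumptions the loss sits 22 standard deviations below the mean, and the implied probability is so small that the event should never be observed. It was observed, so it is the normal assumption that fails.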
The Standard Deviation and Skewed Distributions
Whereas fat-tailed distributions lead to standard deviations that understate the risk of extreme events, skewed distributions create a different problem. They give greater weight to either losses or gains depending on the skew direction. When a distribution is skewed, its mean falls on one side or the other of the peak. In the figure titled, “Skewed Distribution,” the vertical axis is the number of observations falling in each bin, and the horizontal axis is the distribution values – the returns – grouped in bins. The mean of the distribution, which is slightly greater than zero, is represented by the vertical dotted line. The skewed distribution here is more skewed than you’d usually find in investment distributions; this exaggerated skewness is used to make a point about how it biases the standard deviation.
Recall that risk-averse investors are most concerned with negative returns. In this left-skewed distribution, even though the largest negative observations fall farther from the mean than the largest positive observations, 56% of the 2,520 observations assume values greater than the mean. The majority of observations clusters closer to the mean than the minority of observations that strays far from the mean. The standard deviation gives equal weight to each observation, so in a left-skewed distribution like this, the standard deviation represents the densely packed observations more than the widely dispersed values left of the mean. The result is a standard deviation suggesting spuriously less risk than the widely dispersed negative values would suggest.
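A small, deliberately exaggerated example shows the same pattern: most observations sit just above the mean while a single large loss sits far below it. The skewness formula here is the simple population-moment version, and the function names and data are hypothetical.

```python
def skewness(xs):
    """Third central moment over variance to the power 1.5 (negative = left skew)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def share_above_mean(xs):
    """Fraction of observations strictly greater than the mean."""
    mean = sum(xs) / len(xs)
    return sum(1 for x in xs if x > mean) / len(xs)

# Hypothetical returns (in percent): nine small gains and one large loss
sample = [-10.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0]

print("skewness:", round(skewness(sample), 2))
print("share above mean:", share_above_mean(sample))
```

Nine of the ten observations lie above the mean, yet the one that matters most to a risk-averse investor is the loss on the left.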
Alternatives to the Sharpe Ratio
The Sortino ratio using semivariance has already been discussed as an alternative. Because semivariance is also a mean of dispersions, fat-tailed and skewed distributions can disrupt it too. Some people suggest the “information ratio” is an alternative, but close examination reveals it can be the same as a Sharpe ratio or lead to erroneous decisions.
The Treynor ratio takes a form like the Sharpe ratio with the risk premium in the numerator and a risk measure in the denominator; the Treynor ratio uses “beta” to proxy risk. Beta is the risk of a stock or other asset relative to the risk of a suitable benchmark. The S&P 500 Index would usually be a suitable benchmark for large-cap domestic stocks, for example. Beta reflects systematic risk, the market-wide risk all investments are exposed to, and not business-specific or otherwise special-case risk. A beta of 1.0 indicates the asset imposes the same risk as the benchmark. Beta values greater or less than 1.0 indicate the asset imposes more or less risk than the benchmark, respectively. In contrast, the standard deviation makes no distinction about the risk source.
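Here is a minimal sketch of beta and the Treynor ratio. It assumes the usual estimate of beta, the covariance of asset and benchmark returns divided by the variance of benchmark returns; the function names, return series, and risk-free rate are hypothetical.

```python
import statistics

def beta(asset_returns, benchmark_returns):
    """Covariance of asset and benchmark returns over benchmark variance."""
    mean_a = statistics.mean(asset_returns)
    mean_b = statistics.mean(benchmark_returns)
    n = len(asset_returns)
    # Sample covariance, matching the n - 1 divisor of statistics.variance
    cov = sum((a - mean_a) * (b - mean_b)
              for a, b in zip(asset_returns, benchmark_returns)) / (n - 1)
    return cov / statistics.variance(benchmark_returns)

def treynor_ratio(asset_returns, benchmark_returns, risk_free_rate):
    """Mean risk premium divided by beta."""
    premium = statistics.mean(asset_returns) - risk_free_rate
    return premium / beta(asset_returns, benchmark_returns)

# Hypothetical annual returns: the asset moves twice as much as the benchmark
benchmark = [0.10, -0.05, 0.20, 0.05]
asset = [2 * b for b in benchmark]

print("beta:", round(beta(asset, benchmark), 2))
print("Treynor:", round(treynor_ratio(asset, benchmark, 0.02), 3))
```

Because the hypothetical asset simply doubles every benchmark move, its beta comes out at 2.0, meaning it imposes twice the benchmark's market risk.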
Perhaps the best Sharpe ratio alternative replaces the standard deviation with expected shortfall (ES), also known as conditional value at risk. This ratio uses the same risk premium in the numerator, but the denominator is the average of worst possible losses. It might be the best alternative because:
- As a measure of downside risk, ES addresses the main concern held by risk-averse investors, unlike the standard deviation and beta.
- Non-normal distributions do not disrupt the statistic.
- The ES statistic takes into consideration potential for “black swan” events.
- For the purpose of measuring investment risk, it is a more statistically defensible metric than the standard deviation or semivariance.
ES is the average of losses beyond a user-prescribed loss threshold. The loss threshold is usually expressed as a percentile. For example, say your worst-case loss is at the 5th percentile. That is, 95% of losses would be smaller than the loss represented by your threshold. Based on the distribution form, ES equals the average of losses worse than losses at the 5th percentile.
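A simple historical version of the calculation can be sketched as follows. It assumes ES is estimated directly from the worst 5% of observed returns and reported as a positive loss; estimates based on a fitted distribution would differ. The function names and data are hypothetical.

```python
import statistics

def expected_shortfall(returns, tail=0.05):
    """Average of the worst `tail` fraction of returns, reported as a positive loss."""
    ordered = sorted(returns)                  # worst returns first
    k = max(1, int(len(ordered) * tail))       # number of observations in the tail
    return -statistics.mean(ordered[:k])

def es_ratio(returns, risk_free_rate, tail=0.05):
    """Mean risk premium divided by expected shortfall."""
    premium = statistics.mean(returns) - risk_free_rate
    # Only meaningful when the tail actually contains losses (ES > 0)
    return premium / expected_shortfall(returns, tail)

# Hypothetical sample: 100 returns from -5.0% to +4.9% in 0.1% steps
returns = [i / 1000 for i in range(-50, 50)]

print("ES at the 5th percentile:", round(expected_shortfall(returns), 4))
```

With 100 observations, the 5% tail holds the five worst returns, and ES is simply their average. Unlike the standard deviation, nothing to the right of the threshold influences the result.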
Is a Sharpe Ratio (and its Alternatives) Necessary?
The idea behind the Sharpe ratio is to identify investments that have earned a high risk premium compared to the risk they impose. The result is a relatively large Sharpe ratio. In the table above, the “SPY (60%) AGG (40%) Portfolio” produced the largest Sharpe ratio compared to the other investments. This is the superior investment for that period compared to the other investments in the table.
So, back to our original question. What is a good Sharpe ratio? Well, if you need a specific answer, then you should probably consider the Sharpe ratio value that the 60/40 portfolio mentioned above produced. In the end, the Sharpe ratio is one of many tools investors can consider as they compare investments and portfolio models. Moreover, statistics and averages can be interesting to examine with respect to large data sets. But perhaps young investors don’t need to trouble themselves with what a good Sharpe ratio is. Young investors with long-term money don’t really need to be overly concerned with risk if that money is to remain untouched for decades.
Still, the Sharpe ratio is a useful tool in the right situation. If something is worth managing, it’s worth measuring. This is true whether we talk about weight loss, physical endurance, business operations, or any other worthwhile activity. Look, you could use your belt size to measure weight loss. But measuring weight with a scale is more reliable and valid. You could also compare investments based only on return. But a Sharpe ratio or one of its alternatives is necessary if you want a valid, reliable comparison.