Correlation and Regression
by Richard A. DeFusco, PhD, CFA, Dennis W. McLeavey, CFA, Jerald E. Pinto, PhD, CFA, and David E. Runkle, PhD, CFA
Richard A. DeFusco, PhD, CFA, is at the University of Nebraska-Lincoln (USA). Dennis W. McLeavey, CFA, is at the University of Rhode Island (USA). Jerald E. Pinto, PhD, CFA, is at CFA Institute (USA). David E. Runkle, PhD, CFA, is at Trilogy Global Advisors (USA).
LEARNING OUTCOMES

The candidate should be able to:

a. calculate and interpret a sample covariance and a sample correlation coefficient and interpret a scatter plot;
b. describe limitations to correlation analysis;
c. formulate a test of the hypothesis that the population correlation coefficient equals zero and determine whether the hypothesis is rejected at a given level of significance;
d. distinguish between the dependent and independent variables in a linear regression;
e. explain the assumptions underlying linear regression and interpret regression coefficients;
f. calculate and interpret the standard error of estimate, the coefficient of determination, and a confidence interval for a regression coefficient;
g. formulate a null and alternative hypothesis about a population value of a regression coefficient and determine the appropriate test statistic and whether the null hypothesis is rejected at a given level of significance;
h. calculate the predicted value for the dependent variable, given an estimated regression model and a value for the independent variable;
i. calculate and interpret a confidence interval for the predicted value of the dependent variable;
j. describe the use of analysis of variance (ANOVA) in regression analysis, interpret ANOVA results, and calculate and interpret the F-statistic;
k. describe limitations of regression analysis.
© 2017 CFA Institute.
All rights reserved.
INTRODUCTION
As
a financial analyst, you will often need to examine the relationship between
two or more financial variables. For
example, you might want to know whether returns to different stock market indexes are related and, if so, in
what way. Or you might hypothesize
that the spread between a company’s return on invested capital and its cost of capital helps to explain the
company’s value in the marketplace. Correlation and regression analysis are tools for examining these issues.
This reading1 is organized as follows. In Section 2, we present correlation analysis, a basic tool in measuring how two variables vary in
relation to each other. Topics covered include the calculation, interpretation, uses, limitations, and statistical testing of
correlations. Section 3 introduces basic concepts in regression analysis, a
powerful technique for examining the ability of one or more variables (independent
variables) to explain or predict another variable (the dependent
variable).
CORRELATION ANALYSIS
We
have many ways to examine how two sets of data are related. Two of the most useful methods are scatter plots and correlation analysis.
We examine scatter plots first.
Scatter Plots
A scatter plot is a graph that shows the relationship between
the observations for two data series in two dimensions. Suppose, for
example, that we want to graph the relationship
between long-term money growth and long-term inflation in six industrialized
countries to see how strongly the two variables are related. Table 1 shows the average annual growth rate in the
money supply and the average annual inflation
rate from 1980 to 2012 for the
six countries.
Table 1  Annual Money Supply Growth Rate and Inflation Rate by Country, 1980–2012

Country           Money Supply Growth Rate (%)    Inflation Rate (%)
Australia                  11.17                        4.62
Japan                       4.08                        0.18
South Korea                17.81                        5.31
Switzerland                 5.85                        1.99
United Kingdom             12.93                        4.18
United States               6.53                        2.93
Average                     9.73                        3.20

Source: International Monetary Fund.
1 Examples in this reading were updated in 2014 by Professor Sanjiv Sabherwal of the University of Texas, Arlington.
To translate the data in Table 1 into a scatter plot,
we use the data for each country to mark a point on a graph. For each point,
the x-axis coordinate is the country’s annual average money supply growth from
1980–2012, and the y-axis coordinate is
the country’s annual average inflation
rate from 1980–2012. Figure 1 shows a scatter
plot of the data in Table 1.
Figure 1 Scatter Plot of Annual Money Supply Growth
Rate and Inflation Rate by Country, 1980–2012
Source: International Monetary Fund.
Note
that each observation in the scatter plot is represented as a point, and the points are not connected. The scatter plot
does not show which observation comes from
which country; it shows only the actual observations of both data series
plotted as pairs. For example, the
rightmost point shows the data for South Korea. The data plotted in Figure
1 show a fairly strong
linear relationship with a positive
slope. Next we examine how to quantify this linear relationship.
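The mechanics of building such a plot are simple enough to sketch in code. The following Python snippet, which uses the matplotlib library, is one illustrative way to plot the Table 1 data; it is not how the figure in the text was produced.

```python
# Illustrative sketch: plot the Table 1 observations as (x, y) pairs.
import matplotlib.pyplot as plt

# Average annual rates, 1980-2012, in percent (from Table 1)
money_growth = [11.17, 4.08, 17.81, 5.85, 12.93, 6.53]
inflation = [4.62, 0.18, 5.31, 1.99, 4.18, 2.93]

plt.scatter(money_growth, inflation)          # points, not connected
plt.xlabel("Money Supply Growth Rate (%)")    # x-axis coordinate
plt.ylabel("Inflation Rate (%)")              # y-axis coordinate
plt.title("Money Supply Growth and Inflation by Country, 1980-2012")
plt.show()
```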
Correlation Analysis
In contrast to a scatter plot, which graphically
depicts the relationship between two data
series, correlation
analysis expresses this same relationship using a single number.
The correlation coefficient is a measure of how closely related two data series
are. In particular, the correlation coefficient measures the direction and
extent of linear
association between
two variables. A correlation coefficient can have a maximum value of 1 and a minimum value of
−1. A correlation coefficient greater than 0 indicates a
positive linear association between the two variables: When one variable
increases (or decreases), the other also tends to increase
(or decrease). A correlation coefficient less than 0 indicates a negative
linear association between the two variables: When one increases (or decreases), the other tends
to decrease (or increase). A correlation coefficient of 0 indicates no linear
relation between the two variables.2
Figure 2 shows the scatter plot of
two variables with a correlation of 1.
Figure 2 Variables with a Correlation of 1
Note that all the points on the scatter plot in Figure 2 lie on a straight line with a positive slope. Whenever variable A increases by one unit, variable B increases by half
a unit. Because all of the points in the graph lie on a straight line, an
increase of one unit in A is associated with exactly the same half-unit
increase in B, regardless of the
level of A. Even
if the slope of the line in the figure were different (but positive), the correlation between the two variables
would be 1 as long as all the points lie on that
straight line.
Figure 3 shows a scatter plot for two variables with a correlation coefficient of −1.
Once again, the plotted observations fall on a straight line. In this
graph, however, the line has a
negative slope. As A increases
by one unit, B decreases by half a unit, regardless of the initial value
of A.
2 Later, we show that variables with a correlation of 0 can have a strong nonlinear relation.
Figure 3 Variables with a Correlation of –1
Figure
4 shows a scatter plot of two variables with a correlation of 0; they have no linear relation. This graph shows that
the value of A tells
us absolutely nothing about the value of B.
Figure 4 Variables with a Correlation
of 0
2.1 Calculating and Interpreting the Correlation Coefficient
To define and
calculate the correlation coefficient, we need another measure of linear association: covariance. We have previously defined covariance as the expected
value of the product of the deviations of two random
variables from their respective population means.
That was the definition of population covariance, which we would
also use in a forward-looking
sense. To study historical or sample correlations, we need to use sample
covariance. The sample
covariance of X
and Y, for a sample of size n, is
$$
Cov\left(X,Y\right)=\frac{\sum^n_{i=1}{\left(X_i-\overline{X}\right)}\left(Y_i-\overline{Y}\right)}{n-1}
$$ (1)
The sample
covariance is the average value of the product of the deviations of observations on
two random variables from their sample means.3 If the random
variables are returns, the unit of covariance
would be returns squared.
The sample correlation coefficient is much easier to explain than the sample covariance. To understand the sample correlation coefficient, we need the expression for the sample standard deviation of a random variable X. We need to calculate the sample variance of X to obtain its sample standard deviation. The variance of a random variable is simply the covariance of the random variable with itself. The expression for the sample variance of X, $s^2_X$, is
$$
s^2_X = \frac{\sum^n_{i=1}\left(X_i - \bar{X}\right)^2}{n-1}
$$
3 The use of n − 1 in the denominator is a technical point; it ensures that the sample covariance is an unbiased estimate of the population covariance.
The sample standard deviation
is the positive square root of the sample variance:
$$ s_X = \sqrt{s^2_X} $$
Both
the sample variance and the sample standard deviation are measures of the dispersion of observations about the
sample mean. Standard deviation uses the same
units as the random variable;
variance is measured
in the units squared.
The formula for computing the sample
correlation coefficient is
$$ r=\frac{Cov(X,Y)}{s_X s_Y}
$$ (2)
The correlation coefficient is the covariance of two variables (X and Y) divided by the product of their sample standard deviations ($s_X$ and $s_Y$). Like covariance, the correlation coefficient is a measure of linear association. The correlation coefficient, however, has the advantage of being a simple number, with no unit of measurement attached. It has no units because it results from dividing the covariance by the product of the standard deviations. Because we will be using sample variance, standard deviation, and covariance in this reading, we will repeat the calculations for these statistics.
Table
2 shows how to compute the various components of the correlation equation
(Equation 2) from the data in Table 1.4
The individual observations on countries’ annual average money supply growth from
1980–2012 are denoted Xi,
and individual observations on countries’ annual average inflation
rate from 1980–2012
are denoted Yi.
The remaining columns show the calculations for the inputs to correlation: the sample covariance and the sample standard deviations.
Table 2  Sample Covariance and Sample Standard Deviations: Annual Money Supply Growth Rate and Inflation Rate by Country, 1980–2012

Country              Money Supply       Inflation    Cross-Product                   Squared Deviations    Squared Deviations
                     Growth Rate $X_i$  Rate $Y_i$   $(X_i-\bar{X})(Y_i-\bar{Y})$    $(X_i-\bar{X})^2$     $(Y_i-\bar{Y})^2$
Australia            0.1117             0.0462       0.000204                        0.000208              0.000201
Japan                0.0408             0.0018       0.001707                        0.003190              0.000913
South Korea          0.1781             0.0531       0.001704                        0.006531              0.000445
Switzerland          0.0585             0.0199       0.000470                        0.001504              0.000147
United Kingdom       0.1293             0.0418       0.000313                        0.001025              0.000096
United States        0.0653             0.0293       0.000087                        0.001023              0.000007
Sum                  0.5837             0.1921       0.004485                        0.013482              0.001809
Average              0.0973             0.0320
Covariance                                           0.000897
Variance                                                                             0.002696              0.000362
Standard deviation                                                                   0.051926              0.019019

Notes: 1. Divide the cross-product sum by n − 1 (with n = 6) to obtain the covariance of X and Y. 2. Divide the sums of squared deviations by n − 1 (with n = 6) to obtain the variances of X and Y.
Source: International Monetary Fund.
Using the data shown in Table 2, we can compute the sample correlation coefficient for these two variables as follows:
$$ r=\frac{Cov(X,Y)}{s_X s_Y}
= \frac{0.000897}{(0.051926)(0.019019)}=0.9083 $$
The correlation coefficient of approximately 0.91 indicates a strong linear
association between long-term
money supply growth
and long-term inflation for the countries in the sample. The
correlation coefficient captures this strong association numerically, whereas the scatter plot in Figure 1 shows the information graphically.
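The Table 2 arithmetic is easy to verify with a few lines of code. The following Python sketch (an illustration added here, not part of the original reading) computes the sample covariance, the sample standard deviations, and the correlation coefficient directly from the Table 2 inputs.

```python
# Verify Table 2: sample covariance (n - 1 denominator), sample standard
# deviations, and the correlation coefficient of Equation 2.
import math

x = [0.1117, 0.0408, 0.1781, 0.0585, 0.1293, 0.0653]  # money growth, decimal
y = [0.0462, 0.0018, 0.0531, 0.0199, 0.0418, 0.0293]  # inflation, decimal

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

r = cov_xy / (s_x * s_y)
print(round(cov_xy, 6), round(s_x, 6), round(s_y, 6), round(r, 4))
# Matches Table 2: 0.000897 0.051926 0.019019 0.9083
```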
What assumptions are necessary to compute the
correlation coefficient? Correlation coefficients
can be computed validly if the means and variances of X and
Y,
as well as the covariance of X and Y, are finite
and constant. Later, we will show that when these assumptions
are not true, correlations between two different variables can depend greatly on the sample that is used.
2.2 Limitations of Correlation Analysis
Correlation measures
the linear association between two variables, but it may not always be reliable. Two variables can have a strong nonlinear
relation and still have a very low correlation. For example, the relation $B = (A - 4)^2$ is a nonlinear relation, in contrast to the linear relation $B = 2A - 4$. The nonlinear relation between variables A and B is shown in Figure 5. Below a level of 4 for A, variable B decreases with increasing values of A. When A is 4 or greater, however, B increases whenever A increases. Even though these two variables are perfectly associated, the correlation between them is 0.
5 The perfect association is the quadratic relationship $B = (A - 4)^2$.
Figure 5 Variables with a Strong Nonlinear Association
Correlation
also may be an unreliable measure when outliers are present in one or both of the series. Outliers are small
numbers of observations at either extreme (small
or large) of a sample. Figure 6 shows a scatter plot of the monthly returns to the Standard & Poor’s 500 Index and
the monthly inflation rate in the United States from January 1990 through December
2013.
Figure 6 US Inflation and Stock Returns: 1990–2013
Sources: Bureau of Labor Statistics
and S&P Dow Jones Indices.
In
the scatter plot in Figure 6, most of the data lie clustered together with
little discernible relation between
the two variables. Two cases, however (the two circled observations), stand out from the rest. In one of those cases,
inflation was extremely low at
almost –2 percent, and in the other case, stock returns were
strongly negative at almost
–17 percent. These
observations are outliers. If we compute
the correlation coefficient for the entire data sample,
that correlation is −0.0350. If we eliminate the two outliers,
however, the correlation is −0.1489.
The correlation in Figure 6 is quite sensitive to
excluding only two observations. Does it make sense to exclude
those observations? Are they noise
or news? When the outliers are excluded, there seems to be
a moderately negative correlation between inflation
and stock returns. One possible partial explanation of this negative correlation
is that whenever inflation was very high during a month, market participants became concerned that the Federal Reserve
would raise interest rates, which would cause
the value of stocks to decline. This story offers one plausible explanation for how investors reacted to large inflation
announcements. When the two outliers are included,
there is a noticeable decrease in the magnitude of the negative correlation. A closer examination of the monthly
data used in the scatter
plot reveals that the two outliers
correspond to the months of October and November 2008 when bad news regarding the US
economy and job market caused the stock market to decline sharply. During
these two months, although the inflation was not high (in fact, inflation was negative
in both months),
stocks declined substantially. Therefore, inclusion of those two outliers reduces the magnitude of the
negative correlation between inflation and stock
returns. One could argue that while the data without the outliers provide a
useful insight into the general relationship between inflation and stock
returns, the outliers may provide
information about the relationship during a period of market distress. Therefore, in this case, it would be
reasonable to report the values of the correlation including and excluding
the outliers.
As a general rule, we must determine whether a
computed sample correlation changes greatly
by removing a few outliers.
But we must also use judgment to determine whether those outliers contain
information about the two variables’ relationship (and
should thus be included in the correlation analysis) or contain no information (and should thus be excluded).
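The outlier check described above is straightforward to carry out in code. The sketch below uses synthetic data, not the actual 1990–2013 series, so the printed numbers will not match those in the text; it simply illustrates how a pair of extreme observations can change a computed correlation.

```python
# Synthetic illustration of outlier sensitivity: a mild negative relation
# plus two extreme "crisis" observations that pull the correlation away
# from its trimmed value.
import numpy as np

rng = np.random.default_rng(42)
n = 286  # roughly the number of months in 1990-2013

inflation = rng.normal(0.0025, 0.002, n)   # monthly inflation (synthetic)
noise = rng.normal(0.0, 0.04, n)
returns = -2.0 * inflation + noise         # mild negative relation

# Append two crisis-style outliers: very low inflation, very negative returns.
inflation = np.append(inflation, [-0.019, -0.017])
returns = np.append(returns, [-0.17, -0.08])

full = np.corrcoef(inflation, returns)[0, 1]
trimmed = np.corrcoef(inflation[:-2], returns[:-2])[0, 1]
print(f"with outliers: {full:.4f}, without outliers: {trimmed:.4f}")
```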
Keep in mind that correlation does not imply causation. Even if two variables are highly
correlated, one does not necessarily cause the other in the sense that certain values
of one variable bring about the occurrence of certain values of the other. Furthermore, correlations can be spurious
in the sense of misleadingly pointing towards associations between variables.
The term spurious
correlation has been used to refer to 1) correlation between two
variables that reflects chance relationships in a particular data set, 2)
correlation induced by a calculation
that mixes each of two variables with a third, and 3) correlation between two
variables arising not from a direct relation between them but from their relation to a third variable.
As an example of the second kind of spurious
correlation, two variables that are uncorrelated may be correlated if
divided by a third variable.
As an example of the third kind of spurious
correlation, height may be
positively correlated with the extent of a person’s vocabulary, but the
underlying relationships are
between age and height and between age and vocabulary. Investment professionals
must be cautious in basing investment strategies on high correlations. Spurious correlation may suggest
investment strategies that appear profitable but actually would not be
so, if implemented.
Uses of Correlation Analysis
In this section, we give examples of correlation analysis for investment. Because investors’ expectations about inflation are important in determining asset prices, inflation forecast accuracy will serve as our first example.
EXAMPLE 1
Evaluating Economic Forecasts (1)

Investors closely watch economists’ forecasts of inflation, but do these forecasts contain useful information? In the euro area, the Survey of Professional Forecasters (SPF) gathers professional forecasters’ predictions about many economic variables.6 Since 1999, SPF has gathered predictions on the euro area inflation rate using the change in the Harmonised Index of Consumer Prices (HICP) for the prices of consumer goods and services acquired by households to measure inflation. If these forecasts of inflation could perfectly predict actual inflation, the correlation between forecasts and inflation would be 1. Figure 7 shows a scatter plot of the mean forecast made in the first quarter of a year for the percentage change in HICP during that year and the actual percentage change in HICP, from 1999 through 2013.7 In this scatter plot, the forecast for each year is plotted on the x-axis and the actual change in the HICP is plotted on the y-axis.

Figure 7 Actual Change in Euro Area HICP versus Predicted Change
Source: European Central Bank.

Discuss whether professional forecasters’ predictions of the euro area inflation might be useful in investment decision-making.

Solution: As Figure 7 shows, a fairly strong linear association exists between the forecast and the actual inflation rate, suggesting that professional forecasts of inflation might be useful in investment decision-making. In fact, the correlation between the two series is 0.8913. Although there is no causal relation here, there is a direct relation because forecasters assimilate information to forecast inflation.
One important issue in evaluating a
portfolio manager’s performance is determining
an appropriate benchmark for the
manager. Since the early 1990s, style analysis has been an important
component of benchmark
selection.8
8 See, for example, Sharpe (1992), Buetow, Johnson, and Runkle (2000), and Chan, Dimmock, and Lakonishok (2009).
EXAMPLE 2
Style Analysis Correlations

Portfolio managers using small-cap stocks in investment portfolios may favor a growth style, a value style, or neither. In the United States, the Russell 2000 Growth Index and the Russell 2000 Value Index are often used as benchmarks for small-cap growth and small-cap value managers, respectively. Correlation analysis shows, however, that the returns to these two indexes are very closely associated with each other. For the 15 years ending in 2013 (January 1999 to December 2013), the correlation between the monthly returns to the Russell 2000 Growth Index and the Russell 2000 Value Index was 0.8249.

What conclusions can be drawn based on this result concerning the mean returns to small-cap growth and small-cap value investment styles? Explain your answer.

Solution: The returns to the two indexes are highly positively correlated. But correlation does not provide information on variables’ mean returns, only on how their returns covary. Here, for example, even a correlation of +1 to the returns to the two styles would not imply that the mean returns to the two styles are the same. Thus, the information given is not sufficient to reach a conclusion on the relative mean returns to the small-cap growth and small-cap value investment styles.
The
previous examples in this reading have examined the correlation between two variables.
Often, however, investment managers need to understand the correlations among many asset returns. For example, investors
who have any exposure to movements in exchange rates must understand the
correlations of the returns to different foreign currencies and other assets
in order to determine their optimal portfolios and hedging strategies.9 In the following
example, we see how a correlation matrix shows correlation between pairs of variables when we have more than two variables. We also see one of the main challenges to investment managers:
Investment return correlations can change substantially over time.
EXAMPLE 3
Exchange Rate Return Correlations

The exchange rate return measures the periodic domestic currency return to holding foreign currency. Consider a British investor with British pounds (GBP) as her domestic currency. Suppose a change in inflation rates in Canada and the United Kingdom results in the pound price of a Canadian dollar changing from £0.50 to £0.45. If this change occurred in one month, the return in that month to holding Canadian dollars would be (0.45 − 0.50)/0.50 = −10 percent, in terms of pounds.

Table 3 shows a correlation matrix of monthly returns in British pounds to holding Canadian, Japanese, Swedish, or US currencies during two seven-year periods, 2000–2006 and 2007–2013. To interpret a correlation matrix, we first examine the top panel of this table. The first column of numbers of that panel shows the correlations between GBP returns to holding the Canadian dollar and GBP returns to holding Canadian, Japanese, Swedish, and US currencies during 2000–2006. Of course, any variable is perfectly correlated with itself, and so the correlation between GBP returns to holding the Canadian dollar and GBP returns to holding the Canadian dollar is 1. The second row of this column shows that the correlation between GBP returns to holding the Canadian dollar and GBP returns to holding the Japanese yen was 0.4552 during 2000–2006. The remaining correlations in the panel show how the GBP returns to other combinations of currency holdings were correlated during this period.

1 Explain why Table 3 omits many of the correlations.

Solution to 1: The formula for the correlation coefficient in the earlier equation (Equation 2) shows that correlations are always symmetrical: The correlation between X and Y is always the same as the correlation between Y and X. Accordingly, duplicative coefficients are excluded in Table 3. For example, Column 2 of the panels omits the correlation between GBP returns to holding yen and GBP returns to holding Canadian dollars. This correlation is omitted because it is identical to the correlation between GBP returns to holding Canadian dollars and GBP returns to holding yen shown in Row 2 of Column 1. Similarly, other omitted correlations would also have been duplicative.

2 Compare the two panels of Table 3 and discuss whether the changes in correlations from 2000–2006 to 2007–2013 show a pattern.

Solution to 2: A comparison of the two panels of Table 3 shows that many of the currency return correlations changed dramatically between the periods of 2000–2006 and 2007–2013, but there is no pattern in these changes. During 2000–2006, for example, the correlation between the return to holding Canadian dollars and the return to holding Japanese yen (0.4552) was about the same as the correlation between the return to holding yen and the return to holding US dollars (0.4360). During 2007–2013, however, the correlation between Canadian dollar returns and yen returns dropped substantially (to 0.3091), but the correlation between yen and US dollar returns increased substantially (to 0.7230). Some other correlations also increased markedly. For example, the correlation between Canadian dollar returns and Swedish krona returns almost doubled from 0.2686 to 0.5278 and the correlation between krona and US dollar returns increased from 0.0074 to 0.1862. In contrast, the correlation between Canadian and US dollars decreased from 0.6917 to 0.5263 and the correlation between yen and krona returns hardly changed (0.1832 to 0.1742).

Optimal asset allocation depends on expectations of future correlations. With less than perfect positive correlation between two assets’ returns, there are potential risk-reduction benefits to holding both assets. Expectations of future correlation may be based on historical sample correlations, but the variability in historical sample correlations poses challenges. We discuss these issues in detail in the reading on portfolio concepts.
9 See, for example, Campbell, Medeiros, and Viceira (2009).
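A correlation matrix like Table 3 can be produced directly with pandas. In the following sketch, the file name and column labels are hypothetical placeholders for monthly GBP return series; only the general approach is being illustrated.

```python
# Build period-by-period correlation matrices from monthly return data.
import pandas as pd

returns = pd.read_csv("gbp_currency_returns.csv",  # hypothetical file
                      index_col="month", parse_dates=True)
# Columns assumed (hypothetical): "CAD", "JPY", "SEK", "USD"

corr_2000_2006 = returns.loc["2000":"2006"].corr()  # first panel
corr_2007_2013 = returns.loc["2007":"2013"].corr()  # second panel
print(corr_2000_2006.round(4))
print(corr_2007_2013.round(4))
```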
In the next example, we extend the discussion of the correlations of stock market
indexes begun in Example 2 to indexes
representing large-cap, small-cap, and broad-market returns. This type of analysis has
serious diversification and asset allocation
consequences because the strength of the correlations among the assets
tells us how successfully the assets can be combined to diversify
risk.
EXAMPLE 4
Correlations among Stock Return Series

Table 4 shows the correlation matrix of monthly returns to three UK stock indexes during the period January 1990 to December 2009 and in two subperiods (the 1990s and 2000s). The large-cap style is represented by the return to the FTSE 100 Index, the small-cap style is represented by the return to the FTSE Small Cap Excluding Investment Companies Index, and the broad-market returns are represented by the return to the FTSE All-Share Index.

Discuss whether the correlation coefficients for the entire sample are consistent with your expectations.

Solution: The first column of numbers in the top panel of Table 4 shows nearly perfect positive correlation between returns to the FTSE 100 and returns to the FTSE All-Share: The correlation between the two return series is 0.9906. This result should not be surprising, because both the FTSE 100 and the FTSE All-Share are value-weighted indexes, and large-cap stock returns receive most of the weight in both indexes. In fact, the companies that make up the FTSE 100 have more than 80 percent of the total market value of all companies included in the FTSE All-Share.

Small-cap stocks also have a reasonably high correlation with large stocks. In the total sample, the correlation between the FTSE 100 returns and the FTSE Small-Cap returns is 0.6914. The correlation between FTSE Small-Cap returns and returns to the FTSE All-Share is slightly higher (0.7694). This result is also not too surprising because the FTSE All-Share contains small-cap stocks and the FTSE 100 does not.

The second and third panels of Table 4 show that correlations among the various stock market return series show some variation from decade to decade. For example, the correlation between returns to the FTSE 100 and FTSE small-cap stocks increased from 0.6553 in the 1990s to 0.7245 in the 2000s.10
For asset allocation purposes, correlations among asset classes are studied carefully with a view toward maintaining appropriate diversification based on forecasted correlations.
EXAMPLE 5
Correlations of Debt and Equity Returns

Table 5 shows the correlation matrix for various US debt returns and US large and small company stock returns using monthly data from January 1926 to December 2012.

The first column of numbers, in particular, shows the correlations of US large company stock returns with small company stock returns and various debt returns. As expected, large and small company stocks have a high correlation (0.79). In contrast, large company stock returns are almost completely uncorrelated (−0.01) with Treasury bill returns for this period. Long-term corporate debt returns are somewhat more correlated (0.16) with large company stock returns. Long-term government bonds, however, have a very low correlation (0.01) with large company stock returns. We expect some correlation between these variables because interest rate increases reduce the present value of future cash flows for both bonds and stocks. The low correlation between these two return series, however, shows that other factors affect the returns on stocks besides interest rates. Without these other factors, the correlation between bond and stock returns would be higher.

The third column of numbers in Table 5 shows that the correlation between long-term government bond and corporate bond returns is quite high (0.89) for this time period. Although this correlation is the highest in the entire matrix, it is not 1. The correlation is less than 1 because the default premium for long-term corporate bonds changes, whereas US government bonds do not incorporate a default premium. As a result, changes in required yields for government bonds have a correlation less than 1 with changes in required yields for corporate bonds, and return correlations between government bonds and corporate bonds are also below 1. Note also that T-bill returns have a very low correlation with all other return series.
10 The correlation coefficient for the 2000s was not significantly different from that for the 1990s at the 0.10 significance level. A test for this type of hypothesis on the correlation coefficient can be conducted using Fisher’s z-transformation. See Daniel and Terrell (1995) for information on this method.
In the next example,
correlation is used in a financial statement
setting to show that
net income is an inadequate proxy for cash flow.
EXAMPLE 6
Correlations among Net Income, Cash Flow from Operations, and Free Cash Flow to the Firm

Net income (NI), cash flow from operations (CFO), and free cash flow to the firm (FCFF) are three measures of company performance that analysts often use to value companies. Differences in these measures for given companies would not cause differences in the relative valuation if the measures were highly correlated. CFO equals net income plus the net noncash charges that were subtracted to obtain net income, minus the company’s investment in working capital during the same time period. FCFF equals CFO plus net-of-tax interest expense, minus the company’s investment in fixed capital over the time period. FCFF may be interpreted as the cash flow available to the company’s suppliers of capital (debtholders and shareholders) after all operating expenses have been paid and necessary investments in working and fixed capital have been made.11

Some analysts base their valuations only on NI, ignoring CFO and FCFF. If the correlations among NI, CFO, and FCFF were very high, then an analyst’s decision to ignore CFO and FCFF would be easy to understand because NI would then appear to capture everything one needs to know about cash flow. Table 6 shows the correlations among NI, CFO, and FCFF for a group of six publicly traded US companies involved in retailing women’s clothing for 2001. Before computing the correlations, we normalized all of the data by dividing each company’s three performance measures by the company’s revenue for the year.12

Because CFO and FCFF include NI as a component (in the sense that CFO and FCFF can be obtained by adding and subtracting various quantities from NI), we might expect that the correlations between NI and CFO and between NI and FCFF would be positive. Table 6 supports that conclusion. These correlations with NI, however, are much smaller than the correlation between CFO and FCFF (0.8217). The lowest correlation in the table is between NI and FCFF (0.4045). This relatively low correlation shows that NI contained some but far from all the information in FCFF for these companies in 2001. Later in this reading, we will test whether the correlation between NI and FCFF is significantly different from zero.
12 The results in this table are based on data for all women’s clothing stores (US Occupational Health and Safety Administration Standard Industrial Classification 5621) with a market capitalization of more than $250 million at the end of 2001. The market-cap criterion was used to eliminate microcap firms, whose performance-measure correlations may be different from those of higher-valued firms.
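The CFO and FCFF definitions in Example 6 can be made concrete with a small numeric illustration. The figures below are invented for illustration; they are not drawn from the Table 6 sample.

```python
# Hypothetical one-company illustration of the CFO and FCFF definitions.
net_income = 120.0             # NI
noncash_charges = 45.0         # e.g., depreciation added back
wc_investment = 10.0           # investment in working capital
cfo = net_income + noncash_charges - wc_investment

interest_expense = 20.0
tax_rate = 0.30
fixed_capital_investment = 60.0
fcff = cfo + interest_expense * (1 - tax_rate) - fixed_capital_investment

print(cfo, fcff)  # 155.0, 109.0
```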
The
final example in this section introduces a growing area of activity in
uncovering relationships among variables.
Testing the Significance of the Correlation Coefficient
Significance tests allow us to assess whether apparent
relationships between random variables
are the result of chance. If we decide that the relationships do not result from chance, we will be inclined to use
this information in predictions because a good
prediction of one variable will help
us predict the other variable. Using the data in Table 2, we calculated 0.9083 as the sample correlation between
long-term money growth and
long-term inflation in six industrialized countries between 1980 and 2012. That estimated correlation seems high, but is it significantly different from 0? Before we can answer
this question, we must know some details
about the distribution of the underlying variables themselves. For
purposes of simplicity, let us assume that both of the variables are
normally distributed.13
We
propose two hypotheses: the null hypothesis, H0,
that the correlation in the population
is 0 (ρ = 0); and the alternative hypothesis, Ha,
that the correlation in the population
is different from 0 (ρ ≠ 0).
The alternative hypothesis is a test that the correlation is not equal
to 0; therefore, a two-tailed test is appropriate. As long as the two variables are distributed normally, we can test to determine whether the null hypothesis should be
rejected using the sample
correlation, r. The formula for the t-test is
$$ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} $$ (3)
This test statistic has a t-distribution with n − 2 degrees of freedom if the null hypothesis is true.
One practical observation concerning Equation 3 is that the magnitude of r needed to reject the
null hypothesis H0: ρ = 0 decreases as sample size n increases, for two reasons. First, as n increases, the
number of degrees of freedom increases and
the absolute value of the critical value tc decreases. Second,
the absolute value of the numerator
increases with larger n,
resulting in larger-magnitude t-values.
For example, with sample size n = 12, r = 0.58 results in a t-statistic of 2.252 that is just significant at the 0.05 level (tc = 2.228). With a sample size n = 32, a smaller sample correlation r = 0.35 yields a t-statistic of 2.046 that is just significant at the 0.05 level (tc = 2.042); the r = 0.35 would not be significant with a sample size of 12 even at the 0.10 significance level. Another way to make this point is that, sampling from the same population, a false null hypothesis H0: ρ = 0 is more likely to be rejected as we increase the sample size, all else equal.
13 Actually, we must assume that the variables come from a bivariate normal distribution. If two variables, X and Y, come from a bivariate normal distribution, then for each value of X the distribution of Y is normal. See, for example, Ross (2012) or Greene (2011).
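Equation 3 and its critical value are simple to compute. The following Python sketch (using scipy for the critical t-value; an illustration added here, not part of the reading) reproduces the money growth and inflation test worked out in Example 8 below.

```python
# Two-tailed t-test of H0: rho = 0 using Equation 3.
import math
from scipy.stats import t as t_dist

def correlation_t_test(r, n, alpha=0.05):
    """Return the t-statistic and two-tailed critical value for H0: rho = 0."""
    t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    t_crit = t_dist.ppf(1 - alpha / 2, df=n - 2)
    return t_stat, t_crit

t_stat, t_crit = correlation_t_test(0.9083, 6)
print(f"t = {t_stat:.3f}, critical value = {t_crit:.3f}")  # t = 4.343, tc = 2.776
reject = abs(t_stat) > t_crit  # True: reject H0 at the 0.05 level
```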
EXAMPLE 8
Testing the Correlation between Money Supply Growth and Inflation

Earlier in this reading, we showed that the sample correlation between long-term money supply growth and long-term inflation in six industrialized countries was 0.9083 during the 1980–2012 period. Suppose we want to test the null hypothesis, H0, that the true correlation in the population is 0 (ρ = 0) against the alternative hypothesis, Ha, that the correlation in the population is different from 0 (ρ ≠ 0).

1 Calculate the test statistic to test the null hypothesis given above.
2 Determine whether the null hypothesis is rejected or not rejected at the 0.05 level of significance.

Solution to 1: Recalling that this sample has six observations, we can compute the statistic for testing the null hypothesis as follows:
$$ t=\frac{0.9083\sqrt{6-2}}{\sqrt{1-0.9083^2}}=4.343 $$
The value of the test statistic is 4.343.

Solution to 2: As the table of critical values of the t-distribution for a two-tailed test shows, for a t-distribution with n − 2 = 6 − 2 = 4 degrees of freedom at the 0.05 level of significance, we can reject the null hypothesis (that the population correlation is equal to 0) if the value of the test statistic is greater than 2.776 or less than −2.776. Because 4.343 is greater than 2.776, we reject the null hypothesis. The fact that we can reject the null hypothesis of no correlation based on only six observations is quite unusual; it further demonstrates the strong relation between long-term money supply growth and long-term inflation in these six countries.
EXAMPLE 9
Testing the Yen–Canadian Dollar Return Correlation

The data in Table 3 showed that the sample correlation between the GBP monthly returns to Japanese yen and Canadian dollars was 0.3091 for the period from January 2007 through December 2013. Can we reject a null hypothesis that the underlying or population correlation equals 0 at the 0.05 level of significance?

Solution: With 84 months from January 2007 through December 2013, we use the following statistic to test the null hypothesis, H0, that the true correlation in the population is 0, against the alternative hypothesis, Ha, that the correlation in the population is different from 0:
$$ t=\frac{0.3091\sqrt{84-2}}{\sqrt{1-0.3091^2}}=2.9431 $$
At the 0.05 significance level, the critical level for this test statistic is 1.99 (n = 84, degrees of freedom = 82). When the test statistic is either larger than 1.99 or smaller than −1.99, we can reject the hypothesis that the correlation in the population is 0. The test statistic is 2.9431, so we can reject the null hypothesis. Note that the sample correlation coefficient in this case is significantly different from 0 at the 0.05 level, even though the coefficient is much smaller than that in the previous example. The correlation coefficient, though smaller, is still significant because the sample is much larger (84 observations instead of 6 observations).
The above example
shows the importance of sample size in tests of the significance of the
correlation coefficient. The following example also shows the importance of sample size and examines the relationship at
the 0.01 level of significance as well as at
the 0.05 level.
EXAMPLE 10
The Correlation between Bond Returns and T-Bill Returns

Table 5 showed that the sample correlation between monthly returns to US long-term government bonds and monthly returns to T-bills was 0.18 from January 1926 through December 2012. Can we reject a null hypothesis that the underlying or population correlation coefficient equals 0 at the 0.05 and 0.01 levels of significance?

Solution: There are 1,044 months during the period January 1926 to December 2012. Therefore, to test the null hypothesis, H0 (that the true correlation in the population is 0), against the alternative hypothesis, Ha (that the correlation in the population is different from 0), we use the following test statistic:
$$ t=\frac{0.18\sqrt{1{,}044-2}}{\sqrt{1-0.18^2}}=5.9069 $$
At the 0.05 significance level, the critical value for the test statistic is approximately 1.96. At the 0.01 significance level, the critical value for the test statistic is approximately 2.58. The test statistic is 5.9069, so we can reject the null hypothesis of no correlation in the population at both the 0.05 and 0.01 levels. This example shows that, in large samples, even relatively small correlation coefficients can be significantly different from zero.
In the final example of this section,
we explore another
situation of small sample size.
EXAMPLE 11
Testing the Correlation between Net Income and Free Cash Flow to the Firm

Earlier in this reading, we showed that the sample correlation between NI and FCFF for six women’s clothing stores was 0.4045 in 2001. Suppose we want to test the null hypothesis, H0, that the true correlation in the population is 0 (ρ = 0) against the alternative hypothesis, Ha, that the correlation in the population is different from 0 (ρ ≠ 0). Recalling that this sample has six observations, we can compute the statistic for testing the null hypothesis as follows:
$$ t=\frac{0.4045\sqrt{6-2}}{\sqrt{1-0.4045^2}}=0.8846 $$
With n − 2 = 6 − 2 = 4 degrees of freedom and a 0.05 significance level, we reject the null hypothesis that the population correlation equals 0 for values of the test statistic greater than 2.776 or less than −2.776. In this case, however, the t-statistic is 0.8846, so we cannot reject the null hypothesis. Therefore, for this sample of women’s clothing stores, there is no statistically significant correlation between NI and FCFF, when each is normalized by dividing by sales for the company.14
The scatter
plot creates a visual picture
of the relationship between two variables, while the correlation coefficient
quantifies the existence of any linear relationship. Large absolute values of the correlation coefficient indicate
strong linear relationships.
Positive coefficients indicate a positive relationship and negative
coefficients indicate a negative relationship between two data sets. In Examples 9 and 10, we saw that
relatively small sample correlation coefficients (0.3091 and 0.18,
respectively) can be statistically significant and thus might
provide valuable information about the behavior of economic variables.
Next
we will introduce linear regression, another tool useful in examining the relationship between two variables.
LINEAR REGRESSION
Linear
regression with one independent variable, sometimes called simple linear regression, models the
relationship between two variables as a straight line. When the linear relationship between
the two variables is significant, linear regression provides
a simple model for forecasting the
value of one variable, known as the dependent
variable, given the value of the second variable, known as the
independent variable. The following sections explain linear
regression in more detail.
14 It is worth repeating that the smaller the sample, the greater the evidence in terms of the magnitude of the sample correlation needed to reject the null hypothesis of zero correlation. With a sample size of 6, the absolute value of the sample correlation would need to be greater than 0.81 (carrying two decimal places) for us to reject the null hypothesis. Viewed another way, the value of 0.4045 in the text would be significant if the sample size were 24, because $0.4045\sqrt{24-2}/\sqrt{1-0.4045^2} = 2.075$, which is greater than the critical t-value of 2.074 at the 0.05 significance level with 22 degrees of freedom.
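The threshold in footnote 14 can be derived by solving Equation 3 for r at the critical t-value, which gives $r = t_c/\sqrt{n - 2 + t_c^2}$. The short Python sketch below (an illustration added here, not from the reading) computes that minimum significant correlation for a given sample size.

```python
# Smallest sample correlation that is significant at a given level,
# solved from Equation 3: r = tc / sqrt(n - 2 + tc^2).
import math
from scipy.stats import t as t_dist

def min_significant_r(n, alpha=0.05):
    tc = t_dist.ppf(1 - alpha / 2, df=n - 2)
    return tc / math.sqrt(n - 2 + tc ** 2)

print(round(min_significant_r(6), 2))   # about 0.81, as the footnote states
print(round(min_significant_r(24), 2))  # about 0.40, so r = 0.4045 just passes
```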
Linear Regression with One Independent Variable
As a financial analyst, you will often want to understand
the relationship between
financial or economic variables, or to predict
the value of one variable
using information about the value of another variable.
For example, you may want to know the impact of changes in the 10-year Treasury
bond yield on the earnings
yield of the S&P 500 (the
earnings yield is the reciprocal of the price-to-earnings ratio). If the
relationship between those two variables is linear, you can use linear
regression to summarize it. Linear regression allows us to use one
variable to make predictions about another, test hypotheses
about the relation between two variables, and quantify the strength of the relationship between the two variables. The remainder of this reading focuses on linear regression with a single independent variable.
In the next reading, we will examine regression with more than one independent variable.
Regression analysis
begins with the dependent variable
(denoted Y), the variable that you
are seeking to explain. The independent variable (denoted X) is the variable
you are using to explain changes in the dependent
variable. For example,
you might try to explain small-stock returns (the dependent
variable) based on returns to the S&P
500 (the independent variable). Or you might try to explain inflation
(the dependent variable) as a function
of growth in a country’s money supply (the independent variable).
Linear regression assumes a linear relationship between the dependent
and the independent variables. The following regression
equation describes that relation:
$$ Y_i = b_0 + b_1 X_i + \epsilon _i \ , \ i = 1, …, n $$ (4)
This equation
states that the dependent variable, Y, is equal to the intercept, b0, plus a slope
coefficient, b1, times the independent variable, X, plus an error
term,
ε. The error term represents the portion of the dependent variable that cannot
be explained by the independent variable. We refer
to the intercept b0 and the slope coefficient b1 as the regression
coefficients.
Regression
analysis uses two principal types of data: cross-sectional and time series.
Cross-sectional data involve
many observations on X and Y for the same time period.
Those observations could come from different companies, asset classes, investment funds, people, countries, or
other entities, depending on the regression model.
For example, a cross-sectional model might use data from many companies to test whether predicted
earnings-per-share growth explains differences in price-to-earnings
ratios (P/Es) during
a specific time period. The word “explain”
is frequently used
in describing regression relationships. One estimate of a company’s P/E that
does not depend on any other variable is the average P/E. If a regression of a
P/E on an independent variable
tends to give more accurate
estimates of P/E than just assuming that the company’s P/E equals the average
P/E, we say that the independent variable helps
explain P/Es because using that
independent variable improves our estimates.
Finally, note that if we use cross-sectional observations in a
regression, we usually denote the
observations as i = 1, 2, …, n.
Time-series
data use many observations from different time periods for the same company, asset class, investment fund,
person, country, or other entity, depending on
the regression model. For example, a time-series model might use monthly data from many
years to test whether US inflation rates determine US short-term interest rates.15 If we use time-series data in a regression, we usually denote
the observations
as
t =
1, 2, …, T.16
16 In this reading, we primarily use the notation i = 1, 2, …, n even for time series to prevent confusion that would be caused by switching back and forth between different notations.
Exactly
how does linear regression estimate b0 and b1? Linear regression, also known as linear least squares, computes
a line that best fits the observations; it chooses values for the intercept, b0,
and slope, b1,
that minimize the sum of the squared vertical
distances between the observations and the regression line. Linear regression chooses the estimated parameters or fitted parameters $\hat{b}_0$ and $\hat{b}_1$ in Equation
4 to minimize17
$$ \sum^n_{i=1}\left(Y_i - \hat{b}_0 -\hat{b}_1 X_i
\right ) ^2 $$ (5)
In this equation, the term $\left(Y_i - \hat{b}_0-\hat{b}_1 X_i \right)^2$ means (dependent variable − predicted value of the dependent variable)$^2$. Using this method to estimate the values of $\hat{b}_0$ and $\hat{b}_1$, we can fit a line through the observations on X and Y that best explains the value that Y takes for any particular value of X.18
Note that we never observe the population parameter values b0 and b1 in a regression model. Instead, we observe only $\hat{b}_0$ and $\hat{b}_1$, which are estimates of the population parameter values. Thus predictions must be based on the parameters’ estimated values, and testing is based on estimated values in relation to the hypothesized population values.
Figure 8 gives a visual example of how linear regression
works. The figure shows the linear regression that results from estimating the
regression relation between the annual rate of inflation (the dependent
variable) and annual rate of money supply growth (the independent variable) for
six industrialized countries from 1980 to 2012 (n = 6).19
The equation to be estimated is Long-term rate of inflation = b0 + b1 (Long-term
rate of money supply growth) + ε.
17 Hats over the symbols for coefficients indicate estimated values.
18 For a discussion of the precise statistical sense in which the estimates of b0 and b1 are optimal, see Greene (2011).
19 These data appear in Table 2.
Figure 8 Fitted Regression
Line Explaining the Inflation Rate Using Growth
in the Money Supply by Country, 1980–2012
Source:
International Monetary
Fund.
The
distance from each of the six data points to the fitted regression line is the regression residual, which is the difference between the actual value of the dependent variable and the predicted value of the dependent variable
made by the regression equation. Linear
regression chooses the estimated coefficients $\hat{b}_0$ and $\hat{b}_1$
in Equation 4 such that the sum of the squared vertical distances is
minimized. The estimated regression
equation is Long-term inflation = –0.0003 + 0.3327 (Long-term money supply growth).20
According to this regression equation, if the long-term money supply growth is 0 for any particular country, the long-term
rate of inflation in that country will be –0.03 percent. For every
1-percentage-point increase in the long-term rate of money supply growth for a country, the
long-term inflation rate is predicted to increase by 0.3327 percentage points. In a regression such as this one,
which contains one independent variable, the slope coefficient equals Cov(Y,X)/Var(X). We can solve for the slope coefficient
using data from Table 2,
excerpted here:
Table 2 (excerpted)

Country              Money Supply       Inflation    Cross-Product                   Squared Deviations    Squared Deviations
                     Growth Rate $X_i$  Rate $Y_i$   $(X_i-\bar{X})(Y_i-\bar{Y})$    $(X_i-\bar{X})^2$     $(Y_i-\bar{Y})^2$
Sum                  0.5837             0.1921       0.004485                        0.013482              0.001809
Average              0.0973             0.0320
Covariance                                           0.000897
Variance                                                                             0.002696              0.000362
Standard deviation                                                                   0.051926              0.019019

$Cov(Y,X)=0.000897$, $Var(X)=0.002696$, and $\hat{b}_1=\frac{Cov(Y,X)}{Var(X)}=\frac{0.000897}{0.002696}=0.3327$
In a linear regression, the regression line fits through
the point corresponding to the means of the dependent and the independent variables. As
shown in Table 1 (excerpted below),
from 1980 to 2012, the mean long-term growth rate of the money
supply for these six countries
was 9.73 percent,
whereas the mean long-term inflation
rate was 3.20 percent.
Table 1 (excerpted)

Country    Money Supply Growth Rate (%)    Inflation Rate (%)
Average             9.73                         3.20
Because the point (9.73, 3.20) lies on the regression line and $\hat{b}_0=\bar{Y}-\hat{b}_1\bar{X}$, we can solve for the intercept using this point as follows:
$$ \hat{b}_0 = 0.0320-0.3327(0.0973)=-0.0003 $$
We are showing how to solve the linear regression equation
step by step to make the source of the numbers clear. Typically, an analyst will
use the data analysis function on a spreadsheet or a statistical package to
perform linear regression analysis. Later, we will discuss how to use regression
residuals to quantify the uncertainty in a regression model.
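As one illustration of the statistical-package route, the following Python sketch fits the same line by least squares; numpy's polyfit is one of many functions that could be used, and a spreadsheet regression would give the same line.

```python
# Estimate b0 and b1 for the six-country data by ordinary least squares.
import numpy as np

x = np.array([0.1117, 0.0408, 0.1781, 0.0585, 0.1293, 0.0653])  # money growth
y = np.array([0.0462, 0.0018, 0.0531, 0.0199, 0.0418, 0.0293])  # inflation

b1, b0 = np.polyfit(x, y, deg=1)  # slope, intercept of the least-squares line
print(f"intercept = {b0:.4f}, slope = {b1:.4f}")  # about -0.0003 and 0.3327
```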
Assumptions of the Linear Regression Model
We have discussed how
to interpret the coefficients in a linear regression model. Now we turn to the statistical assumptions underlying this model. Suppose that we have n observations on both the dependent variable, Y, and the independent variable, X, and we want to
estimate Equation 4:
$$ Y_i
= b_0 + b_1 X_i + \epsilon _i \ , \ i = 1, …, n $$
To
be able to draw valid conclusions from a linear regression model with a single independent variable, we need to make the
following six assumptions, known as the classic
normal linear regression model assumptions:
1 The relationship between the dependent variable, Y, and the independent variable, X, is linear in the parameters b0 and b1. This requirement means that b0 and b1 are raised to the first power only and that neither b0 nor b1 is multiplied or divided by another regression parameter (as in b0/b1, for example). The requirement does not exclude X from being raised to a power other than 1.
2 The independent variable, X, is not random.21
3 The expected value of the error term is 0: E(ε) = 0.
4 The variance of the error term is the same for all observations: $E(\epsilon_i^2) = \sigma_{\epsilon}^2$, i = 1, …, n.
5 The error term, ε, is uncorrelated across observations. Consequently, $E(\epsilon_i \epsilon_j) = 0$ for all i not equal to j.22
6 The error term, ε, is normally distributed.23
Now
we can take a closer look at each of these assumptions.
Assumption 1 is
critical for a valid linear regression. If the relationship between the
independent and dependent variables is nonlinear in the parameters, then
estimating that relation with a linear regression model will produce invalid
results. For example, $Y_i = b_0 e^{b_1 X_i}+\epsilon _i$ is nonlinear in b1, so we could not
apply the linear regression model to it.24
Even if the independent variable enters nonlinearly, linear regression can be used as long as the regression is linear in the parameters. So, for example, linear regression can be used to estimate the equation $Y_i = b_0 + b_1 X_i^2 + \epsilon_i$.
Assumptions 2 and 3 ensure that linear regression
produces the correct estimates of $\hat{b}_0$
and $\hat{b}_1$ .
Assumptions 4, 5, and 6 let us use the linear regression model to determine the distribution of the estimated parameters $\hat{b}_0$ and $\hat{b}_1$ and thus test whether those coefficients have a particular value.
■ Assumption 4, that the variance of the error term is the same for all observations, is also known as the homoskedasticity assumption. The reading on regression analysis discusses how to test for and correct violations of this assumption.
■ Assumption 5, that the errors are uncorrelated across observations, is also necessary for correctly estimating the variances of the estimated parameters $\hat{b}_0$ and $\hat{b}_1$. The reading on multiple regression discusses violations of this assumption.
21 Although we assume that the
independent variable in the regression model is not random, that assumption is clearly often not true. For example,
it is unrealistic to assume
that the monthly
returns to the S&P
500 are not random. If the independent variable is random, then is the
regression model incorrect? Fortunately, no. Econometricians have shown that even if the independent variable is random,
we can still rely on the
results of regression models given the crucial assumption that the error term
is uncorrelated with the independent
variable. The mathematics underlying this reliability demonstration, however, are quite difficult. See, for example,
Greene (2011) or Goldberger (1998).
22 $Var(\epsilon_i) = E[\epsilon_i - E(\epsilon_i)]^2 = E(\epsilon_i - 0)^2 = E(\epsilon_i^2)$. $Cov(\epsilon_i, \epsilon_j) = E\{[\epsilon_i - E(\epsilon_i)][\epsilon_j - E(\epsilon_j)]\} = E[(\epsilon_i - 0)(\epsilon_j - 0)] = E(\epsilon_i \epsilon_j) = 0$.
23 If the regression errors are not normally distributed, we can still use regression analysis. Econometricians who
dispense with the normality assumption
use chi-square tests of hypotheses
rather than F-tests. This difference usually does not affect whether
the test will result in a particular null hypothesis being rejected.
24 For more information on nonlinearity in the parameters, see Gujarati and Porter (2008).
§
Assumption 6, that the error term is normally distributed, allows us to easily test a particular
hypothesis about a linear regression
model.25
EXAMPLE 12
Evaluating Economic Forecasts (2)
If
economic forecasts were completely accurate, every prediction of change in an
economic variable in a quarter would exactly match the actual change that
occurs in that quarter. Even though forecasts can be inaccurate, we hope at
least that they are unbiased—that is, that the expected value of the forecast
error is zero. An unbiased forecast can be expressed as E(Actual
change – Predicted change) = 0. In fact, most evaluations of forecast
accuracy test whether forecasts are unbiased.26 Figure
9 repeats Figure 7 in showing a scatter plot of the mean forecast made in the
first quarter of a year for the percentage change in HICP during that year
and the actual percentage change in HICP, from 1999 through 2013, but it adds
the fitted regression line for the equation Actual percentage change = b0
+ b1 (Predicted percentage change) + ε. If the forecasts are
unbiased, the intercept, b0, should be 0 and the slope, b1,
should be 1. We should also find E(Actual
change – Predicted change) = 0. If forecasts are actually unbiased, as long
as b0 = 0 and b1 = 1, the error term [Actual change
− b0 − b1(Predicted
change)] will have an expected value of 0, as required by Assumption 3 of the
linear regression model. With unbiased forecasts, any other values of b0
and b1 would yield an error term with an expected value different
from 0. Figure 9 Actual
Change in Euro Area HICP versus Predicted Change Source: European Central
Bank. If
b0 = 0 and b1 = 1, our best guess of actual change in HICP
would be 0 if professional forecasters’ predictions of change in HICP were 0.
For every 1-percentage-point increase in the prediction of change by the
professional forecasters, the regression model would predict a
1-percentage-point increase in actual change. The
fitted regression line in Figure 9 comes from the equation Actual change = −0.7006
+ 1.5538(Predicted change). It seems that the estimated values of b0 and b1
are not particularly close to the values b0 = 0 and b1
= 1 that are consistent with unbiased forecasts. Later in this reading, we
discuss how to test the hypotheses that b0 = 0 and b1=
1.
The Standard Error of Estimate
The
linear regression model sometimes describes
the relationship between two variables quite well, but sometimes
it does not. We must be able to distinguish between these two cases in order to use regression
analysis effectively. Therefore, in this section
and the next, we discuss
statistics that measure how well a given linear regression model captures the relationship between
the dependent and independent variables. Figure 9, for example, shows a strong relation between predicted inflation
and actual inflation. If we knew professional forecasters’ predictions for inflation in a particular quarter, we would be reasonably
certain that we could use this regression model to forecast actual inflation relatively accurately.
In
other cases, however, the relation between the dependent and independent variables is not strong. Figure 10 adds a
fitted regression line to the data on inflation and stock returns during 1990 to 2013 from Figure 6. In this
figure, the actual observations are generally much farther from the fitted
regression line than in Figure 9. Using the estimated regression
equation to predict monthly stock returns assuming
a particular level of inflation
might result in an inaccurate forecast.
As noted, the regression relation
in Figure 10 is less precise than that in Figure 9. The standard
error of estimate
(sometimes called the standard error of the regression) measures
this uncertainty. This statistic is very much like the standard deviation
for a single variable, except that it measures the standard deviation of $\hat{\epsilon}_i$, the residual term in the regression.
The formula for the standard
error of estimate
(SEE) for a linear regression model with one independent variable is
$$ SEE=\left(\frac{\sum^n_{i=1}\left(Y_i -
\hat{b}_0-\hat{b}_1 X_i \right)^2}{n-2}\right)^{\frac{1}{2}}=\left(\frac{\sum^n_{i=1}\left(\hat{\epsilon
_i} \right)^2}{n-2}\right)^{\frac{1}{2}} $$ (6)
In the numerator of this equation,
we are computing the difference between the dependent variable’s actual value for each observation and its predicted value, $\hat{b}_0+\hat{b}_1 X_i$, for each
observation. The difference between the actual and predicted values of the dependent
variable is the regression residual, $\hat{\epsilon _i}$.
Equation 6 looks very much like the formula
for computing a standard deviation, except that n − 2 appears in the denominator instead of n −
1. We use n − 2 because the sample
includes n observations and the linear regression model estimates two parameters ($\hat{b}_0$ and
$\hat{b}_1$); the difference between the number of observations and the number
of parameters is n − 2. This difference is also called the degrees
of freedom; it is the denominator needed to ensure that the estimated standard
error of estimate
is unbiased.
Figure 10 Sources: Bureau of Labor Statistics and S&P Dow Jones Indices.
EXAMPLE 13
Computing the Standard Error of Estimate
Recall that the
estimated regression equation for the inflation and money supply growth data
shown in Figure 8 was Yi = –0.0003
+ 0.3327Xi. Table 7 uses this estimated
equation to compute the data needed for the standard error of estimate.
The first and
second columns of numbers in Table 7 show the long-term money supply growth
rates, Xi, and long-term inflation
rates, Yi, for the six countries. The
third column of numbers shows the predicted value of the dependent variable
from the fitted regression equation for each observation. For the United
States, for example, the predicted value of long-term inflation is –0.0003 + 0.3327(0.0653)
= 0.0214 or 2.14 percent. The next-to-last column contains the regression residual,
which is the difference between the actual value of the dependent variable, Yi, and the predicted value of the dependent
variable, $\hat{Y}_i = \hat{b}_0 + \hat{b}_1 X_i$. So for the United States,
the residual is equal to 0.0293 − 0.0214 = 0.0079 or 0.79 percent. The last
column contains the squared regression residual. The sum of the squared
residuals is 0.000316. Applying the formula for the standard error of
estimate, we obtain $$ \left(\frac{0.000316}{6-2}\right)^{\frac{1}{2}}=0.008895 $$ Thus the standard error
of estimate is about 0.89 percent. Later, we will combine
this estimate with estimates of the uncertainty about the parameters in this regression
to determine confidence intervals for predicting inflation rates from money
supply growth. We will see that smaller standard errors result in more accurate
predictions.
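Equation 6 is simple to evaluate in code. A minimal Python sketch, using the sum of squared residuals (0.000316) and the sample size (n = 6) from Example 13:

```python
# Standard error of estimate (Equation 6): the square root of the
# sum of squared residuals divided by the degrees of freedom, n - 2.
sum_squared_residuals = 0.000316  # from Table 7
n = 6                             # six countries in the sample
see = (sum_squared_residuals / (n - 2)) ** 0.5
print(f"SEE = {see:.6f}")  # about 0.0089, i.e., roughly 0.89 percent
```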
The Coefficient of Determination
Although
the standard error of estimate gives some indication of how certain we can be about a particular prediction of Y using the regression equation, it still does not tell us how well the
independent variable explains variation in the dependent variable. The coefficient of determination does
exactly this: It measures the fraction of the total variation in the dependent
variable that is explained by the independent variable.
We can compute the
coefficient of determination in two ways. The simpler method, which
can be used in a linear regression with one independent variable, is to square the correlation coefficient between the
dependent and independent variables. For example,
recall that the correlation coefficient between the long-term rate of money growth and the long-term rate of
inflation between 1980 and 2012 for six industrialized countries was 0.9083. Thus the coefficient of determination in the regression shown in Figure 8 is (0.9083)2 = 0.8250. So in this regression, the long-term rate of money
supply
growth explains approximately 82.5 percent of the variation in the long-term rate of inflation across the countries
between 1980 and 2012. (Relatedly, note that
the square root of the coefficient of determination in a
one-independent-variable linear regression, after attaching the sign of the estimated
slope coefficient, gives the correlation coefficient between the dependent and independent variables.)
The problem with this method is that it cannot be used
when we have more than one
independent variable.27 Therefore, we need an alternative method of computing the coefficient of determination for
multiple independent variables. We now present
the logic behind that alternative.
27 We will discuss such models in the reading on multiple regression.

If we did not know the regression relationship, our best guess for the value of any particular observation of the dependent variable would simply be $\bar{Y}$, the mean of the dependent variable. One measure of accuracy in predicting Yi based on $\bar{Y}$ is the sample variance of Yi, $\sum_{i=1}^n \frac{(Y_i -\bar{Y})^2}{n-1}$. An alternative to using $\bar{Y}$ to predict a particular observation
Yi is
using the regression relationship to make that prediction. In that case, our predicted value would be $\hat{Y}_i
= \hat{b}_0 + \hat{b}_1 X_i$. If the regression relationship works well, the error in predicting Yi using $\hat{Y}_i$ should be much smaller than the error in predicting Yi using $\bar{Y}$. If we
call $\sum_{i=1}^n (Y_i -\bar{Y})^2 $ the total
variation of Y and $\sum_{i=1}^n (Y_i -\hat{Y}_i)^2
$ the unexplained variation
from the regression, then we can measure the explained variation
from the regression using the following equation:
Total variation
= Unexplained variation
+ Explained variation (7)
The
coefficient of determination is the fraction of the total variation that is
explained by the regression. This gives us the relationship
$$\begin{equation*}
\begin{alignedat}{2}
% R & L
R^2 & = \frac{\text{Explained variation}}{\text{Total variation}} = \frac{\text{Total variation} - \text{Unexplained variation}}{\text{ Total variation }} \\
& = 1- \frac{\text{Unexplained variation}}{\text{ Total variation }}
\end{alignedat}
\end{equation*} $$
(8)
Note that total
variation equals explained variation plus unexplained variation, as shown in Equation 7. Most regression
programs report the coefficient of determination as R2.28
EXAMPLE 14
Inflation Rate and Growth in the Money Supply
Using the data in
Table 7, we can see that the unexplained variation from the regression, which
is the sum of the squared residuals, equals 0.000316. Table 8 shows the
computation of total variation in the dependent variable, the long-term rate
of inflation.
The average
inflation rate for this period is 3.20 percent. The next-to-last column shows
the amount each country’s long-term inflation rate deviates from that
average; the last column shows the square of that deviation. The sum of those
squared deviations is the total variation in Y for the sample (0.001809),
shown in Table 8. Compute the
coefficient of determination for the regression.
Solution: The coefficient of
determination for the regression is $$ \frac{\text{Total variation} - \text{Unexplained variation}}{\text{
Total variation }} = \frac{0.001809 - 0.000316}{0.001809}=0.8250
$$ Note that this
method gives the same result that we obtained earlier. We will use this
method again in the reading on multiple regression;
when we have more than one independent variable, this method is the only way
to compute the coefficient of determination.
28 As we illustrate in the tables
of regression output
later in this reading, regression programs also report multiple R, which is the correlation between the actual
values and the forecast values
of Y. The coefficient of determination is the square of multiple R.
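Both routes to the coefficient of determination are easy to verify in code. A short Python sketch using the totals from Example 14 and the correlation reported earlier:

```python
# Method based on Equation 8: 1 minus unexplained variation over total variation.
total_variation = 0.001809        # sum of squared deviations of Y from its mean (Table 8)
unexplained_variation = 0.000316  # sum of squared residuals (Table 7)
r_squared = 1 - unexplained_variation / total_variation
print(f"R^2 from Equation 8: {r_squared:.4f}")  # about 0.825

# With one independent variable, squaring the correlation gives the same answer.
r = 0.9083
print(f"Squared correlation: {r**2:.4f}")  # 0.8250, matching up to rounding
```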
Hypothesis Testing
In this section, we address testing
hypotheses concerning the population values
of the intercept or slope coefficient of a regression model. This topic is critical
in practice. For example, we may want to check a stock’s
valuation using the capital asset pricing model;
we hypothesize that the stock has a market-average beta or level of systematic risk. Or we may want to test the hypothesis that economists’ forecasts
of the inflation rate are unbiased (not overestimates or underestimates, on average). In each case, does the evidence
support the hypothesis? Questions such as these can be addressed
with hypothesis tests within a regression model.
Such tests are often t-tests of the value of the intercept or slope coefficient(s). To understand the concepts involved
in this test, it is useful to first review a simple, equivalent approach
based on confidence intervals.
We can perform a hypothesis test using the confidence interval
approach if we know three things: 1) the estimated parameter value, $\hat{b}_0$ or
$\hat{b}_1$, 2) the hypothesized
value of the parameter, b0
or b1, and 3) a confidence interval around the estimated parameter. A confidence interval
is an interval of values
that we believe
includes the true parameter value, b1,
with a given degree of confidence. To compute a confidence interval, we must select
the significance level for the test and know the standard error of
the estimated coefficient.
Suppose we regress a stock’s returns on a stock market index’s returns and find that the slope coefficient $(\hat{b}_1)$ is 1.5 with a standard error $(s_{\hat{b}_1})$ of 0.200. Assume we used 62 monthly observations in our regression analysis.
The hypothesized value of the parameter (b1) is 1.0, the market average slope coefficient. The estimated and the population slope coefficients are often called
beta, because the population coefficient is often represented by the Greek symbol beta (β) rather than the b1 we use in this reading. Our null hypothesis is that b1 = 1.0, and $\hat{b}_1$ is the estimate for b1. We will use a 95 percent confidence interval for
our test, or we could say that the test has a significance level of 0.05.
Our confidence interval will span the range $\hat{b}_1 - t_c s_{\hat{b}_1}$ to $\hat{b}_1 + t_c s_{\hat{b}_1}$, or $\hat{b}_1 \pm t_c s_{\hat{b}_1}$ (9)
where
tc is the critical
t
value.29 The critical
value for the test depends
on the number of degrees of freedom for the t-distribution under the null hypothesis. The number of degrees of freedom equals the number
of observations minus the number of parameters
estimated. In a regression with one independent variable, there are two estimated parameters, the intercept term
and the coefficient on the independent variable. For 62 observations and two parameters estimated in this example, we have 60 degrees of freedom (62 − 2). For 60 degrees of freedom, the table of critical values
in
the back of the book shows that the critical
t-value at the 0.05 significance level is 2.00. Substituting the values
from our example into Equation 9 gives us the
interval
$$\begin{equation*}
\begin{alignedat}{2}
% R & L
\hat{b }_1 \pm t_c s_{\hat{b}_1} & = 1.5 \pm 2.00(0.200) \\
& = 1.5 \pm 0.400 \\
& = 1.10 \text{ to } 1.90
\end{alignedat}
\end{equation*} $$
A
95% confidence interval is the interval, based on the sample value,
that we would expect to include the population value with a 95% degree
of confidence. Because
we are testing the null hypothesis that b1 = 1.0 and because
our confidence interval
does not include 1.0, we
can reject the null hypothesis.
In
practice, the most common way to test a hypothesis using a regression model is with
a t-test of significance.
To test the hypothesis, we can compute the statistic $$ t= \frac{\hat{b}_1 - b_1}{s_{\hat{b}_1}} $$ (10)
This test statistic has a t-distribution with n − 2 degrees of freedom because two parameters were estimated in the
regression. We compare the absolute value of the t-statistic to tc. If the absolute value
of t is greater
than tc, then we can reject the null hypothesis. Substituting the values from
the above example into this relationship gives the t-statistic associated with the test that the stock’s beta equals 1.0 (b1 = 1.0).
$$\begin{equation*}
\begin{alignedat}{2}
% R & L
t & = \frac{\hat{b}_1 - b_1}{s_{\hat{b}_1}} \\
& = \frac{(1.50-1.0)}{0.200} \\
& = 2.50
\end{alignedat}
\end{equation*} $$
Because t > tc, we reject
the null hypothesis that b1
= 1.0.
The t-statistic in the example
above is 2.50,
and at the 0.05 significance level, tc = 2.00; thus we reject the null hypothesis because
t > tc. This statement
is equivalent to saying that we are 95 percent
confident that the interval for the slope coefficient does
not contain the value 1.0. If we were performing this test at the 0.01 level, however,
tc would be 2.66 and we
would not reject the hypothesis because t would not be
greater than tc at this significance level. A 99 percent confidence
interval for the slope coefficient
does contain the value 1.0.
The choice of significance level is always a matter of
judgment. When we use higher levels
of confidence, tc increases. This choice leads to wider confidence intervals and to a decreased likelihood
of rejecting the null hypothesis. Analysts often choose the 0.05 level
of significance, which indicates a 5 percent
chance of rejecting the null hypothesis when, in fact, it is true (a Type I error). Of course, decreasing the level of significance from 0.05 to 0.01 decreases
the probability of Type I error, but it increases the probability of Type II error—failing to reject the null hypothesis when, in fact, it is false.
29 We use the t-distribution for this test because
we are using a sample estimate of the standard error, $s_{\hat{b}_1}$, rather than its true (population) value.
Often,
financial analysts do not simply report whether or not their tests reject a particular hypothesis about a regression
parameter. Instead, they report the p-value or probability value for a particular
hypothesis. The p-value
is the smallest level of significance
at which the null hypothesis can be rejected. It allows the reader to interpret the results rather than be told
that a certain hypothesis has been rejected or
accepted. In most regression software packages, the p-values printed for regression coefficients apply to a test of the null hypothesis that the true
parameter is equal to 0 against the
alternative that the parameter is not equal to 0, given the estimated coefficient and the standard
error for that coefficient. For example, if the p-value is 0.005,
we can reject the hypothesis that the true parameter is equal to 0 at
the 0.5 percent significance level (99.5 percent confidence).
The
standard error of the estimated coefficient is an important input for a hypothesis
test concerning the regression coefficient (and for a confidence interval for
the estimated coefficient). Stronger regression results lead to smaller standard
errors of an estimated parameter
and result in tighter confidence
intervals. If the standard error
$(s_{\hat{b}_1})$ in the above example were 0.100 instead of 0.200, the confidence interval range would be half as large
and the t-statistic twice as large.
With a standard error this small, we would reject the null hypothesis even at the 0.01 significance level because we would have t = (1.5 − 1)/0.1 = 5.00 and tc = 2.66.
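The arithmetic of the confidence-interval and t-test approaches is easy to script. The following Python sketch reproduces the beta example above (estimated slope 1.5, standard error 0.200, 62 observations, hypothesized value 1.0); scipy supplies the exact critical value rather than the rounded 2.00 from the table:

```python
from scipy import stats

b1_hat, s_b1, n, b1_null = 1.5, 0.200, 62, 1.0
df = n - 2  # two estimated parameters: intercept and slope

# Confidence-interval approach (Equation 9).
t_crit = stats.t.ppf(0.975, df)  # about 2.00 for 60 degrees of freedom
low, high = b1_hat - t_crit * s_b1, b1_hat + t_crit * s_b1
print(f"95% CI: {low:.2f} to {high:.2f}")  # about 1.10 to 1.90; 1.0 lies outside

# t-test approach (Equation 10), with the two-sided p-value.
t_stat = (b1_hat - b1_null) / s_b1  # 2.50
p_value = 2 * stats.t.sf(abs(t_stat), df)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # about 0.015: reject at 0.05 but not at 0.01
```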
With this background, we can turn to hypothesis tests using actual
regression results. The next three
examples illustrate hypothesis tests in a variety of typical investment contexts.
EXAMPLE 15
Estimating Beta for Westport Innovations Stock
Westport
Innovations Inc. (Westport) is a Canadian company that provides low-emission
engine and fuel system technologies utilizing gaseous fuels. Its stock trades
on the Toronto Stock Exchange. You are an investor in Westport’s stock and want
an estimate of its beta. As in the text example, you hypothesize that Westport
has an average level of market risk and that its required return in excess of
the risk-free rate is the same as the market’s required excess return. One regression that
summarizes these statements is $$(R - R_F) = \alpha + \beta(R_M - R_F) + \epsilon$$ (11)
where RF is the periodic risk-free rate of
return (known at the beginning of the period), RM is the periodic return on the market, R is the periodic return to the stock of the company, and β measures
the sensitivity of the required excess return to the excess return to market.
Estimating this equation with linear regression provides an estimate of β, $\hat{\beta}$, which tells us the size of the required return
premium for the security, given expectations about market returns.30 Suppose we want to
test the null hypothesis, H0,
that β = 1 for Westport stock to see whether Westport
stock has the same required return premium as the market as a whole. We need
data on returns to Westport stock, a risk-free interest rate, and the returns
to the market index. For this example, we use data from January 2009 through
December 2013 (n = 60). The return
to Westport stock is R. The monthly return to one-month Canadian Treasury bills is RF. The return to the S&P/TSX Composite Index is RM.31
This index is the primary broad measure of the Canadian equity market. We are
estimating two parameters, so the number of degrees of freedom is n − 2 = 60 − 2 = 58. Table
9 shows the results from the regression (R
− RF) = α + β (RM − RF) + ε.
1 Test the null hypothesis, H0, that β for Westport equals 1 (β = 1) against the alternative hypothesis that β does not equal 1 (β ≠ 1) using the confidence interval approach.
2 Test the above hypothesis using a t-test.
3 How much of Westport stock’s excess return variation can be attributed to company-specific risk?
Solution to 1: The estimated $\hat{\beta}$ from the regression is 1.0788. The estimated standard error for that coefficient in the regression, $s_{\hat{\beta}}$, is 0.3880. The regression equation has 58 degrees of freedom (60 −
2), so the critical value for the test statistic is approximately tc = 2.00 at the 0.05 significance level.
Therefore, the 95 percent confidence interval for the data for any particular
hypothesized value of β is shown by the range $$ \hat{\beta} \pm t_c s_{\hat{\beta}} \\ 1.0788 \pm 2.00 (0.3880) \\ 0.3028 \text{ to } 1.8548 $$ In this case, the
hypothesized parameter value is β = 1, and the value 1 falls inside this
confidence interval, so we cannot reject the hypothesis at the 0.05
significance level. This means that we cannot reject the hypothesis that
Westport stock has the same systematic risk as the market as a whole.
Solution to 2: The t-statistic for the Westport beta hypothesized
parameter can be computed using Equation 10: $$ t=\frac{\hat{\beta}-\beta}{ s_{\hat{\beta}}} = \frac{1.0788-1.0}{0.3880}=0.2031$$ This t-statistic is less than the critical t-value of 2.00. Therefore, neither approach
allows us to reject the null hypothesis. Note that the t-statistic associated with $\hat{\beta}$
in the regression results in Table 9 is 2.7800. Given the significance level
we are using, we cannot reject the null hypothesis that β = 1, but
we can reject the hypothesis that β = 0.32
Solution to 3: The R2 in this regression is
only 0.1176. This result suggests that only about 12 percent of the total
variation in the excess return to Westport stock (the return to Westport
above the risk-free rate) can be explained by excess return to the market
portfolio. The remaining 88 percent of Westport stock’s excess return
variation is the nonsystematic component, which can be attributed to company-specific
risk.
30 Beta (β) is
typically estimated using 60 months of historical data, but the data-sample
length sometimes varies. Although monthly data are typically used,
some financial analysts estimate β using daily data. For more information on methods of estimating
β, see Reilly and Brown (2012). The expected excess return for Westport stock above the risk-free
rate (R − RF) is β(RM − RF), given a
particular excess return to the market above the risk-free rate (RM − RF). This result holds because we regress (R − RF) against
(RM − RF). For example, if a stock’s
beta is 1.5, its expected
excess return is 1.5 times that of the market
portfolio.
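Note that the t-statistic reported by regression software tests the null that a coefficient equals 0, so the test of β = 1 in Solution 2 must be built by hand (a point footnote 32 below makes as well). A brief Python sketch with the Example 15 estimates:

```python
from scipy import stats

beta_hat, s_beta, df = 1.0788, 0.3880, 58

# Software default: H0 is beta = 0 (this is the t-statistic shown in Table 9).
print(f"t for beta = 0: {beta_hat / s_beta:.4f}")  # about 2.78: reject beta = 0

# The hypothesis of interest here: H0 is beta = 1, constructed manually.
t_stat = (beta_hat - 1.0) / s_beta
t_crit = stats.t.ppf(0.975, df)  # about 2.00
print(f"t for beta = 1: {t_stat:.4f} vs critical {t_crit:.2f}")  # 0.2031 < 2.00: cannot reject
```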
In the next example,
we show a regression hypothesis test with a one-sided alternative.
EXAMPLE 16
Explaining Company Value Based on Returns to Invested Capital
Some financial
analysts have argued that one good way to measure a company’s ability to
create wealth is to compare the company’s return on invested capital (ROIC)
to its weighted-average cost of capital (WACC). If a company has an ROIC
greater than its cost of capital, the company is creating wealth; if its ROIC
is less than its cost of capital, it is destroying wealth.33 Enterprise value
(EV) is a market-price-based measure of company value defined as the market
value of equity and debt minus the value of cash and investments. Invested
capital (IC) is an accounting measure of company value defined as the sum of
the book values of equity and debt. Higher ratios of EV to IC should reflect
greater success at wealth creation in general. Mauboussin
(1996) argued that the spread between ROIC and WACC helps explain the ratio of
EV to IC. Using data on companies in the food-processing industry, we can test
the relationship between EV/IC and (ROIC–WACC) using the regression model
given in Equation 12. $$EV_i/IC_i = b_0 + b_1(ROIC_i - WACC_i) + \epsilon_i$$ (12)
where the subscript i is an index
to identify the company. Our null hypothesis is H0: b1
≤ 0, and we specify a significance level of
0.05. If we reject the null hypothesis, we have evidence of a statistically significant
relationship between EV/IC and (ROIC–WACC). Equation 12 is estimated using
data from nine food-processing companies.34
The results of this regression are displayed in Table 10 and Figure 11.
Figure 11 Fitted Regression Line Explaining Enterprise Value/Invested Capital Using ROIC–WACC Spread for the Food Industry

We reject the null hypothesis based on the t-statistic of approximately 7.79 on the estimated slope
coefficient. There is a strong positive relationship between the return
spread (ROIC–WACC) and the ratio of EV to IC in our sample of companies.
Figure 11 illustrates the strong positive relationship. The R2 of 0.8966
indicates that the return spread explains about 90 percent of the variation
in the ratio of EV to IC among the food-processing companies in the sample in
2001. The coefficient on the return spread of 30.0169 implies that the
predicted increase in EV/IC is 0.01(30.0169) = 0.3002 or about 30 percent for
a 1-percentage-point increase in the return spread, for our sample of
companies.
32 The t-statistics for a
coefficient automatically reported by statistical software programs assume that the null hypothesis states that the
coefficient is equal to 0. If you have a different null hypothesis, as we do in this example (β = 1), then you
must either construct the correct test statistic yourself or instruct the program to compute it.
33
See, for example,
Stewart (1991) and Mauboussin (1996).
34 Our
data come from Nelson, Moskow, Lee, and Valentine (2003) and relate to 2001. Many
sell-side analysts use this type of regression.
It is one of the most frequently used cross-sectional regressions in published analyst reports.
In
the final example of this section, the null hypothesis for a t-test of the slope coefficient
is that the value of the slope equals 1, in contrast to the null hypothesis that it equals 0, as in prior examples.
EXAMPLE 17
Testing whether Inflation Forecasts Are Unbiased
Example 12
introduced the concept of testing for bias in forecasts. That example showed
that if a forecast is unbiased, its expected error is 0. We can examine whether
a time-series of forecasts for a particular economic variable is unbiased by comparing
the forecast at each date with the actual value of the economic variable
announced after the forecast. If the forecasts are unbiased, then, by definition,
the average realized forecast error should be close to 0. In that case, the value
of b0 (the intercept) should
be 0 and the value of b1
(the slope) should be 1, as discussed in Example 12. Refer once again
to Figure 9, which shows the mean forecast made by professional economic
forecasters in the first quarter of a year for the percentage change in euro
area HICP during that year and the actual percentage change from 1999 through
2013 (n = 14). To test whether the
forecasts are unbiased, we must estimate the regression shown in Example 12. We
report the results of this regression in Table 11. The equation to be estimated
is $$\text{Actual percentage change in HICP}_t = b_0 + b_1(\text{Predicted change}_t) + \epsilon_t$$ This regression
estimates two parameters (the intercept and the slope); therefore, the
regression has n − 2 = 14
− 2 = 12 degrees of freedom.
We can now test
two null hypotheses about the parameters in this regression. Our first null hypothesis
is that the intercept in this regression is 0 (H0: b0
= 0). The alternative hypothesis is that the intercept does not equal 0 (Ha: b0 ≠ 0). Our second null hypothesis is that the
slope coefficient in this regression is 1 (H0: b1
= 1). The alternative hypothesis is that the slope coefficient does not equal
1 (Ha: b1 ≠ 1). To test the hypotheses
about b0 and b1, we must first decide on
a critical value based on a particular significance level and then construct the
confidence intervals for each parameter. If we choose the 0.05 significance
level, with 12 degrees of freedom, the critical value, tc, is approximately 2.18. The
estimated value of the parameter $\hat{b}_0$ is −0.7006, and the estimated value of the standard error of that coefficient, $s_{\hat{b}_0}$, is 0.3723. Therefore, under the null hypothesis that b0 = B0, a 95 percent confidence interval for b0 is $$\hat{b}_0 \pm t_c s_{\hat{b}_0} \\ -0.7006 \pm 2.18(0.3723) \\ -1.5122 \text{ to } 0.1110 $$ In this case, B0 is 0. The value of 0
falls within this confidence interval, so we cannot reject the first null hypothesis
that b0 = 0. We will explain
how to interpret this result shortly. Our second null
hypothesis is based on the same sample as our first null hypothesis.
Therefore, the critical value for testing that hypothesis is the same as the critical
value for testing the first hypothesis (tc = 2.18). The estimated value of the parameter
$\hat{b}_1$ is 1.5538. Using the standard error of $\hat{b}_1$ reported in Table 11, we construct a 95 percent confidence interval around $\hat{b}_1$ in the same way. In this case, our hypothesized value of b1 is 1. The value 1 falls outside this confidence interval, so we can reject
the null hypothesis that b1
= 1 at the 0.05 significance level. Because we rejected one of the two null
hypotheses (b0 = 0, b1 = 1) about the
parameters in this model, we can reject the hypothesis that the forecasts of HICP
change were unbiased.35 As an analyst, you often will
need forecasts of economic growth to help you make recommendations about asset
allocation, expected returns, and other investment decisions. The hypothesis
tests just conducted suggest that you can reject the hypothesis that the HICP
predictions in the Survey of Professional Forecasters are unbiased. If you need
an unbiased forecast of future percentage change in HICP for your
asset-allocation decision, you might not want to use these forecasts. In view of the
above concern, we further explored the inflation forecasts. Figure 9 suggests
that the bottommost point in the plot of actual versus predicted inflation is
an outlier. This point corresponds to the year 2009, when macroeconomic
volatility was exceptionally high due to the financial crisis. A study of
forecasts in the European Central Bank Survey of Professional Forecasters by
Genre, Kenny, Meyler, and Timmerman (2010) finds
that the accuracy of inflation forecasts deteriorates when the financial crisis
period is included. We re-estimated the regression equation after excluding 2009.
The new equation is Actual change = –0.2513 + 1.3209(Predicted change). Under
the null hypothesis for the intercept that b0 = 0, a 95 percent confidence interval for b0 is –1.2116 to 0.7090. The
value of 0 falls within this confidence interval, so we cannot reject the first
null hypothesis that b0 =
0. Under the null hypothesis for the slope that b1 = 1, a 95 percent confidence interval for b1 is 0.7984 to 1.8434. The
value of 1 falls within this confidence interval, so we cannot reject the
second null hypothesis that b1
= 1. These hypothesis tests suggest that you cannot reject the hypothesis that
the HICP predictions in the Survey of Professional Forecasters are unbiased.
35 Jointly testing the hypothesis
b0 = 0 and b1 = 1 would require us to take into account the covariance
of $\hat{b}_0$ and $\hat{b}_1$. For information on testing joint hypotheses of this type, see Greene (2011).
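The intercept test in Example 17 follows the same recipe as the earlier confidence intervals. A minimal Python sketch with the reported values (intercept estimate −0.7006, standard error 0.3723, 12 degrees of freedom); scipy's exact critical value of about 2.179 gives endpoints that differ from the reading's only because the reading rounds tc to 2.18:

```python
from scipy import stats

b0_hat, s_b0, df = -0.7006, 0.3723, 12

t_crit = stats.t.ppf(0.975, df)  # about 2.179 for 12 degrees of freedom
low, high = b0_hat - t_crit * s_b0, b0_hat + t_crit * s_b0
print(f"95% CI for b0: {low:.4f} to {high:.4f}")  # about -1.512 to 0.111; includes 0
# Because 0 lies inside the interval, we cannot reject H0: b0 = 0.
```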
Analysis of Variance in a Regression with One Independent Variable
Analysis
of variance (ANOVA) is a statistical procedure for dividing
the total variability of a variable into components that can be attributed to
different sources.36 In regression
analysis, we use ANOVA to determine the usefulness of the independent variable or variables in explaining
variation in the dependent variable. An important statistical test conducted
in analysis of variance is the F-test.
The F-statistic tests whether
all the slope coefficients in a linear regression are equal to 0. In a
regression with one independent variable,
this is a test of the null hypothesis H0: b1 = 0 against the alternative hypothesis Ha: b1 ≠ 0.
To
correctly determine the test statistic for the null hypothesis that the slope coefficient equals 0, we need to
know the following:
§ the total number of observations (n);
§ the total number of parameters to be estimated (in a one-independent-variable regression, this number is two: the intercept and the slope coefficient);
§ the sum of squared errors or residuals, $\sum_{i=1}^n \left(Y_i - \hat{Y}_i \right)^2$, abbreviated SSE. This value is also known as the residual sum of squares; and
§ the regression sum of squares, $\sum_{i=1}^n \left(\hat{Y}_i - \bar{Y} \right)^2$, abbreviated RSS. This value is the amount of total variation in Y that is explained in the regression equation. Total variation (TSS) is the sum of SSE and RSS.

36 In this reading, we focus on regression applications of ANOVA, the most common context in which financial analysts will encounter this tool. In this context, ANOVA is used to test whether all the regression slope coefficients are equal to 0. Analysts also use ANOVA to test a hypothesis that the means of two or more populations are equal. See Daniel and Terrell (1995) for details.
The
F-test for determining whether the slope coefficient
equals 0 is based on an F-statistic, constructed using these four values. The F-statistic measures
how well the regression equation
explains the variation
in the dependent variable. The F-statistic is the
ratio of the average regression sum of squares to the average sum of the
squared errors. The average regression sum of squares
is computed by dividing the regression sum of squares by the number of slope parameters
estimated (in this case, one). The average sum of squared
errors is computed
by dividing the sum of squared errors
by the number of observations, n, minus the total number of parameters estimated (in this case, two: the intercept and the slope).
These two divisors are the degrees of freedom
for an F-test. If there are n observations, the F-statistic for the null hypothesis that the slope coefficient is equal to 0 is denoted $F_{1,\,n-2}$: the first subscript is the number of slope parameters (one), and the second is n minus the number of estimated parameters (n − 2). The test has 1 and n − 2 degrees of freedom.
Suppose,
for example, that the independent variable in a regression model explains none
of the variation in the dependent variable. Then the predicted value from the
regression model, $\hat{Y}_i$, is the average value
of the dependent variable, $\bar{Y}$. In this case, the regression sum of squares, $\sum_{i=1}^n \left(\hat{Y}_i - \bar{Y} \right)^2$, is 0. Therefore, the F-statistic is
0. If the independent variable explains little of the variation in the
dependent variable, the value of the F-statistic will be very small.
The formula for the F-statistic
in a regression with one independent variable is $$ F= \frac{\text{RSS}/1}{\text{SSE}/(n-2)}=\frac{\text{Mean regression sum of squares}}{\text{Mean
squared error}}$$ (13)
If
the regression model does a good job of explaining variation in the dependent variable,
then this ratio should be high. The explained regression sum of squares per estimated
parameter will be high relative to the unexplained variation for each degree of
freedom. Critical values for this F-statistic are given in Appendix D at the
end of this volume.
Even though the F-statistic is commonly computed by regression software
packages, analysts typically do
not use ANOVA and F-tests
in regressions with just one independent
variable. Why not? In such regressions, the F-statistic is the square of the t-statistic
for the slope coefficient. Therefore, the F-test duplicates the t-test for the
significance of the slope coefficient. This relation is not true for
regressions with two or more slope
coefficients. Nevertheless, the one-slope coefficient case gives a foundation for understanding the multiple-slope coefficient cases.
Often, mutual fund performance is evaluated based on
whether the fund has positive
alpha—significantly positive excess risk-adjusted returns.37 One commonly used
method of risk adjustment is based on the capital asset pricing model. Consider the regression
$$(R_i - R_F) = \alpha_i + \beta_i(R_M - R_F) + \epsilon_i$$ (14)
where
RF is
the periodic risk-free rate of return (known at the beginning of the period), RM is
the periodic return on the market, Ri is
the periodic return to Mutual Fund i, and βi is
the fund’s beta. A fund has zero risk-adjusted excess return if αi = 0.
If αi = 0, then (Ri − RF) = βi(RM − RF) + εi and taking expectations, E(Ri) = RF + βi(RM − RF), implying that βi completely explains the fund’s mean excess returns.
If, for example,
αi > 0, the fund is earning higher returns than expected given its beta.
In summary, to test whether
a fund has a positive
alpha, we must test the null hypothesis that the fund has no
risk-adjusted excess returns (H0: α = 0) against the alternative hypothesis of nonzero risk-adjusted returns (Ha: α ≠ 0).
EXAMPLE 18
Performance Evaluation: The Dreyfus Appreciation Fund
Table 12 presents
results evaluating the excess return to the Dreyfus Appreciation Fund from
January 2009 through December 2013. Note that the estimated beta in this regression,
$\hat{\beta}_i$, is 0.8660. The Dreyfus Appreciation
Fund was estimated to be almost 0.9 times as risky as the market as a whole.
1 Test whether the fund had a significant excess return beyond the return associated with the market risk of the fund.
2 Based on the t-test, discuss whether the beta of the fund is likely to be zero.
3 Use Equation 13 to compute the F-statistic. Based on the F-test, determine whether the beta of the fund is likely to be zero.
Solution to 1: The estimated alpha
$(\hat{\alpha})$ in this regression is negative (−0.0012).
The absolute value of the coefficient is less than the size of the standard
error for that coefficient (0.0015), so the t-statistic for the coefficient is only −0.8050. Therefore,
we cannot reject the null hypothesis (α = 0) that the fund did not have
a significant excess return beyond the return associated with the market risk
of the fund. This result means that the returns to the fund were explained by
the market risk of the fund and there was no additional statistical significance
to the excess returns to the fund during this period.38
Solution to 2: Because the t-statistic for the slope coefficient in
this regression is 27.3147, the p-value
for that coefficient is less than 0.0001 and is approximately zero. Therefore,
the probability that the true value of this coefficient is actually 0 is microscopic.
Solution to 3: The ANOVA portion of
Table 12 provides the data we need to compute the F-statistic. In this case:
■ the total number of observations (n) is 60;
■ the total number of parameters to be estimated is 2 (intercept and slope);
■ the sum of squared errors or residuals, SSE, is 0.0072; and
■ the regression sum of squares, RSS, is 0.0925.
Therefore, the F-statistic to test whether the slope coefficient
is equal to 0 is $ \frac{0.0925/1}{0.0072/(60-2)}=
745.14$ (The slight difference from the F-statistic in Table 12 is due to
rounding.) The ANOVA output would show that the p-value for this F-statistic
is less than 0.0001 and is exactly the same as the p-value for the t-statistic
for the slope coefficient. Therefore, the F-test
tells us nothing more than we already knew from the t-test. Note also that the F-statistic
(746.09) is the square of the t-statistic
(27.3147).
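The F-statistic in Solution 3 takes only a few lines to compute. A Python sketch using the ANOVA figures quoted from Table 12 (RSS = 0.0925, SSE = 0.0072, n = 60):

```python
from scipy import stats

rss, sse, n = 0.0925, 0.0072, 60
k = 1  # number of slope parameters in a one-independent-variable regression

# Equation 13: mean regression sum of squares over mean squared error.
f_stat = (rss / k) / (sse / (n - k - 1))
p_value = stats.f.sf(f_stat, k, n - k - 1)  # upper-tail probability with 1 and 58 df
print(f"F = {f_stat:.2f}, p = {p_value:.1e}")  # F about 745; p far below 0.0001
```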
Prediction Intervals
Financial analysts
often want to use regression results to make predictions about
a dependent variable. For example, we might ask, “How fast will the
sales of XYZ Corporation grow this
year if real GDP grows by 4 percent?” But we are not merely interested in making these forecasts; we also want to know how certain
we should be about
the forecasts’ results. For example, if we predicted that sales for XYZ
Corporation would grow by 6 percent
this year, our prediction would mean more if we were 95 percent confident that sales growth would fall in the interval from 5 percent
to 7 percent, rather than
only 25 percent confident that this outcome would occur. Therefore, we need to understand how to compute
confidence intervals around
regression forecasts.
38 This example introduces a well-known investment use of regression involving the capital
asset pricing model. Researchers, however, recognize
qualifications to the interpretation of alpha from a linear regression. The
systematic risk of a managed portfolio is controlled by the portfolio manager.
If, as a consequence, portfolio beta
is correlated with the return on the market (as could result from market
timing), inferences on alpha
based on least-squares beta, as here, can be mistaken. This advanced subject is
discussed in Dybvig and Ross (1985a) and (1985b).
We
must take into account two sources of uncertainty when using the regression model Yi = b0 + b1Xi + εi, i = 1, …, n and the estimated parameters, $\hat{b}_0$ and $\hat{b}_1$, to make a prediction. First, the error term itself contains uncertainty. The standard deviation of the error term, σε,
can be estimated from the standard error of estimate for the regression equation. A second source of
uncertainty in making predictions about Y, however, comes from uncertainty in the estimated parameters $\hat{b}_0$ and $\hat{b}_1$.
If we knew the true
values of the regression parameters, b0 and b1, then the variance
of our prediction of Y, given any particular predicted (or assumed) value of
X, would simply be $s^2$, the squared standard error of estimate. The variance would be $s^2$ because the prediction, $\hat{Y}$, would come from the equation $\hat{Y}=b_0 +b_1 X$ and $\left(Y-\hat{Y}\right)=\epsilon$.
Because we
must estimate the regression
parameters $\hat{b}_0$
and $\hat{b}_1$ however, our prediction of Y, $\hat{Y}$ given any particular predicted value of X, is actually $\hat{Y}=\hat{b}_0
+\hat{b}_1 X$. The estimated variance
of the prediction error, $s_f^2$, of Y given X, is $$ s^2_f=s^2 \left[1+\frac{1}{n}+\frac{(X-\bar{X})^2}{(n-1)s_x ^2}\right]$$ (15)
This estimated variance depends on:
§ the squared standard error of estimate, $s^2$;
§ the number of observations, n;
§ the value of the independent variable, X, used to predict the dependent variable;
§ the estimated mean, $\bar{X}$, of the independent variable; and
§ the estimated variance, $s_x^2$, of the independent variable.39
Once
we have this estimate of the variance of the prediction error, determining a prediction interval around the prediction
is very similar to estimating a confidence interval
around an estimated parameter, as shown earlier in this reading. We need to take the following
four steps to determine the prediction interval
for the prediction:
1 Make the prediction.
2 Compute the variance of the prediction error using Equation 15.
3 Choose a significance level, α, for the forecast. For example, the 0.05 level, given the degrees of freedom in the regression, determines the critical value for the forecast interval, tc.
4 Compute the (1 − α) percent prediction interval for the prediction, namely $\hat{Y}\pm t_c s_f$.
39 For a derivation of this equation, see Pindyck
and Rubinfeld (1998).
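The four steps translate directly into code. The following Python sketch evaluates Equation 15 and the resulting prediction interval; every input value here is a made-up assumption for illustration, not a figure from this reading:

```python
from scipy import stats

# Hypothetical regression summary (assumed values for illustration only).
b0_hat, b1_hat = 0.5, 1.2  # estimated intercept and slope
s = 0.8                    # standard error of estimate (SEE)
n = 30                     # number of observations
x_bar, s2_x = 4.0, 2.5     # sample mean and variance of the independent variable
x_new = 6.0                # value of X for which we want a prediction

y_hat = b0_hat + b1_hat * x_new                                  # Step 1: the prediction
s2_f = s**2 * (1 + 1/n + (x_new - x_bar)**2 / ((n - 1) * s2_x))  # Step 2: Equation 15
t_crit = stats.t.ppf(0.975, n - 2)                               # Step 3: alpha = 0.05
half_width = t_crit * s2_f**0.5                                  # Step 4: the interval
print(f"Prediction interval: {y_hat:.3f} +/- {half_width:.3f}")
```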
Limitations of Regression Analysis
Although this reading has shown many of the uses of regression models for financial analysis, regression models do have
limitations. First, regression relations can change over time, just as correlations
can. This fact is known as the issue of parameter instability, and its existence should not be surprising as the
economic, tax, regulatory, political,
and institutional contexts in which financial markets operate change. Whether considering
cross-sectional or time-series regression, the analyst will probably face this issue. As one example,
cross-sectional regression relationships between stock characteristics may differ between growth-led and value-led
markets. As a second example, the time-series regression estimating the beta often yields
significantly different
estimated betas depending on the time period selected. In both cross-sectional and time-series contexts, the most common problem is unknowingly sampling from more than one population; the challenge lies in identifying when this is happening.
A second limitation to the use of regression results specific to investment contexts
is that public knowledge of regression relationships may negate their future usefulness. Suppose, for example, an analyst discovers
that stocks with a certain characteristic have had historically very high returns. If other analysts discover
and act upon this relationship, then the prices of stocks with that characteristic will be bid up. The knowledge of the relationship may result in the relation
no longer holding in the future. Finally,
if the regression assumptions listed earlier in this reading are violated, hypothesis tests and predictions based on linear regression will not be valid. Although
there are tests for violations
of regression assumptions, often uncertainty exists as to whether an assumption has been violated.
This limitation will be discussed
in detail in the reading on multiple regression.
SUMMARY
§
A
scatter plot shows graphically the relationship between two variables. If the
points on the scatter plot cluster together in a straight line, the two
variables have a strong linear relation.
§
The
sample correlation coefficient for two variables X and Y is $r=\frac{Cov(X,Y)}{s_x s_y}$.
§
If
two variables have a very strong linear relation, then the absolute value of
their correlation will be close to 1. If two variables have a weak linear relation,
then the absolute value of their correlation will be close to 0.
§
The
squared value of the correlation coefficient for two variables quantifies the
percentage of the variance of one variable that is explained by the other. If
the correlation coefficient is positive, the two variables are directly
related; if the correlation coefficient is negative, the two variables are
inversely related.
§
If
we have n observations for two variables, we can test whether the population
correlation between the two variables is equal to 0 by using a t-test. This
test statistic has a t-distribution with n − 2 degrees of freedom if the
null hypothesis of 0 correlation is true.
§
Even
one outlier can greatly affect the correlation between two variables. Analysts should
examine a scatter plot for the variables to determine whether outliers might affect
a particular correlation.
§
Correlations
can be spurious in the sense of misleadingly pointing toward associations
between variables.
§
The
dependent variable in a linear regression is the variable that the regression model
tries to explain. The independent variables are the variables that a regression
model uses to explain the dependent variable.
§
If there
is one independent variable in a linear regression and there are n observations
on the dependent and independent variables, the regression model is Yi
= b0 + b1Xi + εi, i = 1, …, n,
where Yi is the dependent variable, Xi is
the independent variable, and εi is the error term. In this model, the coefficient
b0 is the intercept. The intercept is the predicted value of the
dependent variable when the independent variable has a value of zero. In this model,
the coefficient b1 is the slope of the regression line. If the
value of the independent variable increases by one unit, then the model predicts
that the value of the dependent variable will increase by b1 units.
§
The
assumptions of the classic normal linear regression model are the following:
•
A linear relation
exists between the dependent variable
and the independent variable.
•
The independent variable
is not random.
•
The expected value of the error term is 0.
•
The variance of the error term is the same for all observations (homoskedasticity).
•
The error term is uncorrelated across observations.
•
The error term is normally
distributed.
§
The
estimated parameters in a linear regression model minimize the sum of the squared
regression residuals.
§
The
standard error of estimate measures how well the regression model fits the data.
If the SEE is small, the model fits well.
§
The
coefficient of determination measures the fraction of the total variation in the
dependent variable that is explained by the independent variable. In a linear regression
with one independent variable, the simplest way to compute the coefficient of determination
is to square the correlation of the dependent and independent variables.
§
To
calculate a confidence interval for an estimated regression coefficient, we must
know the standard error of the estimated coefficient and the critical value for
the t-distribution at the chosen level of significance, tc.
§ To test whether the population value of a regression
coefficient, b1, is equal to a particular hypothesized value, B1,
we must know the estimated coefficient, $\hat{b}_1$,
the standard error of the estimated coefficient, $s_{\hat{b}_1}$, and the critical value for
the t-distribution at the chosen level of significance, tc.
The test statistic for this hypothesis is $\frac{(\hat{b}_1 - B_1)}{s_{\hat{b}_1}}$. If the absolute value of this statistic is greater than tc, then we
reject the null hypothesis that b1 = B1.
§
In the regression
model Yi = b0 + b1Xi + εi, if we know the estimated
parameters, $\hat{b}_0$ and $\hat{b}_1$, for any value of the independent variable, X, then the predicted value of the dependent variable Y
is $\hat{Y}=\hat{b}_0 +\hat{b}_1 X$.
§
The prediction interval
for a regression equation for a particular predicted value of the dependent variable is $\hat{Y}\pm t_c s_f$, where sf is the square root of the estimated variance of
the prediction error and tc is the critical level for the t-statistic at the
chosen significance level. This computation specifies a (1 − α)
percent confidence interval.
For example, if α = 0.05, then this computation yields a 95 percent confidence interval.