\chapter{Hypothesis Testing\label{biv.hyp.chap}}
As discussed in Chapter \ref{intro.chap}, one of the major functions of econometric analysis is to provide a mechanism for testing hypotheses. In this chapter, we will examine how hypothesis testing may be performed using the bivariate regression model. By the end of this chapter, you should be able to perform hypothesis tests concerning either the sign or the magnitude of the intercept and slope parameters. The construction of confidence intervals for these parameter estimates is also examined in this chapter.
Before we discuss the process of hypothesis testing, however, it will be useful to examine the role of the normal distribution in regression analysis.
\section{An additional assumption: Normality}
In Chapter \ref{biv.reg.chap}, the properties of OLS estimators were examined under the assumptions of the classical regression model. No assumptions were made concerning the specific form of the probability density function generating the error terms (other than the very general conditions stated in the assumptions). In particular, we should note that the derivation of the OLS estimators does not require that the error process follow a particular probability distribution. Under the conditions of the classical regression model discussed in Chapter \ref{biv.reg.chap}, the OLS estimators are BLUE, regardless of the distribution generating the error terms.
In order to perform hypothesis tests, however, it is necessary to make some assumptions about the probability density function for the error terms. In practice, econometricians frequently assume that error terms are normally distributed with a mean of 0 and a variance of $\sigma ^{2}$.
As noted in Chapter \ref{stat.chap}, one of the important properties of the normal distribution is that any linear combination of normally distributed random variables is also normally distributed. If the random error term, $u_{i}$, is normally distributed, $Y_{i}$ is also normally distributed (since $Y_{i}=\beta _{o}+\beta _{1}X_{i}+u_{i}$). In Chapter \ref{biv.reg.chap}, it was observed that the OLS estimators for $\beta _{o}$ and $\beta _{1}$ are linear functions of the random variable $Y_{i}$. Thus, the estimators $\hat{\beta}_{o}$ and $\hat{\beta}_{1}$ are also normally distributed. Since $\hat{\beta}_{o}$ and $\hat{\beta}_{1}$ are both unbiased estimators, the mean value of each estimator equals the corresponding population parameter. As derived in Chapter \ref{biv.reg.chap}, the variances of $\hat{\beta}_{o}$ and $\hat{\beta}_{1}$ are given by:
\begin{equation*}
var(\hat{\beta}_{o})=\sigma ^{2}\left( \frac{\overline{X}^{2}}{\sum x_{i}^{2}}+\frac{1}{N}\right)
\end{equation*}
and
\begin{equation*}
var(\hat{\beta}_{1})=\frac{\sigma ^{2}}{\sum x_{i}^{2}} \end{equation*}
Thus, we can state that:
\begin{equation}
\hat{\beta}_{o}\sim N\left( \beta _{o},\sigma ^{2}\left( \frac{\overline{X}^{2}}{\sum x_{i}^{2}}+\frac{1}{N}\right) \right) \label{beta0dist.bc}
\end{equation}
and
\begin{equation}
\hat{\beta}_{1}\sim N\left( \beta _{1},\frac{\sigma ^{2}}{\sum x_{i}^{2}}\right) \label{beta1dist.bc}
\end{equation}
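To see why normality carries over from the errors to the estimators, it may help to write the slope estimator explicitly as a weighted sum. The sketch below assumes the usual deviation notation, $x_{i}=X_{i}-\overline{X}$, treats the $X_{i}$ values as fixed, and introduces the weights $w_{i}$ purely for illustration:
\begin{equation*}
\hat{\beta}_{1}=\frac{\sum x_{i}Y_{i}}{\sum x_{i}^{2}}=\sum w_{i}Y_{i}=\beta _{1}+\sum w_{i}u_{i},\qquad w_{i}=\frac{x_{i}}{\sum x_{i}^{2}}
\end{equation*}
Because the weights depend only on the $X_{i}$ values, $\hat{\beta}_{1}$ is a linear combination of the normally distributed errors. If the errors are also uncorrelated and share a common variance $\sigma ^{2}$, the variance of this sum is $\sigma ^{2}\sum w_{i}^{2}=\sigma ^{2}/\sum x_{i}^{2}$, which agrees with the variance expression given above.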
\subsection{Nonnormal errors and large samples}
A variety of \textbf{central limit theorems} indicate that, under the assumptions of the classical regression model, the probability distribution functions for the estimators $\hat{\beta}_{o}$ and $\hat{\beta}_{1}$ will converge to a normal distribution as the size of the sample rises even when the error terms are not normally distributed.\footnote{As noted earlier, these central limit theorems suggest that, under a wide variety of circumstances, the distribution of the sum of independent random variables that have a constant mean and variance will converge to a normal distribution as the number of elements in the sum tends toward infinity.} Therefore, even if the error terms do not follow a normal distribution, the distributions of $\hat{\beta}_{o}$ and $\hat{\beta}_{1}$ will still converge to the normal distributions described in equations \ref{beta0dist.bc} and \ref{beta1dist.bc} as the sample size increases. For this reason, in large samples, it is reasonably safe to assume that the intercept and slope estimators follow normal distributions (as long as all of the other assumptions of the classical regression model are satisfied).
\section{The $t$-ratio\label{uvtr}}
Since we assume that the estimated intercept and slope parameters are distributed normally, the variable $Z$ defined as:
\begin{equation}
Z=\frac{\hat{\beta}_{j}-\beta _{j}}{\sigma _{\hat{\beta}_{j}}} \label{ztrans}
\end{equation}
follows a standard normal distribution with a mean equal to zero and a variance equal to one.\footnote{%
This relationship is derived in Chapter \ref{stat.chap}.} Note that this standard normal transformation can be applied to either $\hat{\beta}_{o}$ or $\hat{\beta}_{1}$.
We can use the standard normal CDF for hypothesis testing, however, only if we know the variance of our estimator. In virtually all practical applications, this variance is unknown and must be estimated from observed data. If we replace the denominator in equation \ref{ztrans} with the estimated standard error for our estimator ($\hat{\sigma}_{\hat{\beta}_{1}}$), we can create a new variable:
\begin{equation}
t=\frac{\hat{\beta}_{1}-\beta _{1}}{\hat{\sigma}_{\hat{\beta}_{1}}} \label{t_ratio_bivhyp}
\end{equation}
This variable, called a {\boldmath $t$}\textbf{-ratio}, is distributed according to Student’s $t$-distribution with $N-2$ degrees of freedom. There are $N-2$ degrees of freedom in this case because, as noted in Chapter \ref{stat.chap}, the degrees of freedom for the $t$-ratio equals:
\begin{equation*}
\text{degrees of freedom}=\text{\# of observations}-\text{\# of estimated parameters}
\end{equation*}
The degrees of freedom for this variable equal $N-2$ because the construction of the $t$-ratio (equation \ref{t_ratio_bivhyp}) requires that two parameters be estimated (the intercept and slope terms, $\hat{\beta}_{o}$ and $\hat{\beta}_{1}$).
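As a brief sketch of where the estimated standard error comes from, suppose that the unknown error variance is replaced by the usual unbiased estimator based on the OLS residuals, written here as $\hat{u}_{i}$ (this notation is an assumption and may differ from that used elsewhere in the text):
\begin{equation*}
\hat{\sigma}^{2}=\frac{\sum \hat{u}_{i}^{2}}{N-2},\qquad \hat{\sigma}_{\hat{\beta}_{1}}=\sqrt{\frac{\hat{\sigma}^{2}}{\sum x_{i}^{2}}}
\end{equation*}
The divisor $N-2$ in $\hat{\sigma}^{2}$ is the same degrees-of-freedom count that determines which $t$-distribution applies.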
As noted in Chapter \ref{estimators.chap}, the $t$-distribution resembles the normal distribution, but has somewhat thicker tails. As the size of the sample increases, however, the $t$-distribution approaches the normal distribution.
As we will soon see, $t$-ratios are extensively used in econometric analysis for testing hypotheses concerning the magnitude and sign of estimated parameters.
\section{Hypothesis testing}
As social scientists, economists attempt to use the scientific method. In Chapter \ref{intro.chap}, it was noted that this method involves three steps: \begin{enumerate}
\item observing a phenomenon,
\item formulating a hypothesis, and
\item testing the hypothesis.
\end{enumerate}
Hypothesis testing is used to examine whether the predictions of economic theory are consistent with actual economic processes. Economic models often make some prediction concerning either the sign or magnitude of the regression coefficients $\beta _o$ and $\beta _1$. For example, Keynes predicted that the marginal propensity to consume (the slope coefficient in the consumption function) will be between zero and one. The theory of demand predicts that the slope of a demand curve is negative. In some cases, however, economic theory predicts that a variable may have an effect on another variable, but the direction of the effect is ambiguous. For example, consider the effect of a change in income on the quantity of fast-food hamburgers demanded (holding other factors constant). If fast-food hamburgers are a normal good, quantity demanded will rise as income rises.
Quantity demanded will fall as income rises, however, if this good is an inferior good.
\subsection{One-tailed and two-tailed hypothesis tests} To conduct a hypothesis test, two mutually exclusive hypotheses are formulated. These hypotheses are referred to as the \textbf{null hypothesis} (H$_o$) and the \textbf{alternative hypothesis} (H$_1$). The hypothesis test involves using statistical evidence to select between these alternatives. In the bivariate regression model, one type of hypothesis test involves a choice between the hypotheses:
\begin{equation*}
\text{H}_o\text{: }\beta _1=c
\end{equation*}
and
\begin{equation*}
\text{H}_1\text{: }\beta _1\neq c
\end{equation*}
where $c$ is a constant. For simplicity of notation, we will specify the hypothesis tests in terms of the slope coefficient, $\beta _1$. Tests involving the intercept term are formed in an equivalent manner.
In this case, the null hypothesis states that the slope parameter equals $c$. This type of test is called a \textbf{two-tailed hypothesis test} because the null hypothesis can be rejected if the estimated value of the coefficient, $\hat{\beta}_1$, is either substantially greater than or substantially less than the hypothesized value, $c$.
The most common application of a two-tailed test, however, involves a choice between the hypotheses:
\begin{equation*}
\text{H}_o\text{: }\beta _1=0
\end{equation*}
and
\begin{equation*}
\text{H}_1\text{: }\beta _1\neq 0
\end{equation*}
If the null hypothesis is correct, the level of the independent variable ($X_i$) has no effect on the level of the dependent variable ($Y_i$). Under the alternative hypothesis, the level of $X_i$ has an effect (either positive or negative) on the level of $Y_i$. This type of test is used when: \begin{itemize}
\item economic theory does not generate a clear prediction concerning the sign of the coefficient being tested, or
\item there are conflicting theories concerning the sign of a coefficient.
\end{itemize}
There are many situations in which economic theory does not predict the sign of a coefficient. As noted above, economic theory does not predict whether an increase in consumer income will result in an increase or decrease in the demand for fast-food hamburgers. This depends on the specific nature of individual preferences. In a similar manner, economic theory does not predict whether two goods are substitutes or complements.
In many cases, there are alternative theories that generate different predictions. Consider, for example, the effect of advertising on the price of products. One argument suggests that an increase in advertising raises costs, resulting in an increase in the price of the product. An alternative argument, however, suggests that advertising increases price competition among firms, resulting in lower prices.
As noted above, however, economic theory often predicts the sign of a coefficient. Suppose, for example, that an economist estimates the parameters of a demand equation given by:
\begin{equation*}
\text{QD}_{i}=\beta _{o}+\beta _{1}\text{P}_{i}+u_{i} \end{equation*}
In this case, economic theory predicts that the slope coefficient, $\beta _{1}$, is negative. The null and alternative hypotheses for this test are given by:
\begin{equation*}
\text{H}_{o}\text{: }\beta _{1}\geq 0
\end{equation*}
and
\begin{equation*}
\text{H}_{1}\text{: }\beta _{1}<0
\end{equation*}
If the estimated value of the slope coefficient is greater than 0, there would be no reason to reject the null hypothesis. Thus, the null hypothesis can be rejected only when the estimated parameter is negative, and even then only if the estimate is sufficiently large in magnitude to provide compelling evidence against the null hypothesis. This test is an example of a \textbf{one-tailed test} since the decision to reject the null hypothesis occurs in only one tail of the distribution of possible outcomes for $\hat{\beta}_{1}$.
A more general form of one-tailed test is provided by a choice between: \begin{equation*}
\text{H}_o\text{: }\beta _1\geq c
\end{equation*}
and
\begin{equation*}
\text{H}_1\text{: }\beta _1<c
\end{equation*}
This type of test is used to determine whether a coefficient is less than a particular threshold value. In the case of a consumption function, for example, we could use this type of hypothesis test to determine whether the statistical evidence supports the Keynesian argument that the marginal propensity to consume (the slope of the consumption function) is less than one.
Alternatively, if one wishes to examine whether a coefficient is greater than a specified value, the appropriate hypotheses are:
\begin{equation*}
\text{H}_o\text{: }\beta _1\leq c
\end{equation*}
and
\begin{equation*}
\text{H}_1\text{: }\beta _1>c
\end{equation*}
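As an illustration of how these general forms are applied, consider the Keynesian claim that the marginal propensity to consume is less than one. Setting $c=1$ and placing the claim to be demonstrated in the alternative hypothesis gives:
\begin{equation*}
\text{H}_o\text{: }\beta _1\geq 1\qquad \text{and}\qquad \text{H}_1\text{: }\beta _1<1
\end{equation*}
Rejecting this null hypothesis would provide statistical support for an MPC below one.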
Let’s examine the process of hypothesis testing.
\subsection{Type I and type II errors\label{t1&2.start}} Unfortunately, when we test a hypothesis using statistical procedures, we do not receive a clear-cut result that tells us whether the hypothesis should be accepted or rejected. Suppose, for example, that there is a positive relationship between education and earnings in the population. Since there are other unobservable factors that affect the level of earnings, we may find no relationship between these variables (or even an inverse relationship) in a given sample. On the other hand, if we incorrectly believe that an inverse relationship exists between education and earnings, we may occasionally find such a relationship in particular samples. This discussion should suggest that there are two possible types of errors that we may make:
\begin{equation*}
\begin{array}{ll}
\text{Type I error:} & \text{the error that occurs when you inappropriately } \\
& \text{reject a correct null hypothesis.} \\
\text{Type II error:} & \text{the error that occurs when you fail to reject} \\
& \text{an incorrect null hypothesis (\textit{i.e.,} you do not} \\ & \text{accept a correct alternative hypothesis).}%
\end{array}%
\end{equation*}
An analogy might be useful here. Suppose that an individual is accused of a crime. One of the primary purposes of our judicial system is to determine whether the individual is guilty of the criminal activity. Under the principles of our system of jurisprudence, an individual is said to be “innocent until proven guilty.” The judicial system, in essence, serves as a procedure for testing the hypothesis of the individual’s guilt. Using the terminology presented above, the null hypothesis is: \begin{equation*}
\text{H}_{\text{o}}\text{: the individual is innocent} \end{equation*}
and the alternative hypothesis is:
\begin{equation*}
\text{H}_1\text{: the individual is guilty}
\end{equation*}
A person can be convicted of the crime only if the null hypothesis is rejected.
In our criminal justice system, a type I error occurs if an innocent person is convicted of a crime (\textit{i.e.,} the null hypothesis is incorrectly rejected). A type II error occurs if a guilty person is not convicted. It should be obvious that a trade-off occurs between the probability of type I and type II errors. Under the strictest standard of evidence, it is necessary to provide “proof beyond a shadow of a doubt.” In this case, few innocent individuals will be convicted, but many guilty individuals will go free. Under many types of civil law procedures, it is only necessary to show that the “preponderance of evidence” indicates that an award should be made. This looser standard will increase the probability of a type I error, but will reduce the probability of a type II error. In criminal procedures, it is generally argued that the cost of a type I error is greater than the cost of a type II error, so relatively rigorous standards of evidence are used.
Hypothesis tests in econometric applications proceed in a similar manner.
Suppose, for example, that an economist wishes to test the Keynesian proposition that states that the marginal propensity to consume (MPC) is greater than zero. As noted in Chapter \ref{intro.chap}, a simple version of the Keynesian consumption function is given by:
\begin{equation*}
\text{C}_{t}=\beta _{o}+\beta _{1}\text{YD}_{t}+u_{t} \end{equation*}
Under this specification, the MPC is equal to the slope parameter $\beta _{1} $. To test whether the MPC is greater than zero, the following null and alternative hypotheses can be formulated:
\begin{equation*}
\text{H}_{o}\text{: }\beta _{1}\leq 0
\end{equation*}
and
\begin{equation*}
\text{H}_{1}\text{: }\beta _{1}>0
\end{equation*}
Notice that the alternative hypothesis captures the relationship that the researcher believes to be true. In a typical application, the null hypothesis is constructed so that it represents the opposite of what the model predicts.\footnote{%
McCloskey (1985) and McCloskey and Ziliak (1996) argue that it is more consistent with the scientific method to formulate the null hypothesis so that it captures the predictions of the theoretical model. Under this approach, a hypothesis test would cause you to either reject the model or fail to reject the model.} By constructing the test in this manner, the burden of proof is placed on the researcher who must provide a convincing case for rejecting the null hypothesis. Suppose, for example, that the evidence is believed to be strong enough to support the rejection of a null hypothesis that states that the MPC is less than or equal to zero. If this occurs, the econometrician can argue that the statistical evidence is consistent with the argument that states that the MPC is greater than zero.
As in the judicial example above, a type I error occurs when the null hypothesis is incorrectly rejected; a type II error occurs when a researcher fails to reject an incorrect null hypothesis. We call the probability of a type I error the \textbf{significance level} of the test. Since the burden of proof is placed on the researcher, a relatively low significance level is generally required. The most commonly used significance levels in econometrics (and statistical applications in general) are the one percent and five percent levels. A ten percent significance level is sometimes used, although somewhat less often. The significance level of a hypothesis test is generally denoted by the symbol $\alpha $. Thus, if you see the statement $\alpha =0.05$, this means that the researcher has established a decision rule in which the probability of a type I error is 5\%. In the criminal justice system it is necessary to establish proof “beyond a reasonable doubt;” in econometric and statistical applications the standard of evidence is (typically) a 1\% or 5\% chance of error (significance level).
The choice of the significance level of the test should depend upon the relative cost of type I and type II errors. In hypothesis tests involving the introduction of new medicines or surgical procedures, a type I error could involve the loss of lives (if an unsafe drug is approved for sale). In the case of market researchers estimating demand curves for a product, the cost of an inaccurate slope estimate may involve a loss in the firm’s profits. If a low significance level is chosen, the probability of a type I error is low, but the probability of a type II error is likely to be relatively high.
A higher significance level results in a higher probability of a type I error, but a reduction in the chance of a type II error. The boxed example on type I and type II errors in the drug approval process appearing on p.~\pageref{med.type.errors} illustrates the tradeoffs that are involved in this decision.
An alternative (and equivalent) way of evaluating the standard of evidence imposed on a hypothesis test is provided by the \textbf{confidence level}, defined as:
\begin{equation*}
\text{confidence level}=1-\alpha
\end{equation*}
The confidence level is the probability of avoiding a type I error. If a 5\% significance level is used in repeated experiments, a type I error will be avoided 95\% of the time. It should be obvious that the significance level and the confidence level are both alternative ways of measuring the same thing.
An alternative method of evaluating a hypothesis test is the \textbf{power of the test,} defined as:
\begin{equation*}
\text{power of the test}=1-\text{probability of type II error}
\end{equation*}
Thus, the power of the test is the probability of avoiding a type II error.
Using the judicial analogy above, the power of the test in a criminal trial is the probability of convicting a guilty person. In statistical terms, the power of the test is the probability that the null hypothesis will be rejected in favor of the alternative hypothesis when the alternative hypothesis is true.
For a given test procedure, the power of the test can be increased only if one is willing to accept a higher probability of a type I error. In the judicial system, more guilty individuals will be convicted only if society is willing to accept that more innocent individuals will be convicted. When a 99\% confidence level is used in repeated tests, type I errors will occur only 1\% of the time. Under this relatively stringent standard of evidence, however, the null hypothesis will be rejected less often, even when it is false. When a 90\% confidence level is used in repeated tests, type I errors will occur 10\% of the time. Since it is now relatively easier to reject the null hypothesis, though, the probability of a type II error is reduced.
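The definitions in this subsection can be summarized compactly in probability notation. The expressions below are only a restatement of the text, with $\Pr (\cdot \mid \cdot )$ denoting a conditional probability:
\begin{equation*}
\begin{array}{rl}
\alpha & =\Pr (\text{reject H}_{o}\mid \text{H}_{o}\text{ is true}) \\
\text{confidence level} & =1-\alpha =\Pr (\text{do not reject H}_{o}\mid \text{H}_{o}\text{ is true}) \\
\text{power of the test} & =1-\Pr (\text{type II error})=\Pr (\text{reject H}_{o}\mid \text{H}_{o}\text{ is false})
\end{array}
\end{equation*}
Written this way, it is clear that the significance level and the confidence level are conditioned on a true null hypothesis, while the power of the test is conditioned on a false one.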
Under the classical approach to hypothesis testing, the following procedure is used:%
\exbox{Prescription Drug Approvals}{
As noted by Miller, Benjamin and North (1996), the Food and Drug Administration (FDA) is charged with the task of determining whether newly developed prescription drugs may be offered for sale in the United States. Under the current requirements, pharmaceutical companies are required to provide compelling evidence of the safety and effectiveness of proposed drugs.
This approval process contains the risk of both
type I and type II errors. A type I error is committed when an unsafe drug is released to the public. The approval of thalidomide (a sleeping pill that caused a large number of birth defects when used by pregnant women) serves as an example of a type I error. Type II errors, on the other hand, occur when safe drugs are not released to the public. When a more stringent testing standard is imposed, type I errors become less likely, while the probability of type II errors increases.
In the case of AZT (and other drugs used in AIDS treatment), the FDA has provided a streamlined approval process since the relative cost of type II errors is higher in the case of medicines that show promise of treating a disease with such a high fatality rate.}\label{med.type.errors}
\begin{enumerate}
\item The investigator selects the significance level that is to be used.
The choice of this significance level should be based on the relative cost of type I and type II errors. A lower significance level (such as 0.01) should be used if the cost of a type I error is relatively high compared to the cost of a type II error. If the cost of a type I error is relatively low, then a higher significance level (such as 0.05 or 0.10) can be used.
\item A test is selected that is as powerful as possible given the significance level. In other words, the researcher is charged with finding a test that offers the lowest probability of a type II error for the given probability of a type I error.\label{t1&2.end}
\end{enumerate}
\subsection{Two-tailed tests}
\subsubsection{A test of the hypothesis: $\protect\beta _{j}=c$} Suppose that you wish to test the null hypothesis that states that $\beta _{j}=c$ (where $c$ is a constant). This test may be applied to either the intercept or slope parameter. The null and alternative hypotheses for this test are given by:
\begin{equation*}
\text{H}_{\text{o}}\text{: }\beta _{j}=c
\end{equation*}
and
\begin{equation*}
\text{H}_{1}\text{: }\beta _{j}\neq c
\end{equation*}
As noted above, one-tailed tests are typically constructed so that the alternative hypothesis embodies the result that the researcher wishes to demonstrate. In the case of two-tailed tests, however, the standard practice involves specifying the null hypothesis so that it involves an equality relationship. The alternative hypothesis is formulated as the corresponding inequality. If the null hypothesis cannot be rejected at a given significance level, then it is said that $\beta _{j}$ is not significantly different from $c$. If the null hypothesis is rejected, then $\beta _{j}$ is said to be significantly different from $c$ (at the given significance level). The most commonly used method of testing hypotheses such as these involves the use of the $t$-ratio discussed in section \ref{uvtr}.\footnote{Note that the standardized estimators follow a $t$-distribution rather than a standard normal distribution because $\sigma ^{2}$ is not known and must be estimated.} Recall that the statistic:
\begin{equation*}
t=\frac{\hat{\beta}_{j}-\beta _{j}}{\hat{\sigma}_{\hat{\beta}_{j}}} \end{equation*}%
is distributed as a $t$-statistic with $N-2$ degrees of freedom. If the null hypothesis is correct, the true value of $\beta _{j}$ is $c$. Thus, under the null hypothesis, the $t$-statistic is given by:
\begin{equation*}
t=\frac{\hat{\beta}_{j}-c}{\hat{\sigma}_{\hat{\beta}_{j}}} \end{equation*}%
Figure~\ref{ttt_bhc} illustrates the distribution of this statistic under the null hypothesis.
\begin{center}
\FRAME{ftbpFU}{4.8974in}{2.3004in}{0pt}{\Qcb{Two-tailed test}}{\Qlb{ttt_bhc}}{fig5-1.gif}{\special{language “Scientific Word”;type “GRAPHIC”;maintain-aspect-ratio TRUE;display “USEDEF”;valid_file “F”;width 4.8974in;height 2.3004in;depth 0pt;original-width 4.8438in;original-height 2.2606in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename ‘GRAPHS/Fig5-1.gif’;file-properties “XNPEU”;}}
\end{center}
To test the null hypothesis above, we must first select a significance level, $\alpha $. Given this significance level, we can select \textbf{critical values}, $t_{\alpha /2}$ and $-t_{\alpha /2}$, so that the probability of an outcome that lies either above $t_{\alpha /2}$ or below $-t_{\alpha /2}$ equals $\alpha $ (the probability that the $t$-ratio lies in each individual tail of the distribution equals $\alpha /2$). If the estimated $t$-ratio falls within the interval that lies between $-t_{\alpha /2}$ and $t_{\alpha /2}$, you cannot reject the null hypothesis.
If the value of the $t$-ratio falls either below $-t_{\alpha /2}$ or above $t_{\alpha /2}$, you can reject the null hypothesis and accept the alternative hypothesis. If the null hypothesis is correct, the probability of a type I error equals $\alpha $.
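The decision rule just described can be written compactly. This is only a restatement of the rule in symbols, where $t$ denotes the estimated $t$-ratio computed under the null hypothesis:
\begin{equation*}
\text{reject H}_{o}\text{ if }\left\vert t\right\vert >t_{\alpha /2},\qquad \text{where }\Pr \left( \left\vert t\right\vert >t_{\alpha /2}\mid \text{H}_{o}\text{ is true}\right) =\alpha
\end{equation*}
The absolute value reflects the fact that a sufficiently large deviation in either direction counts as evidence against the null hypothesis.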
Figure~\ref{3ttt_bhc} illustrates the acceptance and rejection regions for a two-tailed hypothesis test at significance levels of .10, .05, and .01.
Let’s examine how the critical values of the $t$-ratio are determined.
Assume that the degrees of freedom equal 60. If we wish to test a hypothesis at the 10\% significance level, we need to find the critical values for the $t$-ratio that result in 5\% of the distribution lying in each tail. Using the $t$-table at the end of this text, we find that this occurs for the values of 1.671 and $-1.671$. We find the critical values for the .05 and .01 significance levels in the same manner. As Figure~\ref{3ttt_bhc} illustrates, the choice of a smaller value for $\alpha $ results in a larger acceptance region and a smaller rejection region.
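For reference, the critical values for the two remaining significance levels can be read from a standard $t$-table in the same way. With 60 degrees of freedom, the tabulated values (rounded to three decimal places) are:
\begin{equation*}
t_{0.025}=2.000\quad (\alpha =0.05),\qquad t_{0.005}=2.660\quad (\alpha =0.01)
\end{equation*}
so the acceptance regions are $-2.000\leq t\leq 2.000$ and $-2.660\leq t\leq 2.660$, respectively.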
\begin{center}
\FRAME{ftbpFU}{3.2413in}{4.2194in}{0pt}{\Qcb{A comparison of the acceptance and rejection regions under 0.10, 0.05, and 0.01 significance levels (d.f. = 60)}}{\Qlb{3ttt_bhc}}{fig5-2.gif}{\special{language “Scientific Word”;type “GRAPHIC”;maintain-aspect-ratio TRUE;display “USEDEF”;valid_file “F”;width 3.2413in;height 4.2194in;depth 0pt;original-width 5.7501in;original-height 7.4996in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename ‘GRAPHS/Fig5-2.gif’;file-properties “XNPEU”;}}
\end{center}
\subsubsection{A test of the hypothesis: $\protect\beta _{j}=0$} The most common application of a two-tailed test involves determining whether a coefficient is significantly different from zero. In this case, the appropriate hypotheses are:
\begin{equation*}
\text{H}_{\text{o}}\text{: }\beta _{j}=0
\end{equation*}
and
\begin{equation*}
\text{H}_{1}\text{: }\beta _{j}\neq 0
\end{equation*}
In this case, the $t$-ratio reduces to:
\begin{equation*}
t=\frac{\hat{\beta}_{j}}{\hat{\sigma}_{\hat{\beta}_{j}}} \end{equation*}
Since this is a special case of the two-tailed test, hypothesis testing proceeds in the manner described above. If the null hypothesis is rejected, then we can say that the coefficient is significantly different from zero at the specified significance level.
\subsubsection{Example: Voting behavior and economic conditions\label{vote.beh.sec}}
Let’s examine how a two-tailed hypothesis test may be applied. Suppose that a political scientist wishes to investigate whether economic conditions affected voting outcomes in the 1992 U.S. Presidential election. Using data from 50 U.S. states and the District of Columbia, she estimates the parameters of the regression equation given by:\footnote{The data used in this regression appear in the file “election.dat.”}
\begin{equation}
\text{DVOTE}_{i}=\beta _{o}+\beta _{1}\text{UN}_{i}+u_{i} \label{dvote.mod} \end{equation}
\begin{equation*}
\begin{array}{ll}
\text{where:} & \text{DVOTE}_{i}=\text{ proportion of state vote cast for the Democratic Party candidate} \\
& \text{UN}_{i}=\text{ statewide unemployment rate in state }i
\end{array}%
\end{equation*}
When the parameters of equation \ref{dvote.mod} are estimated, the resulting equation is:\footnote{%
Note that the standard errors of the coefficient estimates are presented in parentheses beneath the estimated regression coefficients. This is a common method of presenting regression output. An alternative procedure is to include the estimated $t$-ratios under the estimated coefficients.}
\begin{equation}
\widehat{\text{DVOTE}}_{i}=\underset{(4.93)}{28.12}+\underset{(0.71)}{2.04}\text{UN}_{i} \label{dvote.pred}
\end{equation}
\begin{equation*}
\text{(standard errors in parentheses)}
\end{equation*}
The standard error of the estimated slope coefficient, $\hat{\beta}_{1}$, equals 0.71 in this case. Since there were a total of 51 observations, the degrees of freedom for the slope estimator is 49 (= $N-2$).
This researcher wishes to determine whether the unemployment rate has a significant effect on voting outcomes. The hypotheses involved in this test are:\footnote{%
A careful reader will note that a one-tailed test would be more appropriate if the investigator wished to test whether the incumbent is harmed by adverse economic conditions. For the purpose of exposition, however, we will assume that the researcher was not able to predict the sign of $\beta _{1}$.
Thus, a two-tailed test will be used for this hypothesis.}
\begin{equation*}
\text{H}_{\text{o}}\text{: }\beta _{1}=0
\end{equation*}
and
\begin{equation*}
\text{H}_{1}\text{: }\beta _{1}\neq 0
\end{equation*}
To conduct this test, we must select a significance level. In this example, we will use a significance level of 0.01. At this significance level, the critical values for a $t$-statistic with 49 degrees of freedom are equal to $-2.68$ and 2.68 (these values correspond to $-t_{\alpha /2}$ and $t_{\alpha /2}$ in the discussion above).\footnote{%
Note that the $t$-table does not contain information on the critical values for the $t$-distribution when the degrees of freedom equal 49. We do know, however, that the critical values are 2.704 and 2.660 when the degrees of freedom equal 40 and 60, respectively. Thus, we know that the appropriate critical value lies between 2.660 and 2.704. There are several approaches that are used to deal with cases such as this. A conservative approach is to use the higher critical value of 2.704. This guarantees that the probability of a type I error will be less than 0.01. An alternative solution is to construct a linear interpolation between these two values. This result suggests that the appropriate $t$-value is approximately equal to 2.68.
\par
A better procedure, though, is to use an econometrics regression package or an online statistical calculator to determine the appropriate critical value for the $t$-statistic. The web site that accompanies this text contains links to a variety of online statistical calculators. When this technique is used, the critical value is equal to 2.68.} Thus, we would: \begin{itemize}
\item reject the null hypothesis if the estimated $t$-ratio is either less than -2.68 or greater than 2.68.
\item fail to reject the null hypothesis if the estimated $t$-ratio lies between -2.68 and 2.68.
\end{itemize}
The rejection and acceptance regions for this hypothesis test are illustrated in Figure~\ref{elect_graph.bhc}.
\begin{center}
\FRAME{ftbpFU}{5.118in}{2.0496in}{0pt}{\Qcb{Hypothesis test -- Election model}}{\Qlb{elect_graph.bhc}}{fig5-3.gif}{\special{language “Scientific Word”;type “GRAPHIC”;maintain-aspect-ratio TRUE;display “USEDEF”;valid_file “F”;width 5.118in;height 2.0496in;depth 0pt;original-width 5.0626in;original-height 2.0107in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename ‘GRAPHS/Fig5-3.gif’;file-properties “XNPEU”;}}
\end{center}
The estimated $t$-statistic in this case equals:
\begin{equation*}
t=\frac{\hat{\beta}_{1}}{\hat{\sigma}_{\hat{\beta}_{1}}}=\frac{2.04}{0.71}\approx 2.9
\end{equation*}
Since this estimated $t$-ratio falls in the rejection region, we can reject the null hypothesis that states that $\beta _{1}$ equals zero. Thus, at a 0.01 significance level, the political scientist can claim that statewide unemployment rates have a statistically significant effect on the proportion of the vote cast for the Democratic Presidential candidate in this election.
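The decision rule for this two-tailed test can be sketched in a few lines of Python. This is only an illustration: the function name \texttt{two\_tailed\_reject} is hypothetical, and the critical value of 2.68 is simply copied from the discussion above rather than computed.

```python
# Two-tailed t-test for the election example, using the figures quoted in the text.
beta_hat = 2.04   # estimated slope coefficient
std_err = 0.71    # estimated standard error of the slope
t_crit = 2.68     # critical value taken from the text (49 d.f., 0.01 level)


def two_tailed_reject(t_ratio, critical_value):
    """Return True when the t-ratio lies in either tail of the rejection region."""
    return abs(t_ratio) > critical_value


# The null hypothesis sets the slope to zero, so zero is subtracted in the numerator.
t_ratio = (beta_hat - 0.0) / std_err
reject = two_tailed_reject(t_ratio, t_crit)
print(f"t-ratio = {t_ratio:.2f}, reject H0: {reject}")
```

Taking the absolute value of the $t$-ratio handles both tails with a single comparison.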
\subsection{One-tailed tests}
Two-tailed tests are appropriate when economic theory is not able to predict the sign of a coefficient. In many cases, however, we are interested in determining whether a variable has a particular sign. For example, if we are estimating the parameters of a demand curve, we are not only interested in whether the slope is significantly different from zero. Instead, we would like to determine whether the slope of the demand curve is negative (as predicted by economic theory).
Let’s examine how we can test to determine whether a demand curve is downward sloping. Once again, we start by specifying a null hypothesis that states the opposite of what we are attempting to demonstrate: \begin{equation*}
\text{H}_{\text{o}}\text{: }\beta _{j}\geq 0
\end{equation*}%
The alternative hypothesis in this case is:
\begin{equation*}
\text{H}_{1}\text{: }\beta _{j}<0
\end{equation*}%
In this case, the null hypothesis does not specify a single value of $\beta _{j}$. Instead, it allows $\beta _{j}$ to be any value that is greater than or equal to zero. This seems to present a problem since there is no well-defined value of $\beta _{j}$ that we can use to formulate a $t$-ratio.
Fortunately, a convenient solution is available. Suppose that we formulate a $t$-ratio based upon the assumption that $\beta _{j}=0$. If we can reject the null hypothesis that states that $\beta _{j}$ is zero, then we can always reject it for assumed values of $\beta _{j}$ that are greater than zero. The $t$-ratio used for this hypothesis is given by: \begin{equation*}
t=\frac{\hat{\beta}_{j}}{\hat{\sigma}_{\hat{\beta}_{j}}} \end{equation*}%
Figure~\ref{ott_bhc} illustrates the distribution of our estimator based upon the assumption that the true value of $\beta _{j}$ is zero. In this case, however, a large positive value for our estimator is consistent with our null hypothesis. Thus, the null hypothesis would be rejected only in the case of a negative outcome.
\begin{center}
\FRAME{ftbpFU}{4.8352in}{2.3004in}{0pt}{\Qcb{A test of the hypothesis:\ $\protect\beta _{j}\geq 0$}}{\Qlb{ott_bhc}}{fig5-4.gif}{\special{language “Scientific Word”;type “GRAPHIC”;maintain-aspect-ratio TRUE;display “USEDEF”;valid_file “F”;width 4.8352in;height 2.3004in;depth 0pt;original-width 4.7816in;original-height 2.2606in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename ‘GRAPHS/Fig5-4.gif’;file-properties “XNPEU”;}}
\end{center}
Specifically, to determine the rejection region, we must find the value of the $t$-ratio at which the probability of a type I error equals $\alpha $.
In Figure~\ref{ott_bhc}, this value is labelled as $t_{\alpha }$. The null hypothesis would be rejected only if the estimated $t$-ratio is less than $t_{\alpha }$. Figure~\ref{ott_bhc} illustrates the acceptance and rejection regions corresponding to a hypothesis test with a significance level of $\alpha $.
Under a one-tailed test of this sort, the probability of a type I error equals $\alpha $ if the true value of $\beta _{j}$ equals zero. If the true value of $\beta _{j}$ is greater than zero, then the probability of a type I error is less than $\alpha $. Thus, when a one-tailed test is used, the significance level of the test is actually the upper bound of the probability of a type I error.
If the null hypothesis is rejected in the hypothesis test above, then we can say that $\beta _{j}$ is significantly less than zero (at a significance level of $\alpha $). A similar procedure can be used to determine whether a coefficient is significantly greater than zero. In this case, the appropriate hypotheses are:
\begin{equation*}
\text{H}_{\text{o}}\text{: }\beta _{j}\leq 0
\end{equation*}%
and
\begin{equation*}
\text{H}_{1}\text{: }\beta _{j}>0
\end{equation*}%
Figure~\ref{oott_bhc} illustrates the acceptance and rejection regions for a hypothesis test of this sort. If you are able to reject the null hypothesis, then you can say that $\beta _{j}$ is significantly greater than zero at a significance level of $\alpha $.
\begin{center}
\FRAME{ftbpFU}{4.7937in}{2.29in}{0pt}{\Qcb{A test of the hypothesis: $\protect\beta _{j}\leq 0$}}{\Qlb{oott_bhc}}{fig5-5.gif}{\special{language “Scientific Word”;type “GRAPHIC”;maintain-aspect-ratio TRUE;display “USEDEF”;valid_file “F”;width 4.7937in;height 2.29in;depth 0pt;original-width 4.7392in;original-height 2.2502in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename ‘GRAPHS/Fig5-5.gif’;file-properties “XNPEU”;}}
\end{center}
More generally, you may wish to test null hypotheses of the form: \begin{equation*}
\text{H}_{\text{o}}\text{: }\beta _{j}\leq c
\end{equation*}
or
\begin{equation*}
\text{H}_{\text{o}}\text{: }\beta _{j}\geq c
\end{equation*}
In this more general specification, the appropriate test statistic is: \begin{equation*}
t=\frac{\hat{\beta}_{j}-c}{\hat{\sigma}_{\hat{\beta}_{j}}} \end{equation*}
The procedure for testing either of these hypotheses is essentially equivalent to that described above.%
\exbox{The Transition from Socialism to Capitalism}{ Beginning in the 1980s, the former Soviet Republics and other Eastern European economies began a process of transition from centrally planned economies to market-based economies. The rate of transition, and the specific nature of the market reforms, however, have differed substantially across countries.
As part of an investigation of the effect of these reforms, Sachs (1996) used data from 25 formerly socialist economies to estimate a bivariate regression model given by: $$
\begin{array}{rll}
\text{Growth}_i= & -18.80 + & 0.77 \text{IRP}_i \\
&(-4.97) & (4.84) \\
\end{array}
$$
$$
\text{(}t\text{-statistics in parentheses)}
$$
$$
\begin{array}{ll}
\text{where:} & \text{Growth}_i \text{= Change in GDP for country } i \text{ in 1995} \\ & \text{IRP = index of reform progress} \\
\end{array}
$$
The IRP index is defined as the sum of nine subindexes that measure the extent of reform in terms of the level of privatization, price liberalization, banking and capital market reform, and legal reforms. Since each of the subindexes ranges in value from 1 to 5, the IRP variable ranges from a minimum score of 9 to a maximum score of 45. A higher value for the IRP is associated with a higher level of market reform.
The positive slope coefficient indicates that a higher rate of growth has occurred in those economies that have adopted a more extensive mix of economic reforms.
(Note that the $t$-statistic for this slope coefficient is significant at all conventional significance levels.)
}
\subsubsection{Example: Consumption function}
In Chapter \ref{biv.reg.chap}, the parameters of a simple Keynesian consumption function were estimated using 36 years of data. The estimated equation is:
\begin{equation*}
\hat{C}_{t}=-252.3+0.9593YD_{t}
\end{equation*}
(The estimated standard error for the slope parameter is 0.01215.) Economic theory predicts that the slope of this equation (the MPC) will be greater than zero. To test this hypothesis, we specify the hypotheses: \begin{equation*}
\text{H}_{\text{o}}\text{: }\beta _{1}\leq 0
\end{equation*}
and
\begin{equation*}
\text{H}_{1}\text{: }\beta _{1}>0
\end{equation*}
Suppose that we wish to test this hypothesis at a .05 significance level.
Since this test is a one-tailed test, we need to find the value of the $t$-ratio that results in 5\% of the total area under the PDF lying above this critical value. Using a $t$-table (or a software package that computes critical values), we find that when there are 34 degrees of freedom, the critical $t$-value is approximately equal to 1.69 (for a one-tailed test).
Thus, we will reject the null hypothesis if and only if the estimated $t$-ratio is greater than 1.69. Using the information above, we can compute the $t$-ratio:
\begin{equation*}
t=\frac{\hat{\beta}_{1}}{\hat{\sigma}_{\hat{\beta}_{1}}} \end{equation*}
\begin{equation*}
=\frac{0.9593}{0.01215}\approx 79
\end{equation*}
Since this estimated $t$-ratio lies above the critical value of 1.69, we can reject the null hypothesis. Thus, we can say that the MPC is significantly greater than zero at a .05 significance level.
Economic theory also predicts that the MPC is less than one. Suppose that we wished to test this hypothesis at the .05 significance level. In this case, the appropriate hypotheses are:
\begin{equation*}
\text{H}_{\text{o}}\text{: }\beta _{1}\geq 1
\end{equation*}
and
\begin{equation*}
\text{H}_{1}\text{: }\beta _{1}<1
\end{equation*}
In this case, the null hypothesis is rejected only in the negative tail of the distribution. With 34 degrees of freedom, the critical value for the $t$-ratio is -1.69. We would reject the null hypothesis and accept the alternative hypothesis if the estimated $t$-ratio is less than -1.69.
In this case, the $t$-ratio equals:
\begin{equation*}
t=\frac{\hat{\beta}_{1}-1}{\hat{\sigma}_{\hat{\beta}_{1}}} \end{equation*}
\begin{equation*}
=\frac{0.9593-1}{0.01215}=-3.35
\end{equation*}
Since this value is less than -1.69, we can reject the null hypothesis.
Thus, we can claim that our estimated MPC is significantly less than 1 (at a 5\% significance level).
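Both one-tailed tests of the MPC follow the same pattern, which the following Python sketch makes explicit. The function names are hypothetical, and the critical value of 1.69 is taken from the discussion above rather than computed.

```python
def t_ratio(beta_hat, c, std_err):
    """t-ratio for a null hypothesis whose boundary value is c."""
    return (beta_hat - c) / std_err


def reject_upper_tail(t, critical_value):
    """H0: beta <= c against H1: beta > c; reject for large positive t."""
    return t > critical_value


def reject_lower_tail(t, critical_value):
    """H0: beta >= c against H1: beta < c; reject for large negative t."""
    return t < -critical_value


mpc_hat, mpc_se = 0.9593, 0.01215   # estimates quoted in the text
t_crit = 1.69                       # one-tailed 0.05 critical value, 34 d.f. (from the text)

t_positive = t_ratio(mpc_hat, 0.0, mpc_se)    # test of MPC > 0
t_below_one = t_ratio(mpc_hat, 1.0, mpc_se)   # test of MPC < 1
print(round(t_positive, 2), reject_upper_tail(t_positive, t_crit))
print(round(t_below_one, 2), reject_lower_tail(t_below_one, t_crit))
```

The only differences between the two tests are the boundary value $c$ that is subtracted in the numerator and the tail in which the rejection region lies.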
\subsection{P-values}
In recent years, most econometrics software packages have begun providing a statistic called a \textbf{p-value}. This p-value is a measure of the exact significance level associated with a test of the null hypothesis that states that $\beta _{j}$ = 0.\footnote{%
This is based on the assumption that the p-value is computed for a two-tailed hypothesis test. This seems to be the standard in most econometrics software packages. Some software packages, however, may compute the p-value based on a one-tailed null hypothesis. Be sure to verify whether a one-tailed or a two-tailed p-value is stated.} A p-value equal to 0.024 indicates that there is a 2.4\% probability of committing a type I error if you were to claim that the coefficient is significantly different than zero.
The concept of a p-value is illustrated in Figure~\ref{pvaluegraph_bhc}.
\begin{center}
\FRAME{ftbpFU}{4.804in}{2.0384in}{0pt}{\Qcb{P-value}}{\Qlb{pvaluegraph_bhc}}{fig5-6.gif}{\special{language “Scientific Word”;type “GRAPHIC”;maintain-aspect-ratio TRUE;display “USEDEF”;valid_file “F”;width 4.804in;height 2.0384in;depth 0pt;original-width 4.7504in;original-height 2.0003in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename ‘GRAPHS/Fig5-6.gif’;file-properties “XNPEU”;}}
\end{center}
The p-value provides a convenient tool that allows you to quickly determine (without the use of a $t$-table) whether an estimated coefficient is significantly different from zero at any significance level. Suppose, for example, that you are testing the null hypothesis: \begin{equation*}
\text{H}_o\text{: }\beta _1=0
\end{equation*}
If you set the significance level of your test at $\alpha =0.05$, this indicates that you are willing to accept, at most, a 5\% probability of a type I error. You should reject the null hypothesis as long as the estimated probability of a type I error (the p-value) is less than 5\%. The null hypothesis will not be rejected if the p-value exceeds the predetermined significance level.
The two-tailed p-value can also be easily adapted for one-tailed hypothesis tests.\label{p.val.1t.test} As Figure~\ref{pvaluegraph_bhc} indicates, the area in each tail of the distribution equals one-half of the p-value. Thus, the p-value for a one-tailed test is simply one-half of the p-value reported for two-tailed tests. To show that a variable is significantly positive (negative), it is simply necessary to show that one-half of the reported two-tailed p-value is less than the significance level for the test (assuming that the estimated sign agrees with the sign predicted by the model).
Traditionally, econometric studies reported the estimated standard errors and/or $t$-ratios associated with the estimated intercept and slope parameters. In recent years, however, many econometric studies have also begun to report the p-values associated with hypothesis tests. The advantage of this approach is that it allows readers to form their own judgements concerning whether a particular parameter should be considered statistically significant. As noted above, the appropriate significance level for a given hypothesis test should be based on an evaluation of the tradeoff between the costs of type I and type II errors. Individuals may differ concerning the standard of evidence that should be required for a particular hypothesis test. When p-values are reported, the reader is able to evaluate the probability of a type I error without referring to a $t$-table.
\subsubsection{Example:\ p-values for the voting behavior model}
Let’s use the voting model introduced in section \ref{vote.beh.sec} to illustrate the use of p-values for hypothesis testing. Table \ref{pres_elect.results} contains a sample listing of a portion of the output that you might receive from an econometrics software package. The first column contains the names of the variables used in the regression equation (in this case, just an intercept term and the unemployment rate variable).
Estimated coefficients, standard errors, and $t$-ratios for these estimators appear in the second, third and fourth columns. The p-values for these estimates appear in the final column of this table. As noted above, these p-values provide a measure of the exact significance level associated with a test of the null hypothesis that states that $\beta _{j}$ equals 0.
In Table \ref{pres_elect.results}, the econometric software package reports a p-value of 0.00000 for the intercept. This does not, of course, mean that the probability of a type I error is zero. Instead, it indicates that the p-value is less than 0.00001. The interpretation of this is that the intercept term is significantly different than zero at all conventional significance levels (and would be significantly different than zero even at a 0.00001 significance level!). The p-value of 0.00568 for the unemployment rate variable indicates that the slope of this equation is significantly different than zero at a 0.01 significance level.
\begin{center}
\begin{table}[tbp]
\centering
\begin{tabular}{lcccc}
\hline
& \textbf{Estimated} & \textbf{Standard} & & \\
\textbf{Variable} & \textbf{Coefficient} & \textbf{Error} & {\boldmath$t$}\textbf{-ratio} & \textbf{p-value} \\ \hline
intercept & 28.12 & 4.93 & 5.7 & 0.00000 \\
UN & 2.04 & 0.71 & 2.9 & 0.00568 \\ \hline
\end{tabular}%
\caption{Estimated coefficients, standard errors, and p-values for the 1992 U.S. Presidential election model}
\label{pres_elect.results}
\end{table}
\end{center}
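Regression packages compute p-values from the $t$-distribution directly. To make the idea concrete, the following Python sketch approximates the tail area numerically using Simpson's rule; this is a stand-in technique chosen to keep the example self-contained, not the method used by any particular package, and the function names are hypothetical. Because the coefficient and standard error used here are the rounded values from the table, the result need not match the reported p-value of 0.00568 exactly.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log(1 + x * x / df))


def upper_tail_area(t, df, steps=2000):
    """P(T > |t|): 0.5 minus the Simpson's-rule area between 0 and |t|."""
    b = abs(t)
    if b == 0:
        return 0.5
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def p_values(t, df):
    """Return (two-tailed, one-tailed) p-values for a t-ratio.

    The one-tailed value is valid only when the sign of t agrees with
    the sign stated in the alternative hypothesis.
    """
    tail = upper_tail_area(t, df)
    return 2 * tail, tail


two_tailed, one_tailed = p_values(2.04 / 0.71, 49)
print(f"two-tailed p = {two_tailed:.4f}, one-tailed p = {one_tailed:.4f}")
```

The one-tailed value is exactly half of the two-tailed value, which is the relationship described above.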
\subsection{Confidence intervals for parameter estimates} As noted in Chapter \ref{intro.chap}, when we estimate the parameters of regression models, we need some measure of the reliability of the estimates.
The classical approach to hypothesis testing described above provides one such approach that allows us to determine whether a value is significantly less than or greater than particular threshold values. An alternative approach involves the use of \textbf{confidence intervals}. A confidence interval provides us with an interval in which the true parameter value is likely to fall. When you hear pollsters predict that a particular candidate will receive 52\% of the vote with a margin of error of 2\%, they are establishing a confidence interval for the estimate. What the pollsters mean
by such a statement is that estimators constructed in this manner will fall within two percentage points of the true value with a given probability (typically 95\% or 99\%). Let’s examine how confidence intervals can be constructed.
Suppose that we wished to construct a 95\% confidence interval for our estimate of a slope parameter, $\beta _1$. From the discussion above, we know that critical values of the $t$-ratio (at the appropriate degrees of freedom) can be determined so that:
\begin{equation*}
\text{Prob(}-t_{.025}\leq t\leq t_{.025}\text{)}=.95 \end{equation*}
Substituting in the formula for the $t$-ratio, this becomes: \begin{equation*}
\text{Prob(}-t_{.025}\leq \frac{\hat{\beta}_{1}-\beta _{1}}{\hat{\sigma}_{\hat{\beta}_{1}}}\leq t_{.025}\text{)}=.95 \end{equation*}
With a little bit of algebraic manipulation, this can be restated as: \begin{equation*}
\text{Prob(}\hat{\beta}_{1}-t_{.025}\hat{\sigma}_{\hat{\beta}_{1}}\leq \beta _{1}\leq \hat{\beta}_{1}+t_{.025}\hat{\sigma}_{\hat{\beta}_{1}}\text{)}=.95 \end{equation*}
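The algebra behind this restatement can be sketched in three steps; it relies only on the fact that the estimated standard error is positive. First, multiply each term inside the probability statement by $\hat{\sigma}_{\hat{\beta}_{1}}$:
\begin{equation*}
-t_{.025}\hat{\sigma}_{\hat{\beta}_{1}}\leq \hat{\beta}_{1}-\beta _{1}\leq t_{.025}\hat{\sigma}_{\hat{\beta}_{1}}
\end{equation*}
Next, subtract $\hat{\beta}_{1}$ from each term:
\begin{equation*}
-\hat{\beta}_{1}-t_{.025}\hat{\sigma}_{\hat{\beta}_{1}}\leq -\beta _{1}\leq -\hat{\beta}_{1}+t_{.025}\hat{\sigma}_{\hat{\beta}_{1}}
\end{equation*}
Finally, multiply each term by $-1$, which reverses the direction of both inequalities:
\begin{equation*}
\hat{\beta}_{1}-t_{.025}\hat{\sigma}_{\hat{\beta}_{1}}\leq \beta _{1}\leq \hat{\beta}_{1}+t_{.025}\hat{\sigma}_{\hat{\beta}_{1}}
\end{equation*}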
Thus, there is a 95\% probability that the interval between $\hat{\beta}_{1}-t_{.025}\hat{\sigma}_{\hat{\beta}_{1}}$ and $\hat{\beta}_{1}+t_{.025}\hat{\sigma}_{\hat{\beta}_{1}}$ (and other similarly constructed confidence intervals) will contain the true value of $\beta _{1}$.\footnote{More precisely, we should say that in repeated applications of this procedure, the true parameter value will fall in this interval 95 percent of the time. In any given case, the true value of $\beta _{1}$ either falls in this interval or lies outside of this interval. Thus, the probability of this event occurring is either one or zero in a given application of this procedure.} We can write this more succinctly by noting that a 95\% confidence interval for $\beta _{1}$ consists of the interval: \begin{equation*}
\hat{\beta}_{1}\pm t_{.025}\hat{\sigma}_{\hat{\beta}_{1}} \end{equation*}
More generally, it can easily be demonstrated that a 100(1-$\alpha $) percent confidence interval for either $\beta _{o}$ or $\beta _{1}$ is given by:
\begin{equation*}
\text{100(1-}\alpha \text{) percent confidence interval:} \end{equation*}
\begin{equation*}
\hat{\beta}_{j}\pm t_{\alpha /2}\hat{\sigma}_{\hat{\beta}_{j}} \end{equation*}
Suppose that we wish to construct a 95\% confidence interval for the estimated marginal propensity to consume. In the case of the consumption function results presented in Chapter \ref{biv.reg.chap}, the estimated value of the MPC, $\hat{\beta}_{1}$, equals $0.9593$ while the standard error of this estimator, $\hat{\sigma}_{\hat{\beta}_{1}}$, equals $0.01215$.
The critical value for the $t$-statistic for a 95\% confidence interval is equivalent to the $t$-value used for a two-tailed test at a 5\% significance level. Since there are 34 degrees of freedom, this critical value is approximately equal to: $t_{.025}(34)=2.03$. Thus, a 95\% confidence interval for the marginal propensity to consume ($\beta _{1}$) can be constructed as: \begin{equation*}
95\%\text{ confidence interval for }\beta _{1}\text{ =} \end{equation*}%
\begin{equation*}
\hat{\beta}_{1}\pm t_{.025}\hat{\sigma}_{\hat{\beta}_{1}} \end{equation*}%
\begin{equation*}
=0.9593\pm (2.03)(0.01215)
\end{equation*}%
\begin{equation*}
=0.9593\pm 0.0247
\end{equation*}%
Thus, a 95\% confidence interval for the MPC ranges from 0.9346 to 0.9840.
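The same calculation can be written as a short Python sketch. The function name \texttt{confidence\_interval} is hypothetical, and the critical value of 2.03 is taken from the discussion above.

```python
def confidence_interval(beta_hat, std_err, t_crit):
    """Return (lower, upper) bounds of beta_hat +/- t_crit * std_err."""
    margin = t_crit * std_err
    return beta_hat - margin, beta_hat + margin


# Values quoted in the text: MPC estimate, its standard error, and t_.025 with 34 d.f.
lower, upper = confidence_interval(0.9593, 0.01215, 2.03)
print(f"95% confidence interval: {lower:.4f} to {upper:.4f}")
```

Note that the value 1 lies outside this interval, which is consistent with rejecting the hypothesis $\beta _{1}=1$ in a two-tailed test at the 5\% significance level.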
\subsection{Confidence intervals for forecasts} In a similar manner, we can generate confidence intervals for forecasts based upon regression analysis. As noted in Chapter \ref{biv.reg.chap}, the estimated variance of the prediction is: \begin{equation}
\hat{\sigma}_{p}^{2}=\hat{\sigma}^{2}\left[ 1+\frac{1}{N}+\frac{\left( X_{N+1}-\overline{X}\right) ^{2}}{\sum (X_{i}-\overline{X})^{2}}\right] \label{forecast.variance.bc}
\end{equation}%
When we use the regression equation to generate a predicted value of the dependent variable ($\hat{Y}_{N+1}$), a 100(1-$\alpha $) percent confidence interval consists of the interval:
\begin{equation}
\hat{Y}_{N+1}\pm t_{\alpha /2}\hat{\sigma}_{p} \label{forecast.con.int} \end{equation}%
\begin{equation*}
\text{where: }\hat{\sigma}_{p}=\sqrt{\hat{\sigma}_{p}^{2}} \end{equation*}%
This procedure allows us to place confidence intervals around the estimated regression line. As noted in Chapter \ref{biv.reg.chap}, the variance of the prediction is larger for values of $X_{i}$ that lie further from the sample mean ($\overline{X}$). Thus, the confidence intervals are larger for values of $X_{i}$ that are further from the mean. Figure~\ref{conint_graph_bhc} illustrates a 95\% confidence interval for forecasts based upon a regression equation.
\begin{center}
\FRAME{ftbpFU}{4.7833in}{3.4108in}{0pt}{\Qcb{Confidence intervals for forecasts}}{\Qlb{conint_graph_bhc}}{fig5-7.gif}{\special{language “Scientific Word”;type “GRAPHIC”;maintain-aspect-ratio TRUE;display “USEDEF”;valid_file “F”;width 4.7833in;height 3.4108in;depth 0pt;original-width 4.7288in;original-height 3.365in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename ‘GRAPHS/Fig5-7.gif’;file-properties “XNPEU”;}} \end{center}
Suppose, for example, that we wish to forecast the level of aggregate consumption expenditures (in 1996 dollars) that would occur when the level of disposable personal income equals \$7500 billion. The estimated consumption function is:
\begin{equation*}
\text{\^{C}}_{t+1}=-252.3+0.9593\text{YD}_{t+1} \end{equation*}
Thus, the predicted value of consumption expenditures is given by: \begin{equation*}
\text{\^{C}}_{t+1}=-252.3+0.9593(7500) \end{equation*}
\begin{equation*}
=\$6942.45\text{ billion}
\end{equation*}
This result indicates that the forecast value of consumption equals \$6942.45 billion when the level of disposable personal income is \$7500 billion.
Once this forecast is determined, the next step is to determine the variance of the forecast using equation \ref{forecast.variance.bc}. Using the information appearing in Table \ref{t3.1}, the variance of the forecast can be computed as:\footnote{%
In addition to the information contained in Table \ref{t3.1}, this computation also requires an estimate of: \begin{equation*}
\hat{\sigma}^{2}=\frac{\sum \hat{u}_{i}^{2}}{N-2} \end{equation*}%
More generally, however, most or all of the information needed to generate this estimated variance is provided by the standard output of most regression packages.}
\begin{equation*}
\hat{\sigma}_{p}^{2}=\hat{\sigma}^{2}\left[ 1+\frac{1}{N}+\frac{\left( X_{N+1}-\overline{X}\right) ^{2}}{\sum (X_{i}-\overline{X})^{2}}\right] \end{equation*}%
\begin{equation*}
=8358.62\left[ 1+\frac{1}{36}+\frac{\left( 7500-4091.3\right) ^{2}}{147286.0}\right]
\end{equation*}%
\begin{equation*}
=667,993.98
\end{equation*}%
Using equation \ref{forecast.con.int}, a 95\% confidence interval for the forecast value of consumption is then given by: \begin{equation*}
\hat{Y}_{N+1}\pm t_{\alpha /2}\hat{\sigma}_{p} \end{equation*}%
\begin{equation*}
=6942.45\pm (2.03)(817.31)
\end{equation*}%
\begin{equation*}
=6942.45\pm 1659.14
\end{equation*}%
Thus, a 95\% confidence interval for the forecast ranges from 5283.31 to 8601.59.
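The forecast interval can be reproduced with a short Python sketch that applies equations \ref{forecast.variance.bc} and \ref{forecast.con.int} to the inputs quoted above. The function name \texttt{forecast\_variance} is hypothetical; small differences from the figures in the text may arise from rounding.

```python
import math


def forecast_variance(sigma2_hat, n, x_new, x_bar, sum_sq_dev):
    """Estimated variance of the prediction at x_new."""
    return sigma2_hat * (1 + 1 / n + (x_new - x_bar) ** 2 / sum_sq_dev)


# Inputs as quoted in the worked example in the text.
var_p = forecast_variance(8358.62, 36, 7500.0, 4091.3, 147286.0)
se_p = math.sqrt(var_p)
y_hat = -252.3 + 0.9593 * 7500.0
margin = 2.03 * se_p   # t_.025 with 34 d.f., taken from the text
print(f"forecast = {y_hat:.2f}, std. error = {se_p:.2f}")
print(f"95% interval: {y_hat - margin:.2f} to {y_hat + margin:.2f}")
```

Most of the forecast variance in this example comes from the third term in the brackets, since the chosen income level lies far from the sample mean.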
\subsection{A test of the hypothesis: $\protect\sigma ^2=c\label{sigma.start} $}
On occasion, you may wish to test hypotheses involving the variance of the error term. The statistic:
\begin{equation}
\chi ^{2}=(N-2)\frac{\hat{\sigma}^{2}}{\sigma ^{2}} \label{vestz1} \end{equation}
follows a $\chi ^{2}$-distribution with $N-2$ degrees of freedom.\footnote{Once again, the degrees of freedom for this estimator equal the number of observations minus the number of estimated parameters (2 in this case). Thus, the degrees of freedom equal $N-2$.} Suppose that we wished to test the hypotheses:
\begin{equation*}
\text{H}_{\text{o}}\text{: }\sigma ^{2}=c \end{equation*}
and
\begin{equation*}
\text{H}_{1}\text{: }\sigma ^{2}\neq c \end{equation*}
The procedure for testing the null hypothesis proceeds in essentially the same manner as discussed above. Once the significance level is determined, the critical values are determined using a $\chi ^{2}$-table (such as the one appearing at the end of this text). Then, the value of the estimator in equation \ref{vestz1} is computed under the assumption that the null hypothesis is correct. If this estimated statistic falls within the rejection region, then the null hypothesis is rejected and the alternative hypothesis is accepted.
Let’s consider an example. Suppose that you find that the estimated variance is 3.62 in a sample with 32 observations. You wish to test the hypotheses:\footnote{%
Note that this is a two-tailed hypothesis test since the null hypothesis could be rejected if the variance is either greater than or less than 4. A one-tailed hypothesis test would be used if the null hypothesis contained an inequality such as:
\begin{equation*}
\text{H}_{\text{o}}\text{: }\sigma ^{2}\leq 4 \end{equation*}
or:
\begin{equation*}
\text{H}_{\text{o}}\text{: }\sigma ^{2}\geq 4 \end{equation*}%
}
\begin{equation*}
\text{H}_{\text{o}}\text{: }\sigma ^{2}=4 \end{equation*}
and
\begin{equation*}
\text{H}_{1}\text{: }\sigma ^{2}\neq 4 \end{equation*}
at the .05 significance level.
To conduct a test of this hypothesis, it is necessary to determine the acceptance and rejection regions for the test statistic. As noted in Chapter \ref{stat.chap}, the $\chi ^{2}$-distribution is a skewed distribution that only takes on nonnegative values. Using Table \ref{chi-table} in Appendix \ref{stat.tab.app}, it can be seen that there is a 2.5\% probability that a $\chi ^{2}$-variate with 30 degrees of freedom will take on a value that lies below 16.7908. This table also indicates that there is a 2.5\% probability that this $\chi ^{2}$-variate will lie above 46.9792. Thus, as illustrated in Figure~\ref{chi_graph_bhc}, the null hypothesis above will be rejected if the $\chi ^{2}$-statistic is either less than 16.7908 or greater than 46.9792. The null hypothesis cannot be rejected if the estimated $\chi ^{2}$-statistic lies between 16.7908 and 46.9792.
\begin{center}
\FRAME{ftbpFU}{4.3742in}{3.5993in}{0pt}{\Qcb{Acceptance and rejection regions for a $\protect\chi ^{2}$ variable}}{\Qlb{chi_graph_bhc}}{fig5-8.gif}{\special{language “Scientific Word”;type “GRAPHIC”;maintain-aspect-ratio TRUE;display “USEDEF”;valid_file “F”;width 4.3742in;height 3.5993in;depth 0pt;original-width 4.3232in;original-height 3.5518in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename ‘GRAPHS/Fig5-8.gif’;file-properties “XNPEU”;}}
\end{center}
In this particular case, the estimated test statistic can be computed as: \begin{equation*}
\chi ^2=(N-2)\frac{\hat{\sigma}^2}{\sigma ^2} \end{equation*}
\begin{equation*}
=(30)\frac{3.62}{4}
\end{equation*}
\begin{equation*}
=27.15
\end{equation*}
Since this outcome falls within the acceptance region, the null hypothesis cannot be rejected at the .05 significance level. Thus, we can say that the variance is not significantly different than 4.0 at a 5\% significance level.
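This test can also be sketched in Python. The function names are hypothetical, and the two critical values are the table values quoted above rather than values computed by the code.

```python
def chi_square_stat(sigma2_hat, sigma2_null, n, n_params=2):
    """Test statistic (N - 2) * sigma2_hat / sigma2_null for the bivariate model."""
    return (n - n_params) * sigma2_hat / sigma2_null


def outside_acceptance_region(stat, lower_crit, upper_crit):
    """True when the statistic is below the lower or above the upper critical value."""
    return stat < lower_crit or stat > upper_crit


stat = chi_square_stat(3.62, 4.0, 32)
# Critical values for 30 d.f. at the 0.05 level, as read from the chi-square table.
reject = outside_acceptance_region(stat, 16.7908, 46.9792)
print(f"chi-square = {stat:.2f}, reject H0: {reject}")
```

Unlike the $t$-test, the two critical values must be supplied separately because the $\chi ^{2}$-distribution is not symmetric.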
One-tailed hypothesis tests proceed in an analogous manner. As in the case of the $t$-tests discussed above, the rejection region for a one-tailed hypothesis test lies on only one side of the distribution.
When is this sort of hypothesis test used? In practice, econometricians rarely conduct hypothesis tests involving the magnitude of $\sigma ^2$.
There are, however, several applications discussed in later chapters in which the $\chi ^2$-distribution is used to conduct other types of hypothesis tests. Thus, the example above should serve to illustrate how the $\chi ^2$-distribution may be used for hypothesis testing.\label{sigma.end}
\section{Test of ``goodness of fit''}
In the previous chapter, R$^2$ was shown to be a measure of the ``goodness of fit'' for the regression model. One problem with this measure, however, is that it cannot be used for hypothesis testing purposes. It would be desirable to have a statistic that serves as a test of the overall fit of the regression relationship. A commonly used test of ``goodness of fit'' is provided by the $F$-statistic defined as:\footnote{In the more general case of the multiple regression model, the $F$-statistic is defined as:
\begin{equation*}
F=\frac{\text{RSS}/K}{\text{ESS}/(N-(K+1))} \end{equation*}
where $K$ equals the number of variables included as independent variables in the regression equation. The total number of estimated parameters in such a model equals $K+1$ (due to the inclusion of a constant term). This more general form of the $F$-statistic is discussed in more detail in Chapter \ref{hyp.mult.chap}.}
\begin{equation}
F=\frac{\text{RSS}/1}{\text{ESS}/(N-2)} \label{f-stat.hyp.1} \end{equation}
where:
\begin{equation*}
\text{RSS (regression sum of squares)}=\sum \left( \hat{Y}_i-\overline{Y}\right) ^2
\end{equation*}
and
\begin{equation*}
\text{ESS (error sum of squares)}=\sum \hat{u}_i^2 \end{equation*}
This $F$-statistic follows an $F$-distribution with $1$ and $N-2$ degrees of freedom in the numerator and denominator, respectively. Note that this $F$-statistic will tend to be large when the regression equation explains a large proportion of the variance in the dependent variable.
A test of the overall fit of the model is, in the bivariate regression case, equivalent to testing whether the variation in the independent variable is a significant factor in explaining the variation in the dependent variable.
This is equivalent to testing the hypotheses: \begin{equation}
\text{H}_{o}\text{: }\beta _{1}=0 \label{F-test.hypc} \end{equation}
and
\begin{equation*}
\text{H}_{1}\text{: }\beta _{1}\neq 0
\end{equation*}
The basic procedure involved in constructing an $F$-test can be summarized as:
\begin{enumerate}
\item Determine the significance level of the test (based on the relative cost of type I and type II errors).
\item Estimate the regression model and compute the $F$-statistic defined in equation \ref{f-stat.hyp.1}. Most regression packages produce this statistic as part of the regression output.
\item Use an $F$-table (such as the one appearing in Table \ref{F-table} in Appendix \ref{stat.tab.app}) to determine the critical value for an $F$-statistic with $1$ and $N-2$ degrees of freedom at the predetermined significance level.
\item Reject the null hypothesis if the estimated $F$-statistic exceeds the critical value.
\end{enumerate}
A careful reader will note that the $F$-test serves as an alternative to the $t$-test in determining whether a slope coefficient is significantly different than zero. In fact, the two tests are equivalent. If the null hypothesis given in equation \ref{F-test.hypc} is rejected under a $t$-test, it will also be rejected under an $F$-test (assuming, of course, that the same significance level is chosen). Similarly, if a coefficient is not significantly different than zero under a $t$-test, it will not be significant under the corresponding $F$-test. The reason for this is fairly simple: in the bivariate regression model, the $F$-statistic given in equation \ref{f-stat.hyp.1} is equal to the square of the $t$-statistic used to test this hypothesis, and the p-value associated with the $F$-statistic is equal to the two-tailed p-value associated with the $t$-statistic.\footnote{This relationship between the $t$- and $F$-statistics is demonstrated in the mathematical appendix at the end of this chapter.}
\subsection{Example: 1992 election model}
Suppose that we wished to test the overall goodness of fit of the elections model discussed above. The estimated $F$-statistic for this regression equation can be computed as:
\begin{equation}
F=\frac{\text{RSS/1}}{\text{ESS}/(N-2)} \label{fstat.num.dem.bc} \end{equation}%
\begin{equation*}
=\frac{528.76/1}{3095.93/49}
\end{equation*}%
\begin{equation*}
=8.37
\end{equation*}%
As illustrated in Figure~\ref{fgraph_bhc}, if a 1\% significance level is selected, the critical value for an $F$-statistic with 1 and $49$ degrees of freedom is approximately equal to 7.2.\footnote{Note that the numerator degrees of freedom equals one and the denominator degrees of freedom equals 49 in this case.} Since the estimated $F$-statistic exceeds this critical value, we may reject the null hypothesis that state unemployment rates have no bearing on U.S. Presidential voting behavior. This implies that the regression model explains a significant portion of the variance in the dependent variable.
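The equivalence between the $t$-test and the $F$-test can be checked numerically for this example. As a rough sketch, using only the rounded figures quoted above (the absolute $t$-ratio is inferred here from the $F$-statistic rather than taken from the regression output, and $t_{c}$ and $F_{c}$ denote the respective critical values):
\begin{equation*}
\left\vert t\right\vert =\sqrt{F}=\sqrt{8.37}\approx 2.89,\qquad t_{c}=\sqrt{F_{c}}\approx \sqrt{7.2}\approx 2.68
\end{equation*}
Since $2.89>2.68$, a two-tailed $t$-test with 49 degrees of freedom at the same significance level leads to the same rejection of the null hypothesis.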
\begin{center}
\FRAME{ftbpFU}{4.8352in}{3.5362in}{0pt}{\Qcb{Acceptance and rejection regions for $F$-statistic}}{\Qlb{fgraph_bhc}}{fig5-9.gif}{\special{language “Scientific Word”;type “GRAPHIC”;maintain-aspect-ratio TRUE;display “USEDEF”;valid_file “F”;width 4.8352in;height 3.5362in;depth 0pt;original-width 4.7816in;original-height 3.4895in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename ‘GRAPHS/Fig5-9.gif’;file-properties “XNPEU”;}} \end{center}
\section{A test for normality\label{Jarque_Bera}}
Since all of the test statistics that we have discussed above rely on the assumption that error terms are normally distributed, it is useful to be able to test this assumption. The Jarque-Bera\footnote{See Jarque and Bera (1987) for a more complete discussion of this test.} test is often used for this purpose. This test relies on the estimated skewness ($S$) and kurtosis ($K$) of the regression residuals. The skewness of a variable is a measure of the degree of asymmetry in the distribution of the variable. A symmetric distribution (such as the normal distribution) has a skewness of zero. Kurtosis is a measure of the thickness of the tails of the distribution. The normal distribution has a kurtosis of 3. A value of the kurtosis that exceeds 3 indicates that the tails of the distribution are thicker than those of a normal density function.
The Jarque-Bera statistic is constructed using the formula: \begin{equation*}
\text{Jarque-Bera statistic = }N\left[ \frac{S^{2}}{6}+\frac{\left( K-3\right) ^{2}}{24}\right]
\end{equation*}
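The discussion above does not state how $S$ and $K$ are estimated. A common choice, shown here as an assumption rather than as the definition used by any particular software package (some packages apply small-sample adjustments), is based on the sample moments of the residuals $\hat{u}_{i}$, which have a mean of zero when the model contains an intercept:
\begin{equation*}
m_{k}=\frac{1}{N}\sum_{i=1}^{N}\hat{u}_{i}^{k},\qquad S=\frac{m_{3}}{m_{2}^{3/2}},\qquad K=\frac{m_{4}}{m_{2}^{2}}
\end{equation*}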
Under the assumption that the error terms are distributed normally, this statistic has a distribution that converges to a $\chi ^{2}$ distribution with 2 degrees of freedom as the sample size approaches infinity. Note that when the error terms are normally distributed, the estimated Jarque-Bera statistic should be close to zero. The Jarque-Bera statistic will be larger when the skewness and the kurtosis of the sample distribution of the error terms differ substantially from those of a normal distribution. The null hypothesis of normality is rejected if the Jarque-Bera statistic exceeds the critical value for a $\chi ^{2}$ distribution with 2 degrees of freedom.
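For the special case of 2 degrees of freedom, the critical value can be computed directly rather than read from a table, since the upper-tail probability of a $\chi ^{2}$ variable with 2 degrees of freedom has a simple closed form. As a short worked illustration (the symbol $c_{\alpha }$ for the critical value at significance level $\alpha $ is introduced here only for convenience):
\begin{equation*}
\Pr \left( \chi _{2}^{2}>c\right) =e^{-c/2}\quad \Longrightarrow \quad c_{\alpha }=-2\ln \alpha
\end{equation*}
\begin{equation*}
c_{0.05}=-2\ln (0.05)\approx 5.99,\qquad c_{0.01}=-2\ln (0.01)\approx 9.21
\end{equation*}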
An example may help to illustrate this concept. Consider the voting model discussed above. When the descriptive statistics for the residuals from this regression are computed, the estimated skewness is 2.2 and the estimated kurtosis is 13.4. This suggests that the distribution of the error terms is positively skewed and that the tails of this distribution are thicker than those of a normal distribution.
Since there are 51 observations, the estimated Jarque-Bera statistic can be computed as:
\begin{equation*}
\text{Jarque-Bera statistic = }51\left[ \frac{(2.2)^{2}}{6}+\frac{\left( 13.4-3\right) ^{2}}{24}\right]
\end{equation*}
\begin{equation*}
=270.98
\end{equation*}
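It can be informative to split this statistic into its two components. A brief breakdown, using the same rounded values of $S$ and $K$:
\begin{equation*}
51\times \frac{(2.2)^{2}}{6}\approx 41.14,\qquad 51\times \frac{\left( 13.4-3\right) ^{2}}{24}\approx 229.84
\end{equation*}
Roughly 85\% of the statistic comes from the excess kurtosis term, so the thick tails of the residual distribution contribute more to this result than its asymmetry does.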
If a 1\% significance level is adopted, the critical value for a $\chi ^{2}$ distribution with 2 degrees of freedom equals 9.21. Since the estimated Jarque-Bera statistic exceeds this critical value, the hypothesis of normally distributed error terms can be rejected in this case.\footnote{It should be noted, though, that this test is appropriate only in a large sample. Given the relatively small sample size in this case, too much weight should not be placed upon this result.} One problem with this test is that it does not provide a constructive alternative. If the hypothesis of normality is rejected, the test does not directly suggest an alternative to the assumption of normality. If the non-normality of the error terms is due to an incorrect specification of the model, an alternative specification might be considered. This is a topic that will be examined in more detail in later chapters.
As noted above, however, the Gauss-Markov theorem indicates that estimated regression parameters are still BLUE, even if the error terms are not normally distributed. Parameter estimates are still unbiased and have the lowest variance among all linear unbiased estimators even when the error terms are not normally distributed. A finding of non-normality for the residuals in a regression model simply indicates that critical values derived from the $t$, $\chi ^{2}$, and $F$ distributions will not provide exact significance levels. The critical values from these distributions may overstate or understate the appropriate critical values for these test statistics.
A variety of central limit theorems, however, suggest that the estimated slope and intercept parameters will converge to a normal distribution as the size of the sample approaches infinity even when the error terms are not normally distributed. Because of this, it is common practice for econometricians to use the $t$-distribution for tests involving the magnitude of regression parameters even when the error terms do not appear to be normally distributed. The difference is that they interpret these tests as being approximate rather than exact tests. The error in this approximation will, in general, tend to be larger when the sample size is relatively small.
\label{Jarque_Bera.end}
\section{A few words of caution}
\subsection{Correlation vs. Causation} Hypothesis tests are based upon the observed correlations among economic variables. In a regression model, a slope parameter may be statistically significant even if there is no causal relationship between the independent variable and the dependent variable. In particular, this may occur if the assumptions of the classical regression model are violated. For example, let’s examine the regression relationship between the number of deaths in the U.S. population and the number of secondary school teachers. As noted in Chapter \ref{intro.chap}, the estimated relationship is: \begin{equation*}
\text{Deaths}_{t}=1276.739+0.666\text{ Teachers}_{t}\text{ (variables measured in thousands)}
\end{equation*}
Suppose that we wish to test to determine whether there is a statistically significant relationship between the number of secondary school teachers and the number of deaths in the U.S. The null and alternative hypotheses are: \begin{equation*}
\text{H}_{\text{o}}\text{: }\beta _{1}=0 \end{equation*}
and
\begin{equation*}
\text{H}_{1}\text{: }\beta _{1}\neq 0
\end{equation*}
Suppose that we select a 0.05 significance level for this test. Since 32 observations were used to estimate this model, there are 30 (= $N-2$) degrees of freedom. The critical values for the $t$-statistic are $-2.042$ and $2.042$ for this problem. Thus, the null hypothesis will be rejected if the absolute value of the $t$-ratio is greater than 2.042.
To test this hypothesis, the $t$-ratio must be computed. Since the standard error of the slope estimate is 0.0547, under the null hypothesis, the $t$-ratio is:
\begin{equation*}
t=\frac{0.666}{0.0547}
\end{equation*}
\begin{equation*}
=12.17
\end{equation*}
Since this statistic is greater than 2.042, we can reject the null hypothesis and claim that the slope coefficient is significantly different than zero. Taken at face value, this result would suggest that the number of secondary school teachers is a significant determinant of the number of people dying in the U.S.
As this example suggests, we cannot assume that statistically significant regression coefficients imply the existence of a causal relationship between the independent and dependent variables. As noted in Chapter \ref{intro.chap}, it is likely that the correlation that exists between these two variables is primarily caused by the fact that both variables have increased over time as population growth occurred.
\subsection{Are significant results always important results?}
As the size of the sample approaches the size of the population, the standard error of parameter estimates will approach zero. Thus, $t$-ratios tend to increase in absolute value as the sample size rises. Since hypothesis tests are based on the assumption that you are dealing with a “sample” and not a population, $t$-tests will generally indicate that almost any parameter is significantly different than zero when the sample size is sufficiently large. An estimated slope coefficient equal to 0.0001 may be statistically significant in a large sample. In many applications, however, an effect of this magnitude may be so small that, for all practical purposes, it is of little economic interest.\footnote{%
A good discussion of this argument appears in McCloskey (1985), McCloskey and Ziliak (1996), and Goldberger (1991), pp. 240-241.} What matters, in practice, is the magnitude of the effect on the dependent variable induced by a change in the independent variable. To evaluate this effect, it is important to consider the magnitude of the coefficients as well as their statistical significance.\footnote{%
It is also important to know the units of measurement for the dependent and independent variables. As will be shown in Chapter~\ref{func.form.ii.chap}, changes in the units of measurement will alter the magnitude of the estimated slope and intercept coefficients. What ultimately matters is how substantial the impact of a change in the independent variable on the level of the dependent variable is.}
Furthermore, it is important to note that a coefficient possessing a higher $t$-statistic is not “more significant” than a coefficient with a lower $t$-statistic. A higher $t$-ratio may provide more confidence in a result, but it does not make the result more important. The importance of a result is based on the magnitude of the coefficient (and the units of measurement) as well as its significance level.
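A stylized calculation may make the role of the sample size concrete. The numbers below are hypothetical and are chosen only for illustration: suppose that $\hat{\beta}_{1}=0.0001$, that $\hat{\sigma}=1$, and that $\sum \left( X_{i}-\overline{X}\right) ^{2}=N$ (so that the sample variance of $X$ is approximately one). Using the expression for the variance of $\hat{\beta}_{1}$, the $t$-ratio for a test of the null hypothesis that $\beta _{1}=0$ is:
\begin{equation*}
t=\frac{\hat{\beta}_{1}}{\hat{\sigma}/\sqrt{\sum \left( X_{i}-\overline{X}\right) ^{2}}}=0.0001\sqrt{N}
\end{equation*}
With $N=10{,}000$ this gives $t=0.01$, while with $N=10^{9}$ it gives $t\approx 3.16$. The estimated effect is identical in both cases; only the precision with which it is measured has changed.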
\subsection{Are insignificant results always unimportant results?}
While hypothesis testing is a large component of econometric analysis, it is only one part. As noted in Chapter \ref{intro.chap}, regression analysis is also used to estimate the magnitude of the relationships that exist among economic variables. Under the assumptions of the classical regression model, the OLS estimators of the regression parameters are unbiased and consistent. While an estimated coefficient may not be significantly different than zero, this does not mean that the coefficient actually equals zero.%
\exbox{Publication Bias}{
Economists have long observed that studies are more likely to be published when key coefficients are statistically significant. Under the “publish or perish” environment facing many academics, this often means that econometricians face an incentive to engage in specification searches in which alternative models (or alternative samples) are used until statistically significant results are found.
This process, however, results in a phenomenon known as “publication bias.”
Suppose, for example, that the true value of a slope coefficient is zero. If a 5\% significance level is used, a type I error will be committed 5\% of the time. In practice, this means that the estimated parameter will appear to be significant in approximately 5\% of the samples investigated. If publication bias occurs, a review of the published literature will primarily reveal only the 5\% of the studies in which a significant result is found; the insignificant results are less likely to be published.
Recently, however, several studies have been published that suggest that the minimum wage does not have a significant effect on teenage employment levels. A good discussion of these studies appears in {\it Myth and Measurement: The New Economics of the Minimum Wage} (1995) by David Card and Alan Krueger.
The publication of these studies perhaps signals a growing recognition of the fact that insignificant results are often as interesting as significant results.
}
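The incentive problem described in the box above can be quantified with a simple calculation. As a sketch, assume that the true slope coefficient is zero and that a researcher tries $k$ specifications whose test outcomes are independent (a simplifying assumption; in practice, alternative specifications estimated with the same data are correlated). At a 5\% significance level, the probability that at least one specification yields a statistically significant estimate is:
\begin{equation*}
1-(0.95)^{k}
\end{equation*}
For $k=10$ this probability is approximately 0.40, and for $k=20$ it is approximately 0.64.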
\subsection{Hypothesis tests and model assumptions}
It is also important to remember that all of the hypothesis tests discussed in this chapter are based on the assumptions of the classical regression model. If any of these assumptions is violated, these tests are not appropriate. In later chapters, we will examine the implications of violations of each of these assumptions.
\section{Summary}
In this chapter, we have examined how hypothesis testing may be performed in the bivariate regression model. In particular, we have examined how to perform tests involving either the magnitude or sign of regression parameters.
When you use econometric analysis to test economic hypotheses involving regression parameters, you will have to determine whether a one-tailed or a two-tailed hypothesis test is appropriate. The discussion above should suggest a straightforward decision rule: \begin{itemize}
\item If economic theory predicts that a parameter will have a certain sign (or will be either above or below a particular threshold value), then a one-tailed test is appropriate.
\item A two-tailed test is appropriate if economic theory predicts that a coefficient may be either positive or negative (or may be either above or below some threshold value). This test is also appropriate in cases in which there are alternative theories that generate different predictions about the sign of the coefficient.
\end{itemize}
The $t$-test is used extensively by econometricians to perform one-tailed or two-tailed tests involving the estimated slope and/or intercept parameters.
The $\chi ^2$-test may be used to perform tests concerning the magnitude of the variance of the error terms. An overall test of the “goodness of fit”
of the regression model is provided by an $F$-test.
Each of these tests proceeds in the same basic manner: \begin{enumerate}
\item Null and alternative hypotheses are specified.
\item A significance level ($\alpha $) is determined for the test (based upon the cost of type I and type II errors).
\item The distribution of an appropriate test statistic is determined (such as the $t$-ratios or $\chi ^2$-statistics discussed above) under the assumption that the null hypothesis is correct.
\item The acceptance and rejection regions for the test statistic are determined.
\item The test statistic is computed using sample information.
\item The null hypothesis is rejected if the test statistic falls in the rejection region; otherwise, you will “fail to reject” the null hypothesis.
\end{enumerate}
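As an illustration, the $F$-test for the election model discussed earlier in this chapter follows these six steps. The summary below simply restates the values reported in that example:
\begin{enumerate}
\item The hypotheses are H$_{\text{o}}$: $\beta _{1}=0$ and H$_{1}$: $\beta _{1}\neq 0$.
\item The significance level is $\alpha =0.01$.
\item Under the null hypothesis, the test statistic follows an $F$-distribution with 1 and 49 degrees of freedom.
\item The rejection region consists of values of the $F$-statistic greater than approximately 7.2.
\item The estimated test statistic is $F=8.37$.
\item Since $8.37>7.2$, the null hypothesis is rejected.
\end{enumerate}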
\section{Key Concepts}
central limit theorem
minimum variance unbiased estimators
$t$-ratio
hypothesis testing
null hypothesis
alternative hypothesis
two-tailed test
one-tailed test
type I and type II errors
significance level
confidence level
power of the test
critical values
acceptance region
rejection region
P-value
confidence interval
$\chi ^2$-test
$F$-test
\newpage\
\section{Exercises and problems}
\begin{enumerate}
\item Explain the circumstances under which you would use a one-tailed test.
When is a two-tailed test appropriate?
\item Should a one-tailed or two-tailed test be used in each of the following cases? Explain.
\begin{enumerate}
\item An economist wishes to determine the effect of income on expenditures at fast-food restaurants.
\item A market researcher wishes to examine the effect of the price of a product on the quantity of the good demanded.
\item An economist wishes to examine the effect of education on earnings.
\item An AIDS researcher wishes to examine the effect of government sponsored education efforts on infection rates.
\item A political scientist wishes to examine the effect of community wealth on local school district expenditures.
\end{enumerate}
\item Fill in the blanks in the table below: \begin{equation*}
\begin{array}{cccc}
\text{Coefficient} & \text{Est. Coeff.} & \text{Standard error} & t\text{-statistic} \\
\beta _{o} & 1.45 & 0.25 & \_\_\_\_ \\ \beta _{1} & 2.34 & \_\_\_\_ & 3.20%
\end{array}%
\end{equation*}
\item In the lemonade demand curve example discussed in Chapter \ref{intro.chap}, the estimated equation was: \begin{equation*}
\hat{Q}_{di}=49.2-38\text{Price}_{i}
\end{equation*}
The standard errors for the intercept and slope parameters are 1.33 and 1.81, respectively. There were 12 observations. You wish to form a test of the following set of hypotheses:
\begin{equation*}
\text{H}_{\text{o}}\text{: }\beta _{1}\geq 0 \end{equation*}
\begin{equation*}
\text{H}_{1}\text{: }\beta _{1}<0
\end{equation*}
\begin{enumerate}
\item Is this a one-tailed or a two-tailed test?
\item Determine the rejection region for the $t$-statistic under a 0.01 significance level.
\item Compute the $t$-statistic that is required to conduct this hypothesis test.
\item Determine whether the null hypothesis can be rejected.
\end{enumerate}
\item Consider the lemonade demand curve example discussed in Chapter \ref{intro.chap} (the relevant information is summarized in the previous question).
\begin{enumerate}
\item Suppose that you wished to construct a test involving the intercept parameter $\beta _{o}$. What does economic theory suggest about the value of $\beta _{o}$? Is a one-tailed or two-tailed test most appropriate?
\item Formulate the null and alternative hypotheses under such a test.
\item Determine the rejection region for the $t$-statistic using a 0.05 significance level.
\item Compute the $t$-statistic that is required to conduct this hypothesis test.
\item Determine whether the null hypothesis can be rejected.
\end{enumerate}
\item Consider the lemonade demand curve given by: \begin{equation*}
Q_{di}=\beta _{o}+\beta _{1}\text{Price}_{i}+u_{i} \end{equation*}
\begin{enumerate}
\item Can you predict the sign of $\beta _{1}$? Is a one-tailed or two-tailed hypothesis test appropriate for this parameter? Specify appropriate null and alternative hypotheses.
\item Use the data in the file “lemonade.dat” to estimate the parameters of this demand curve.
\item Conduct a $t$-test of the hypotheses stated in part (a) at a 5\% significance level.
\end{enumerate}
\item Since 1997, New York State has published annual “report cards”
(available on the internet) that provide information on a variety of measures of the “success” of elementary and secondary school systems (as well as other characteristics of these schools). In examining the data for 1996, Riede (1997), a reporter for the Syracuse Newspapers, observed that a relationship appeared to exist between academic performance and the level of student poverty for the 124 schools in the central New York area. To evaluate this relationship, a regression model was formulated as: \begin{equation}
\text{RD}_{i}\text{ = }\beta _{o}+\beta _{1}\text{LUN}_{i} \label{nfree.lunch}
\end{equation}
\begin{equation*}
\begin{array}{ll}
\text{where:} & \text{RD}_{i}\text{ = proportion of 3rd grade students in school }i\text{ achieving a “mastery” level.} \\ & \text{LUN}_{i}\text{ = proportion of students in school }i\text{ eligible for free or reduced-price lunches}%
\end{array}%
\end{equation*}
The estimated equation, however, is not presented in the article.
\begin{enumerate}
\item Can you predict the sign of $\beta _{1}$? State an appropriate hypothesis test for this coefficient.
\item Use the data in Table \ref{schools.dat} in Appendix \ref{data.appendix} (or in the file “schools.dat”) to estimate the parameters of equation \ref{nfree.lunch}.
\item At a 1\% significance level, test the hypothesis stated in (a).
\end{enumerate}
\item Arthur Okun suggested that the rate of growth in real GNP affects the unemployment rate according to the relationship: \begin{equation}
Y_{t}=\beta _{o}+\beta _{1}X_{t}+u_{t} \label{okuns.prob.bc1} \end{equation}
where $Y_{t}$ is defined as the change in the unemployment rate (measured in percentage points) and $X_{t}$ is the quarterly percentage change in real GNP.
\begin{enumerate}
\item The file \textquotedblleft okun.dat\textquotedblright\ contains 217 quarterly observations on unemployment rates and real GNP (1st quarter, 1948 -- 1st quarter, 2002). (This data is described in Table \ref{okuns.law.dat} on p. \pageref{okuns.law.dat} in Appendix \ref{data.appendix}.) Use this data and a computer spreadsheet program or an econometrics software package to create the variables $X_{t}$ and $Y_{t}$ appearing in equation \ref{okuns.prob.bc1}. The appropriate transformations are: \begin{equation*}
Y_{t}=\text{UN}_{t}-\text{UN}_{t-1}
\end{equation*}%
\begin{equation*}
\text{where UN}_{t}\text{ = unemployment rate in period }t\text{, and } \end{equation*}%
\begin{equation*}
X_{t}=\frac{\text{GNP}_{t}-\text{GNP}_{t-1}}{\text{GNP}_{t-1}}\times 100 \end{equation*}%
\begin{equation*}
\text{where GNP}_{t}\text{ = real GNP in period }t \end{equation*}%
(Note that there are only 216 observations for $Y_{t}$ and $X_{t}$.) Use an OLS regression procedure to estimate the parameters of equation \ref{okuns.prob.bc1}.
\item Okun’s law suggests that the value of $\beta _{1}$ is approximately equal to 0.3. Perform a test of the hypotheses: \begin{equation*}
\text{H}_{o}\text{: }\beta _{1}=0.3
\end{equation*}%
and
\begin{equation*}
\text{H}_{1}\text{: }\beta _{1}\neq 0.3 \end{equation*}%
at a 5\% significance level. Can you reject the null hypothesis?
\end{enumerate}
\item The file \textquotedblleft cars.dat\textquotedblright\ contains information on a variety of characteristics of automobiles sold in 2002.
Consider the equation given by:%
\begin{equation*}
\text{MSRP}_{i}=\beta _{o}+\beta _{1}\text{Horse}_{i}+u_{i} \end{equation*}%
\begin{equation*}
\begin{array}{ll}
\text{where:} & \text{MSRP}_{i}=\text{ manufacturer’s suggested retail price for car model }i\text{ in 2002} \\
& \text{Horse}_{i}=\text{ horsepower for car model }i \\ & u_{i}=\text{ random error term for observation }i%
\end{array}%
\end{equation*}
\begin{enumerate}
\item Would a one-tailed or two-tailed hypothesis test be appropriate for testing the statistical significance of the estimated parameter $\hat{\beta}_{1}$? Explain.
\item State the null and alternative hypotheses that are appropriate for a test involving $\beta _{1}$.
\item Use the data in the file \textquotedblleft cars.dat\textquotedblright\ (this data is described in Table \ref{cars.dat} on p. \pageref{cars.dat}) to estimate the parameters of this equation. At a 5\% significance level, test the hypothesis described in part (b).
\end{enumerate}
\item Consider the relationship given by: \begin{equation*}
\text{Imports}_{t}=\beta _{o}+\beta _{1}\text{YD}_{t}+u_{t} \end{equation*}%
\begin{equation*}
\begin{array}{ll}
\text{where: } & \text{Imports}_{t}\text{ = real U.S. imports in time }t \\ & \text{YD}_{t}\text{ = real U.S. disposable personal income}%
\end{array}%
\end{equation*}
\begin{enumerate}
\item What does economic theory suggest about the sign of $\beta _{1}$? Is a one-tailed or a two-tailed hypothesis test most appropriate in this case?
State the null and alternative hypotheses.
\item Use the data in the file \textquotedblleft imports.dat\textquotedblright\ (this data is described in Table \ref{imports.dat} on p. \pageref{imports.dat}) to estimate the parameters of this equation.
\item At a 5\% significance level, conduct a test of the hypothesis embodied in your answer to part (a). Can you reject the null hypothesis?
\end{enumerate}
\item Suppose that a college admissions office wishes to determine the effect of the business cycle on college enrollment rates for high school seniors. The admissions officers are divided concerning the possible effect of a recession on enrollments. Some members of the office believe that enrollment may fall since a recession reduces the ability of households to finance college expenditures. Others argue that enrollment will increase since the job prospects of high school seniors decline during recessions.
\begin{enumerate}
\item Use the data in the file \textquotedblleft enroll.dat\textquotedblright\ to estimate the parameters of an enrollment equation given by:
\begin{equation*}
\text{Enroll}_{t}=\beta _{o}+\beta _{1}\text{UN}_{t}+u_{t} \end{equation*}%
\begin{equation*}
\begin{array}{ll}
\text{where:} & \text{Enroll}_{t}\text{ = college enrollment rate in year }t \\
& \text{UN}_{t}\text{ = unemployment rate in year }t%
\end{array}%
\end{equation*}
\item Should a one-tailed or two-tailed test be used to examine the effect of a recession on enrollment rates?
\item Specify the null and alternative hypotheses for this test.
\item Test the null hypothesis at a 1\% significance level. What do you conclude?
\item What is the p-value for the test of the null hypothesis specified in (c)? Explain what this p-value represents.
\end{enumerate}
\item Consider the following regression model:%
\begin{equation*}
\text{Edspend}_{i}=\beta _{o}+\beta _{1}\text{Income}_{i}+u_{i} \end{equation*}%
\begin{equation*}
\begin{array}{clll}
\text{where:} & \text{Edspend}_{i}\text{ } & = & \text{per student expenditures in elementary and secondary } \\ & & & \text{schools in state }i \\
& \text{Income}_{i}\text{ } & = & \text{\textit{per capita} disposable personal income in state }i\text{ in 2000}%
\end{array}%
\end{equation*}
\begin{enumerate}
\item Would a one-tailed or two-tailed hypothesis test be appropriate in a test involving $\beta _{1}$? Explain. State the appropriate null and alternative hypotheses.
\item Estimate the parameters of this equation using the data contained in the file \textquotedblleft edspend.dat.\textquotedblright\ (A description of the data appears in Table \ref{edspend.dat} on p. \pageref{edspend.dat}.)
\item Perform a test of the hypothesis you specified in part (a) at a 5\% significance level.
\end{enumerate}
\item In the case of the inflation / monetary growth model discussed in Chapter \ref{intro.chap}, the estimated regression equation is: \begin{equation*}
\text{INFL}_{i}=-4.859+0.955\ast \text{MGROWTH}_{i} \end{equation*}%
The standard errors for the intercept and slope parameters are 0.98 and 0.18, respectively. 72 observations were used to estimate this equation.
\begin{enumerate}
\item Suppose you wished to test whether the intercept term is significantly different than zero at a 5\% significance level. Is this a one-tailed or a two-tailed test? Determine the acceptance and rejection regions for the appropriate $t$-statistic. Perform a test of this hypothesis.
\item Suppose that you wish to test the monetarist hypothesis that states that:
\begin{equation*}
\text{H}_{\text{o}}\text{: }\beta _{1}=1 \end{equation*}%
Can you reject this null hypothesis at the 0.01 significance level?
\item What is the economic meaning of the null hypothesis in (b)?
\end{enumerate}
\item Use the information in the question above to: \begin{enumerate}
\item construct a 95\% confidence interval for the slope parameter.
\item construct a 99\% confidence interval for the intercept parameter.
\end{enumerate}
\item Can we engage in hypothesis testing through the use of confidence intervals? If so, how would you use a confidence interval to test the null hypothesis:
\begin{equation*}
\text{H}_{\text{o}}\text{: }\beta _{1}=0 \end{equation*}
\item An econometrician in the market research department of a widget manufacturing firm estimates the relationship using 32 observations: \begin{equation*}
\hat{Q}_{i}=\underset{(1.24)}{50.5}+\underset{(2.85)}{8.25}\text{ADS}_{i} \end{equation*}%
\begin{equation*}
(\text{standard errors in parentheses}) \end{equation*}%
\begin{equation*}
\begin{array}{ll}
\text{where:} & \hat{Q}_{i}=\text{widget sales in week }i \\ & \text{ADS}_{i}\text{ = number of newspaper ads placed by the firm in week }i%
\end{array}%
\end{equation*}
\begin{enumerate}
\item Determine whether newspaper advertising has a significant effect on the level of sales at a 5\% significance level.
\item Construct a 95\% confidence interval for the slope parameter.
\end{enumerate}
\item An econometrician estimates the following equation using a sample containing 42 observations: \begin{equation*}
\hat{Y}_{i}=20.23+1.76X_{i} \end{equation*}%
The estimated standard errors for the slope and the intercept are 1.56 and 0.45. The sample mean of $X$ is 34.2, $\hat{\sigma}^{2}=1.24,$ and $\sum \left( X_{i}-\overline{X}\right) ^{2}=15.2$.
\begin{enumerate}
\item Construct 95\% confidence intervals for the estimated intercept and slope parameters.
\item Construct 99\% confidence intervals for the estimated intercept and slope parameters.
\item What is the predicted value of $Y$ when $X$ equals 50? Construct a 95\% confidence interval for this forecast.
\end{enumerate}
\item The estimated variance of the residual in a bivariate regression model ($\hat{\sigma}^{2}$) equals 18.2 in a sample that contains 32 observations.
An investigator wishes to perform a test of the following hypotheses: \begin{equation*}
\text{H}_{\text{o}}\text{: }\sigma ^{2}=25 \end{equation*}%
and
\begin{equation*}
\text{H}_{1}\text{: }\sigma ^{2}\neq 25 \end{equation*}
\begin{enumerate}
\item Construct the appropriate test statistic.
\item Determine the acceptance and rejection regions for this statistic at a 5\% significance level.
\item Can you reject the null hypothesis at a 5\% significance level?
\end{enumerate}
\item The economics department at SUNY-Oswego was interested in determining the relationship that exists between student evaluations of faculty performance and the course grades that students receive. To investigate this issue, data was collected on two variables during the fall semester of 1995: an index derived from student evaluations of the faculty (EVAL) and the average course grade (GPA) assigned to students. A subset of this data appears in the file \textquotedblleft eval.dat.\textquotedblright\ The EVAL variable is measured on a 5 point scale (5 = highest, 1= lowest) and the course GPA is measured on the traditional 4 point scale (F=0, D=1, C=2, B=3, A=4). It is believed that a relationship exists in the form: \begin{equation*}
\text{EVAL}_{i}=\beta _{o}+\beta _{1}\text{GPA}_{i}+u_{i} \end{equation*}
\begin{enumerate}
\item Suppose that the department wished to test whether the instructor’s grading policy affects students’ evaluations of the instructor. Is a one-tailed or two-tailed hypothesis test most appropriate? Specify the appropriate null and alternative hypotheses.
\item Estimate the parameters of this equation using an OLS regression procedure.
\item Perform a $t$-test of the null hypothesis that you specified in part (a) at a 5\% significance level. Be sure to state either the p-value for your test statistic or the critical value for the $t$-statistic as part of your answer.
\end{enumerate}
\item What is the $F$-statistic for the regression results reported in the previous question?
\item In Chapter \ref{biv.reg.chap}, an infant mortality model was specified as:\label{mortal.bc.quest} \begin{equation*}
\text{Mortality}_{i}=\beta _{o}+\beta _{1}\text{Calories}_{i}+u_{i} \end{equation*}%
\begin{equation*}
\begin{array}{cc}
\text{where:} & \text{Mortality}_{i}\text{ = infant mortality rate per 1,000 live births in 1965} \\
& \text{Calories}_{i}\text{ = daily calorie supply per capita in 1965}%
\end{array}%
\end{equation*}
\begin{enumerate}
\item Should a one-tailed or two-tailed hypothesis test be used for $\beta _{1}$? State the appropriate null and alternative hypotheses.
\item Use the data in the file \textquotedblleft mortal1.dat\textquotedblright\ to estimate the parameters of this equation.
\item At a 0.01 significance level, test the hypotheses that you proposed in (a).
\end{enumerate}
\item
\begin{enumerate}
\item Estimate the parameters of the regression model given in question \ref{mortal.bc.quest} and store the residuals.
\item Perform a Jarque-Bera test of normality for the residuals at a 5\% significance level. What do you conclude from this test?
\end{enumerate}
\item Construct an $F$-test (at a 5\% significance level) for the regression model described in the previous question.
\item The file \textquotedblleft fac-sal.dat\textquotedblright\ contains data on the salaries and years of work experience (measured as years since completion of the Ph.D. degree) for 32 economists employed by the University of Michigan for the 1983--84 academic year. (Source: Frank (1984), p. 560.) \begin{enumerate}
\item Use this data to estimate the parameters of the regression model given by:
\begin{equation*}
\text{Salary}_{i}=\beta _{o}+\beta _{1}\text{experience}_{i}+u_{i} \end{equation*}
\item Conduct a test of the hypotheses given by: \begin{equation*}
\text{H}_{o}\text{: }\beta _{1}\leq 0 \end{equation*}
and
\begin{equation*}
\text{H}_{1}\text{: }\beta _{1}>0 \end{equation*}
at the 5\% significance level. What do you conclude?
\item What economic model is being tested in (b)? What is suggested by the outcome of this test?
\item Is the intercept parameter significantly different from zero at a 5\% significance level?
\item Perform an $F$-test of \textquotedblleft goodness of fit\textquotedblright\ at a 5\% significance level. Is this $F$-statistic equal to the squared $t$-statistic from part (b)?
\end{enumerate}
\item The file \textquotedblleft hwi.dat\textquotedblright\ contains data on the help-wanted index and the unemployment rate for 601 monthly observations for the years 1951 through the beginning of 2001.
\begin{enumerate}
\item Consider the regression model given by:\footnote{%
This model was estimated using 24 quarterly observations by Gujarati (1968).}%
\label{Gujarati.hwi.bc.q}
\begin{equation}
\text{HWI}_{t}=\beta _{o}+\beta _{1}\text{UN}_{t}+u_{t}\text{ } \label{nhwi.br}
\end{equation}%
Can the sign of $\beta _{1}$ be predicted? Is a one-tailed or a two-tailed hypothesis test appropriate for estimates of this parameter?
\item Use a statistical software package to estimate the parameters of equation \ref{nhwi.br}.
\item At a 5\% significance level, perform a $t$-test of the hypotheses discussed in (a).
\item Perform an $F$-test of \textquotedblleft goodness of fit\textquotedblright\ at a 5\% significance level. Is this $F$-statistic equal to the squared $t$-statistic from part (c)?
\end{enumerate}
\item
\begin{enumerate}
\item Estimate the parameters of the regression model appearing in question \ref{Gujarati.hwi.bc.q} and store the residuals.
\item At a 5\% significance level, test the normality of the residuals in this question. Can you reject the hypothesis of normality?
\end{enumerate}
\end{enumerate}
\newpage\
\section{Mathematical Appendix} \subsection{Equivalence of $t^2$- and $F$-statistics} Consider the hypotheses given by: \begin{equation*}
\text{H}_o\text{: }\beta _1=0 \end{equation*}
and
\begin{equation*}
\text{H}_1\text{: }\beta _1\neq 0 \end{equation*}
The $F$-test for this hypothesis is given by: \begin{equation*}
F=\frac{\text{RSS/1}}{\text{ESS/}(N-2)} \end{equation*}
Using the definitions of RSS (the regression sum of squares) and ESS (the error sum of squares), this $F$-statistic may be restated as:
\begin{equation}
F=\frac{\sum \left( \hat{Y}_i-\overline{Y}\right) ^2}{\sum \hat{u}_i^2/(N-2)} \label{fstat.zw.bc}
\end{equation}
Since:
\begin{equation*}
\sum \left( \hat{Y}_i-\overline{Y}\right) ^2=\hat{\beta}_1^2\sum \left( X_i-\overline{X}\right) ^2
\end{equation*}
and
\begin{equation*}
\sum \hat{u}_i^2/(N-2)=\hat{\sigma}^2 \end{equation*}
equation \ref{fstat.zw.bc} may be restated as: \begin{equation}
F=\frac{\hat{\beta}_1^2\sum \left( X_i-\overline{X}\right) ^2}{\hat{\sigma}^2} \label{f-stat.2.bhc}
\end{equation}
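The first of the two substitutions used above may not be obvious, so a brief sketch of the missing step is given here. It relies on the definition of the fitted values, $\hat{Y}_i=\hat{\beta}_o+\hat{\beta}_1X_i$, and on the fact that an OLS regression line with an intercept passes through the point of means, so that $\overline{Y}=\hat{\beta}_o+\hat{\beta}_1\overline{X}$. Subtracting the second expression from the first gives:
\begin{equation*}
\hat{Y}_i-\overline{Y}=\hat{\beta}_1\left( X_i-\overline{X}\right)
\end{equation*}
Squaring both sides and summing over all observations yields:
\begin{equation*}
\sum \left( \hat{Y}_i-\overline{Y}\right) ^2=\hat{\beta}_1^2\sum \left( X_i-\overline{X}\right) ^2
\end{equation*}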
Noting that the squared standard error (the estimated variance) of $\hat{\beta}_1$ is given by: \begin{equation*}
\hat{\sigma}_{\hat{\beta}_1}^2=\frac{\hat{\sigma}^2}{\sum \left( X_i-\overline{X}\right) ^2}
\end{equation*}
equation \ref{f-stat.2.bhc} can be written in an equivalent form as: \begin{equation*}
F=\frac{\hat{\beta}_1^2}{\hat{\sigma}_{\hat{\beta}_1}^2} \end{equation*}
or
\begin{equation}
F=\left( \frac{\hat{\beta}_1}{\hat{\sigma}_{\hat{\beta}_1}}\right) ^2 \label{f-stat.3.bhc}
\end{equation}
An inspection of equation \ref{f-stat.3.bhc} indicates that this $F$-statistic is simply the square of the $t$-statistic that would be used to test the hypotheses given above.
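
As a numerical check of this equivalence, consider a small hypothetical data set with $N=4$ observations. The data are constructed here purely for illustration and are not drawn from any of the data files used in this chapter:
\begin{equation*}
(X_i,Y_i)=(1,2),\ (2,3),\ (3,5),\ (4,6)
\end{equation*}
For these values, $\overline{X}=2.5$, $\overline{Y}=4$, $\sum \left( X_i-\overline{X}\right) ^2=5$, and $\sum \left( X_i-\overline{X}\right) \left( Y_i-\overline{Y}\right) =7$. The OLS estimates are therefore:
\begin{equation*}
\hat{\beta}_1=\frac{7}{5}=1.4\qquad \text{and}\qquad \hat{\beta}_o=4-1.4(2.5)=0.5
\end{equation*}
The fitted values are 1.9, 3.3, 4.7, and 6.1, so the residuals are 0.1, $-0.3$, 0.3, and $-0.1$. Thus:
\begin{equation*}
\sum \hat{u}_i^2=0.2\qquad \text{and}\qquad \hat{\sigma}^2=\frac{0.2}{4-2}=0.1
\end{equation*}
Using equation \ref{f-stat.2.bhc}, the $F$-statistic is:
\begin{equation*}
F=\frac{(1.4)^2(5)}{0.1}=\frac{9.8}{0.1}=98
\end{equation*}
The estimated variance of $\hat{\beta}_1$ is $\hat{\sigma}_{\hat{\beta}_1}^2=0.1/5=0.02$, so the squared $t$-statistic is:
\begin{equation*}
t^2=\frac{(1.4)^2}{0.02}=\frac{1.96}{0.02}=98
\end{equation*}
The two statistics coincide, as the derivation above implies.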