\chapter{Limited Dependent Variables\label{limdep.chap}}
\section{Overview}
In previous chapters, it has been assumed that the dependent variable is a continuous random variable. As noted in Chapter \ref{func.form.ii.chap}, however, many economic variables are qualitative. Examples of such variables include: the decision to attend college, marital status, religion, the presence or absence of children, the decision to join a union, and labor force participation status. In Chapter \ref{func.form.ii.chap}, it was shown that qualitative variables of this sort can be represented through the use of dummy variables. This initial discussion was based on the assumption that these dummy variables serve only as independent variables in a regression equation. In many cases, however, econometricians wish to explain the determinants of a qualitative variable. Models in which the observed dependent variable is a dummy variable are referred to as \textbf{binary choice models} since the observed outcome generally reflects a choice between two alternative outcomes.
The use of a qualitative variable as the dependent variable is a special case of a \textbf{limited dependent variable model}. In a limited dependent variable model, there is some limitation on the range of values that may occur for the observed dependent variable. When the dependent variable is a dummy variable, the limitation is obvious: the variable can take on only the values of 1 or 0.
In many situations, however, individuals face a choice among three or more qualitative outcomes. An individual’s choice of the optimal level of educational attainment, for example, involves a selection among a variety of educational levels. The choice among alternative modes of transportation (such as car, bus, or train) is another example of a choice among multiple qualitative outcomes. Models of choices among more than two alternatives are referred to as \textbf{polychotomous choice models}. This chapter contains an introductory discussion of some of the polychotomous choice models that are frequently used by econometricians.
Another interesting limited dependent variable model occurs in a regression model when the level of a dependent variable is observed only for a nonrandom sample of the population. A model of this sort is called a \textbf{sample selectivity model}. An example should help to illustrate the nature of such a model.
Suppose that an econometrician wishes to examine the determinants of SAT scores. A simple form for this regression model is:
\begin{equation}
\text{SAT}_{i}=\beta _{o}+\beta _{1}\text{HSrank}_{i}+\beta _{2}(\text{Mother’s ed.})_{i} \label{sat.lim}
\end{equation}%
\begin{equation*}
+\beta _{3}(\text{Father’s ed.})_{i}+u_{i}
\end{equation*}%
In this specification, an individual’s SAT score is assumed to be a function of the individual’s high school class rank and his or her parents’
education. The dependent variable in this model, however, will only be observed for individuals who actually complete the SAT\ exam. The scores are not observed for individuals who do not take this exam. Since those individuals who choose to complete the SAT exam might be those who expect to do relatively well on this exam, the observed sample will not, in general, be a random drawing from the population. In this case, the error terms for the observed sample are likely to have an expected value that is greater than zero, violating one of the conditions of the classical regression model. If the parameters of equation \ref{sat.lim} are estimated using only the observed sample, it is quite possible that the resulting equation will be subject to a sample selectivity bias.
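The nature of this bias can be sketched more formally. As an illustrative device (this notation is an assumption introduced here and is not used elsewhere in the chapter), let $T_{i}=1$ if individual $i$ takes the SAT exam and $T_{i}=0$ otherwise, and let $\mu _{i}$ denote the systematic part of equation \ref{sat.lim} (the right-hand side excluding $u_{i}$). Since scores are observed only when $T_{i}=1$, a regression based on the observed sample describes:
\begin{equation*}
E(\text{SAT}_{i}|T_{i}=1)=\mu _{i}+E(u_{i}|T_{i}=1)
\end{equation*}
If those who expect to do well are more likely to take the exam, then $E(u_{i}|T_{i}=1)>0$ even though $E(u_{i})=0$ in the population. When this conditional mean also varies with the explanatory variables, it acts much like an omitted variable, which is why the estimated coefficients may be biased.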
The remainder of this chapter consists of a discussion of the three limited dependent variable models introduced above:
\begin{itemize}
\item binary choice models,
\item polychotomous choice models, and
\item sample selectivity models.
\end{itemize}
\section{Binary choice models}
Suppose that an econometrician wishes to analyze the determinants of a qualitative variable. This qualitative variable generally represents the outcome of a choice made by an economic agent (such as an individual, a firm, or a government agency). Initially, it is assumed that this choice is between two mutually exclusive alternatives. As noted above, the qualitative variable may be represented through the use of a dummy variable, $Y_i$, defined as:
\begin{equation*}
Y_i=1\text{ if the qualitative characteristic is present} \end{equation*}
\begin{equation*}
Y_i=0\text{ if the qualitative characteristic is not present} \end{equation*}
Three alternative models are used by econometricians when the observed dependent variable is a dummy variable:
\begin{itemize}
\item the linear probability model,
\item the probit model, and
\item the logit model.
\end{itemize}
\subsection{Linear probability model}
Under the \textbf{linear probability model}, a simple regression equation of the following form is specified:
\begin{equation}
Y_i=\beta _o+\beta _1X_{1i}+\beta _2X_{2i}+\cdots +\beta _kX_{ki}+u_i \label{lin.prob.lc}
\end{equation}
This equation differs from the multiple regression model discussed in previous chapters in that the dependent variable ($Y_i$) is a dummy variable. Under this specification, the conditional expectation of the dependent variable is given by:
\begin{equation}
E(Y_i|X_{1i},X_{2i},\ldots ,X_{ki})=\beta _o+\beta _1X_{1i}+\beta _2X_{2i}+\cdots +\beta _kX_{ki} \label{ey.1.lc} \end{equation}
Using the definition of conditional expectations (discussed in Chapter \ref{stat.chap}), however, the conditional expectation of $Y_i$ appearing in equation \ref{ey.1.lc} can also be expressed as:
\begin{equation}
E(Y_i|X_{1i},X_{2i},\ldots ,X_{ki})=1\times \text{Prob}(Y_i=1|X_{1i},X_{2i},\ldots ,X_{ki}) \label{ey.lc}
\end{equation}
\begin{equation*}
+0\times \text{Prob}(Y_i=0|X_{1i},X_{2i},\ldots ,X_{ki}) \end{equation*}
\begin{equation*}
=\text{Prob}(Y_i=1|X_{1i},X_{2i},\ldots ,X_{ki}) \end{equation*}
A comparison of equations \ref{ey.1.lc} and \ref{ey.lc} indicates that: \begin{equation}
\text{Prob}(Y_i=1|X_{1i},X_{2i},\ldots ,X_{ki})=\beta _o+\beta _1X_{1i}+\beta _2X_{2i}+\cdots +\beta _kX_{ki} \label{prob.eq.lc} \end{equation}
Thus, under the linear probability model, the regression equation provides the conditional probability that $Y_i=1$ as a function of the variables $X_1,X_2,\ldots ,X_k$. To generate estimates of these probabilities, the parameters of equation \ref{prob.eq.lc} may be estimated using OLS to form:
\begin{equation}
\hat{P}_i=\hat{\beta}_o+\hat{\beta}_1X_{1i}+\hat{\beta}_2X_{2i}+\cdots +\hat{\beta}_kX_{ki} \label{est.prob.lc}
\end{equation}
\begin{equation*}
\text{where: }\hat{P}_i\text{ = estimated Prob(}Y_i=1|X_{1i},X_{2i},\ldots ,X_{ki}\text{)} \end{equation*}
\subsubsection{Example: College attendance decision} Let’s examine an application of the linear probability model. Consider a high school student’s decision of whether or not to attend college. In this case, the dependent variable is defined as:
\begin{equation*}
Y_{i}=1\text{ if the individual attends college} \end{equation*}%
\begin{equation*}
Y_{i}=0\text{ if the individual does not attend college} \end{equation*}%
A possible specification of the linear probability model is given by:
\begin{table}
\begin{center}
\begin{tabular}{|ll|}
\hline
\textbf{Variable Names} & \textbf{Variable Definitions} \\
\hline
HSRANK & percentile rank in high school class \\
 & \\
HSLEAD & = 1 if the respondent is an elected or appointed student \\
 & officer in one or more high school clubs or organizations \\
 & (= 0 otherwise) \\
 & \\
SAT & actual or imputed combined SAT score \\
 & (SAT verbal + SAT math) \\
 & \\
FEMALE & = 1 if the respondent is female (= 0 otherwise) \\
 & \\
MLHS & = 1 if the respondent’s mother did not complete high \\
 & school (= 0 otherwise) \\
 & \\
MCOL & = 1 if the respondent’s mother attended one or more \\
 & years of college (= 0 otherwise) \\
 & \\
FLHS & = 1 if the respondent’s father did not complete high \\
 & school (= 0 otherwise) \\
 & \\
FCOL & = 1 if the respondent’s father attended one or more \\
 & years of college (= 0 otherwise) \\
 & \\
NSIB & number of siblings of the respondent \\
 & \\
\hline
\end{tabular}
\caption{Variable definitions for educational attainment models\label{vdef.lc}}
\end{center}
\end{table}
\begin{equation}
Y_{i}=\beta _{o}+\beta _{1}\text{HSRANK}_{i}+\beta _{2}\text{HSLEAD}_{i}+\beta _{3}\text{SAT}_{i} \label{lpm.lc}
\end{equation}%
\begin{equation*}
+\beta _{4}\text{FEMALE}_{i}+\beta _{5}\text{MLHS}_{i}+\beta _{6}\text{MCOL}_{i}
\end{equation*}%
\begin{equation*}
+\beta _{7}\text{FLHS}_{i}+\beta _{8}\text{FCOL}_{i}+\beta _{9}\text{NSIB}_{i}+u_{i}
\end{equation*}%
The variable definitions for this model are contained in Table \ref{vdef.lc}. Economic models of education suggest that individuals choose to attend college if the present value of the net benefits associated with this choice is positive. The variables HSRANK, HSLEAD, and SAT are included in this model to account for the effect of individual ability and motivation on the costs and benefits of college. Human capital, signaling, and screening models of educational choice all suggest that more able individuals are more likely to attend college.\footnote{%
See, for example, the discussion in Becker (1993) or Spence (1973). A good econometric study of educational choice may be found in Manski and Wise (1983).} Thus, it is expected that the estimated coefficient corresponding to each of these variables will be positive.
The FEMALE variable is included to test for gender differences in the decision to attend college. The parental education variables are included, in part, as a proxy for parents’ income.\footnote{%
The parents’ income variable in this data set contains a large amount of missing data. Furthermore, it is based on students’ statements about their parents’ income. Since this variable contains such a large amount of measurement error, it was felt that it was better to omit it and use these \textquotedblleft proxy\textquotedblright\ variables in its place.
\par
In addition, the appropriate independent variable is the level of the parents’ \textquotedblleft permanent income.\textquotedblright\ Current income, even if correctly measured, is at best only a proxy for this unobserved variable. While an instrumental variable estimation technique could have been used to deal with the measurement error in the parents’ income variable, it is quite likely that parents’ education serves as a reasonably good proxy for the parents’ \textquotedblleft permanent income.\textquotedblright} These variables also partly reflect the effect of childhood investments in human capital (since early language skills are, to a large extent, derived from interactions with parents). Note that the coefficients on the parents’
education dummy variables are interpreted as representing the differential impact of having a parent with either more or less education than a high school diploma (the excluded category for the parental education dummy variables). The NSIB variable is included to account for the effect of family size on parental investments in education. When more children are present in a household, it is likely that there will be less investment in each child.\footnote{%
For a good discussion of this argument, see Willis (1974) or Becker and Lewis (1974).} Thus, it is expected that the coefficient on NSIB will be negative.
Using OLS, the parameters of equation \ref{lpm.lc} were estimated from a sample of 4781 participants in the \textit{National Longitudinal Study of the High School Class of 1972}.\footnote{%
The data used to estimate this equation are contained in the file \textquotedblleft nls72.dat\textquotedblright\ on the data disk that accompanies this text.} The estimated equation is given by:
\begin{equation}
\hat{Y}_{i}=-\underset{(-1.03)}{0.033}+\underset{(10.97\ast \ast )}{0.0033}\text{HSRANK}_{i}+\underset{(2.97\ast \ast )}{0.040}\text{HSLEAD}_{i}+\underset{(12.75\ast \ast )}{0.0005}\text{SAT}_{i} \label{lpm1.lc}
\end{equation}%
\begin{equation*}
-\underset{(-0.14)}{0.0018}\text{FEMALE}_{i}-\underset{(-2.65\ast \ast )}{0.046}\text{MLHS}_{i}+\underset{(4.41\ast \ast )}{0.069}\text{MCOL}_{i}
\end{equation*}%
\begin{equation*}
-\underset{(-1.76)}{0.030}\text{FLHS}_{i}+\underset{(5.41\ast \ast )}{0.087}\text{FCOL}_{i}-\underset{(-2.73\ast \ast )}{0.008}\text{NSIB}_{i}
\end{equation*}%
\begin{equation*}
(t\text{-statistics in parentheses)}
\end{equation*}%
\begin{equation*}
^{\ast }\text{significant at a .05 level}
\end{equation*}%
\begin{equation*}
^{\ast \ast }\text{significant at a .01 level} \end{equation*}%
These estimated results are consistent with the predictions. The ability variables are all highly significant and have the expected sign.\footnote{%
As discussed below, however, the use of $t$-tests based upon the linear probability model is, at best, somewhat questionable. The $t$-ratios are presented for this model for the purpose of comparison with those generated using the more appropriate probit and logit models discussed below.} As anticipated, the children of more highly educated individuals are significantly more likely to attend college. The presence of additional siblings, however, results in a significant reduction in the probability of attending college. In this sample, there is no significant gender difference in the probability of attending college.
The coefficients in this model provide a measure of the change in the probability of attending college as a result of a one-unit increase in each of these variables. In mathematical terms:
\begin{equation*}
\beta _{j}=\frac{\Delta \text{Prob(}Y_{i}=1\text{)}}{\Delta X_{j}} \end{equation*}%
Thus, equation \ref{lpm1.lc} indicates that a one-unit (one percentile) increase in class rank, \textit{ceteris paribus}, raises the predicted probability of attending college by 0.0033 (0.33 percentage points). A 10 percentile increase in class rank would therefore result in a 3.3 percentage point increase in the predicted probability of attending college. A 100 point increase in combined SAT scores results in a 5 percentage point increase in the predicted probability of attending college. Relative to having a mother whose education ended with a high school diploma, having a mother who attended college raises the predicted probability of college attendance by 6.9 percentage points. Having a father who attended college results in an 8.7 percentage point increase in the predicted probability of college attendance.
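As a worked illustration, consider a hypothetical student (the values below are assumptions chosen for this example, not observations from the data set) who is male, is not a student officer, ranks at the 50th percentile of his class, has a combined SAT score of 900, has two siblings, and whose parents both ended their education with a high school diploma. Using the rounded coefficients reported in equation \ref{lpm1.lc}, the predicted probability of college attendance is approximately:
\begin{equation*}
\hat{P}_{i}=-0.033+0.0033(50)+0.0005(900)-0.008(2)
\end{equation*}
\begin{equation*}
=-0.033+0.165+0.450-0.016=0.566
\end{equation*}
If this student’s mother had instead attended college (MCOL = 1), the predicted probability would rise by 0.069 to approximately 0.635.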
\subsubsection{Problems with the linear probability model} There are four major problems, however, associated with the use of the linear probability model:
\begin{enumerate}
\item the linear probability model may result in nonsensical estimated probabilities that are either less than zero or greater than one;
\item the linear form of this model requires that a one-unit change in the level of an independent variable always results in a constant marginal effect on the probability;
\item the error terms in this model are not normally distributed; and
\item the error terms are heteroskedastic (and OLS estimators are inefficient).
\end{enumerate}
The first problem should be fairly obvious: the estimated probabilities ($\hat{P}_{i}$) may be either less than zero or greater than one for some values of $X_{1},X_{2},\ldots ,X_{k}$. For example, suppose that the following equation is estimated:
\begin{equation*}
\hat{P}_{i}=-.10+0.01X_{1i}
\end{equation*}%
In this case, when the value of $X_{1i}$ is less than 10, the estimated probability will be less than zero. If the value of $X_{1i}$ exceeds 110, the estimated probability will exceed one. Figure~\ref{linprob_g_lim} illustrates this problem.
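The same problem can arise with the estimated college attendance equation. As a sketch, consider a hypothetical student (these values are assumptions chosen for illustration) who is male, is a student officer, ranks at the 95th percentile of his class, has a combined SAT score of 1400, has no siblings, and whose parents both attended college. Using the rounded coefficients reported in equation \ref{lpm1.lc}:
\begin{equation*}
\hat{P}_{i}=-0.033+0.0033(95)+0.040(1)+0.0005(1400)+0.069(1)+0.087(1)
\end{equation*}
\begin{equation*}
=-0.033+0.3135+0.040+0.700+0.069+0.087=1.1765
\end{equation*}
The predicted \textquotedblleft probability\textquotedblright\ of roughly 1.18 exceeds one and therefore has no meaningful interpretation.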
\begin{center}
\FRAME{ftbpFU}{5.4535in}{3.4947in}{0pt}{\Qcb{Linear probability model}}{\Qlb{linprob_g_lim}}{fig14-1.gif}{\special{language "Scientific Word";type "GRAPHIC";maintain-aspect-ratio TRUE;display "USEDEF";valid_file "F";width 5.4535in;height 3.4947in;depth 0pt;original-width 5.3956in;original-height 3.448in;cropleft "0";croptop "1";cropright "1";cropbottom "0";filename 'GRAPHS/Fig14-1.gif';file-properties "XNPEU";}}
\end{center}
A simple method of dealing with this problem is to constrain the estimated probabilities to fall between zero and one. Under this \textbf{constrained linear probability model} (depicted in Figure~\ref{clinprob_g_lim}), the estimated probabilities are defined as:
\begin{itemize}
\item Prob($Y_{i}=1|X_{1},X_{2},\ldots ,X_{k}$) = 0.001 (or some other arbitrarily small value) if $\hat{P}_{i}<0$,
\item Prob($Y_{i}=1|X_{1},X_{2},\ldots ,X_{k}$) = $\hat{P}_{i}$ if $0\leq \hat{P}_{i}\leq 1$, and
\item Prob($Y_{i}=1|X_{1},X_{2},\ldots ,X_{k}$) = 0.999 (or some other value close to 1.0) if $\hat{P}_{i}>1$.
\end{itemize}
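To illustrate these rules, consider again the equation $\hat{P}_{i}=-.10+0.01X_{1i}$ used above. The values of $X_{1i}$ shown here are arbitrary choices made for this example:
\begin{equation*}
\begin{array}{ccc}
X_{1i} & \hat{P}_{i} & \text{Constrained estimate} \\
5 & -0.05 & 0.001 \\
50 & 0.40 & 0.40 \\
120 & 1.10 & 0.999
\end{array}
\end{equation*}
Only the first and third fitted values are altered, since they fall outside the interval from zero to one.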
\begin{center}
\FRAME{ftbpFU}{4.9182in}{3.1488in}{0pt}{\Qcb{Constrained linear probability model}}{\Qlb{clinprob_g_lim}}{fig14-2.gif}{\special{language "Scientific Word";type "GRAPHIC";maintain-aspect-ratio TRUE;display "USEDEF";valid_file "F";width 4.9182in;height 3.1488in;depth 0pt;original-width 4.8646in;original-height 3.1038in;cropleft "0";croptop "1";cropright "1";cropbottom "0";filename 'GRAPHS/Fig14-2.gif';file-properties "XNPEU";}} \end{center}
Of course, this modification is necessary only because the linear probability model may provide estimates of probabilities that are meaningless. This is one of the most troubling problems associated with this model.
The linear form of the linear probability model is also often problematic.
Suppose, for example, that the following linear probability model is specified to explain college attendance:
\begin{equation*}
Y_i=\beta _o+\beta _1\text{Income}_i+u_i
\end{equation*}
where $Y_i$ is a dummy variable that equals one if an individual attends college and equals zero otherwise and Income$_i$ is a measure of family income. Under this specification, a \$1,000 increase in the level of income is assumed to have the same effect on the probability of attending college at every level of income. If a constrained linear probability model is used, the effect resulting from an additional \$1,000 in income is assumed to be constant until the upper bound on probability is reached. Beyond this point, the impact of parental income on the probability of college attendance is assumed to be zero. It is much more likely, however, that the effect of additional income will gradually decrease as the level of income rises.
While log and polynomial transformations can be used to partially capture nonlinear effects such as these, it is likely that nonlinearity is the norm rather than the exception in models of this sort.
The nonnormality of the error terms and the heteroskedasticity problem both have a common cause. To see this, note that the error term is defined as: \begin{equation*}
u_i=Y_i-\beta _o-\beta _1X_{1i}-\beta _2X_{2i}-\cdots -\beta _kX_{ki} \end{equation*}
When $Y_i=1,$ this reduces to:
\begin{equation}
u_i=1-\beta _o-\beta _1X_{1i}-\beta _2X_{2i}-\cdots -\beta _kX_{ki} \label{err1.lc}
\end{equation}
For observations corresponding to $Y_i=0$, however, the error term is defined as:
\begin{equation}
u_i=-\beta _o-\beta _1X_{1i}-\beta _2X_{2i}-\cdots -\beta _kX_{ki} \label{err2.lc}
\end{equation}
Since $Y_i$ can take on only these two values, the error term can take on only the two values given in equations \ref{err1.lc} and \ref{err2.lc} (for given values of $X_{1i},X_{2i},\ldots ,X_{ki}$). Thus, the error terms in this model follow a binomial distribution instead of the normal distribution that has been assumed in previous chapters.\footnote{%
A discrete random variable follows a binomial distribution with a single trial (also known as a Bernoulli distribution) if the variable takes on only two alternative values with given probabilities. The probabilities of observing these two values may be expressed as $p$ and $1-p$ (since the sum of the probabilities must equal one). The outcome resulting from a single toss of a coin is an example of a variable following such a distribution.} Since the error terms are not normally distributed in this model, it is inappropriate to rely on $t$-statistics and $F$-statistics for hypothesis tests (except in large samples).\footnote{%
Even though the error terms are not normally distributed, the central limit theorem suggests that the sampling distributions of the (suitably standardized) intercept and slope estimators will still converge to normal distributions as the size of the sample tends toward infinity.} Under the linear probability model, the variance of the error term, $u_i$, is equal to:
\begin{equation*}
var(u_i)=P_i(1-P_i)
\end{equation*}
\begin{equation*}
\text{where: }P_i=\text{ Prob(}Y_i=1)
\end{equation*}
(The derivation of this result is contained in the mathematical appendix located at the end of this chapter.) Using equation \ref{prob.eq.lc}, this variance can be restated as:
\begin{equation*}
var(u_i)=\left( \beta _o+\beta _1X_{1i}+\cdots +\beta _kX_{ki}\right) \left( 1-\beta _o-\beta _1X_{1i}-\cdots -\beta _kX_{ki}\right) \end{equation*}
Since the variance of the error terms is a function of the variables $X_{1i},X_{2i},\ldots ,X_{ki}$, the error terms are heteroskedastic.
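A brief sketch of where this variance comes from may be helpful (the full derivation is left to the appendix). For given values of the explanatory variables, equations \ref{err1.lc} and \ref{err2.lc} imply that $u_i$ equals $1-P_i$ with probability $P_i$ and equals $-P_i$ with probability $1-P_i$. Thus:
\begin{equation*}
E(u_i)=P_i(1-P_i)+(1-P_i)(-P_i)=0
\end{equation*}
\begin{equation*}
var(u_i)=P_i(1-P_i)^2+(1-P_i)(-P_i)^2=P_i(1-P_i)\left[ (1-P_i)+P_i\right] =P_i(1-P_i)
\end{equation*}
As a numerical illustration, the variance equals 0.25 when $P_i=0.5$ but only 0.09 when $P_i=0.9$ or $P_i=0.1$, so observations with probabilities near zero or one have much smaller error variances than those with probabilities near one-half.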
While it is possible to construct a GLS estimator that corrects for the heteroskedasticity that occurs under the linear probability model, the other problems remain. The probit and logit models discussed below, however, are not subject to these problems.
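The GLS correction mentioned in the preceding paragraph can be sketched as a weighted least squares procedure. This is a minimal illustration that assumes every fitted probability from a first-stage OLS regression lies strictly between zero and one; the weight symbol $\hat{w}_i$ is introduced here for illustration only. Each observation is divided by the estimated standard deviation of its error term:
\begin{equation*}
\hat{w}_i=\sqrt{\hat{P}_i(1-\hat{P}_i)}
\end{equation*}
and OLS is then applied to the transformed equation:
\begin{equation*}
\frac{Y_i}{\hat{w}_i}=\beta _o\frac{1}{\hat{w}_i}+\beta _1\frac{X_{1i}}{\hat{w}_i}+\cdots +\beta _k\frac{X_{ki}}{\hat{w}_i}+\frac{u_i}{\hat{w}_i}
\end{equation*}
The transformed error term has a variance of approximately one for every observation. Note that the weight is undefined whenever $\hat{P}_i$ falls outside the interval from zero to one, so this correction does nothing to resolve the first problem listed above.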
The linear probability model was extensively used until the early 1980s because it could be easily and inexpensively estimated using standard regression software. Advances in computer hardware and econometric software, however, have made logit and probit analysis as accessible to econometricians as the multiple regression model. Current versions of virtually all major econometric packages provide logit and probit estimators. Thus, in recent years, the linear probability model has been effectively replaced by probit and logit models. Let’s examine these alternative models.
\subsection{Probit analysis}
The probit model provides an alternative method of examining an economic agent’s choice between two alternatives ($Y_{i}=0$ and $Y_{i}=1$). The net benefit, $Z_{i}$, associated with choosing the alternative at which $Y_{i}=1$ (as compared to the alternative at which $Y_{i}=0$) is defined as:\footnote{%
The negative sign on the error term is not a typographical error. Instead, this convention is generally followed so that the signs of the estimated coefficients have a somewhat simpler interpretation. By using this convention, a positive sign on a coefficient indicates that an increase in the level of the corresponding variable raises the probability of the event occurring; a negative sign indicates that an increase in the variable lowers the probability of the event occurring. This will become obvious in the discussion that follows.
\par
The variable $Z_{i}$ is commonly referred to as a \textquotedblleft latent variable\textquotedblright\ since it is a structural variable that is not directly observed by the researcher.}
\begin{equation}
Z_{i}=\beta _{o}+\beta _{1}X_{1i}+\beta _{2}X_{2i}+\cdots +\beta _{k}X_{ki}-u_{i} \label{Z.lc}
\end{equation}%
In equation \ref{Z.lc}, the variables $X_{1i},X_{2i},\ldots ,X_{ki}$ represent a set of $k$ observable variables that affect the costs and/or benefits resulting from this choice. The error term $u_{i}$ captures the effect of any unobservable variables that affect the net benefits resulting from this choice. It is assumed that this error term is distributed normally with a mean of zero and a variance of $\sigma ^{2}$.
If the net benefit ($Z_{i}$) is greater than or equal to zero, the individual will select the choice corresponding to $Y_{i}=1$. On the other hand, if the net benefit is negative, the individual will choose the alternative at which $Y_{i}=0$. For example, suppose that $Z_{i}$ is the net benefit associated with attending college and $Y_{i}$ is a dummy variable that equals one if individual $i$ attends college ($Y_{i}=0$ if individual $i$ does not attend college). It is expected that those individuals with a nonnegative level of net benefit will choose to attend college while those with a negative level of net benefit will not attend college. Thus, the decision rule can be stated as:
\begin{equation*}
\begin{array}{ll}
\text{Select }Y_{i}=1\text{ if:} & Z_{i}\geq 0 \\ \text{Select }Y_{i}=0\text{ if:} & Z_{i}<0%
\end{array}%
\end{equation*}%
While $Z_{i}$ (the level of net benefit) cannot be observed by the econometrician, the result of this choice process, $Y_{i}$, is observed. In the case of college attendance, an econometrician only observes whether or not the individual attends college ($Y_{i}$), but not the potential level of net benefit ($Z_{i}$) associated with this choice.
In this model, the probability of choosing $Y_i=1$ is given by: \begin{equation*}
\text{Prob}(Y_i=1)=\text{ Prob(}Z_i\geq 0\text{)} \end{equation*}
\begin{equation*}
\text{= Prob(}\beta _o+\beta _1X_{1i}+\beta _2X_{2i}+\cdots +\beta _kX_{ki}-u_i\geq 0\text{)}
\end{equation*}
\begin{equation*}
\text{= Prob(}u_i\leq \beta _o+\beta _1X_{1i}+\beta _2X_{2i}+\cdots +\beta _kX_{ki}\text{)}
\end{equation*}
\begin{equation*}
\text{= Prob}\left( \frac{u_i}\sigma \leq \frac{\beta _o+\beta _1X_{1i}+\beta _2X_{2i}+\cdots +\beta _kX_{ki}}\sigma \right) \end{equation*}
\begin{equation*}
\text{where: }\sigma \text{ = standard deviation of }u_i \end{equation*}
To simplify this derivation, it is convenient to define $Z_i^{*}$ as: \begin{equation}
Z_i^{*}=\frac{\beta _o+\beta _1X_{1i}+\beta _2X_{2i}+\cdots +\beta _kX_{ki}}{\sigma} \label{Z.i.lc}
\end{equation}
Using this definition, the probability that $Y_i$ will equal one is given by:
\begin{equation}
\text{Prob}(Y_i=1)=\text{Prob}\left( \frac{u_i}\sigma \leq Z_i^{*}\right) \label{normal.cdf.lc}
\end{equation}
\begin{equation*}
=\Phi (Z_i^{*})
\end{equation*}
\begin{center}
where: $\Phi (\cdot )$ is the cumulative density function for a standard normal variate.\footnote{%
Recall that the cumulative density function, $\Phi (Z_i^{*})$, is defined as the probability of observing an outcome that is less than or equal to $Z_i^{*}$.}
\FRAME{ftbpFU}{5.3688in}{3.442in}{0pt}{\Qcb{Probit model}}{\Qlb{probit_g_lim}}{fig14-3.gif}{\special{language "Scientific Word";type "GRAPHIC";maintain-aspect-ratio TRUE;display "USEDEF";valid_file "F";width 5.3688in;height 3.442in;depth 0pt;original-width 5.3125in;original-height 3.3961in;cropleft "0";croptop "1";cropright "1";cropbottom "0";filename 'GRAPHS/Fig14-3.gif';file-properties "XNPEU";}}
\end{center}
A graph of the cumulative density function for $Z_{i}^{\ast }$ appears in Figure~\ref{probit_g_lim}.\footnote{%
As discussed in Chapter \ref{stat.chap}, a cumulative density function (CDF) is defined as:
\begin{equation*}
F(x)=\text{Prob(}X\leq x)
\end{equation*}%
The symbol $\Phi $ is generally used to denote the CDF for a standard normal density function.} This diagram illustrates the relationship that exists between the level of $Z_{i}^{\ast }$ and the probability of observing an outcome at which $Y_{i}=1$. Note that the cumulative density function for a standard normal variate tends toward zero as $Z_{i}^{\ast }$ approaches negative infinity and approaches one as $Z_{i}^{\ast }$ tends toward positive infinity. Thus, as this diagram suggests, the probit specification will always generate a predicted probability that lies between zero and one.
Furthermore, under the probit specification, the marginal effect of a change in the level of one of the $X_{j}$’s varies in a reasonable manner with the level of the independent variable. As this diagram indicates, the marginal effect of a change in the level of any one of the independent variables is assumed to gradually diminish as the probability of the outcome approaches either zero or one (since the level of $Z_{i}^{\ast }$ varies with the level of each of the $X_{ji}$’s). A comparison of Figure~\ref{probit_g_lim} with Figure~\ref{linprob_g_lim} serves to illustrate these advantages of the probit model over the linear probability model.
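A few standard normal values make these two properties concrete. The figures below are taken from a standard normal table and rounded to four decimal places:
\begin{equation*}
\begin{array}{cc}
Z_{i}^{\ast } & \Phi (Z_{i}^{\ast }) \\
-2 & 0.0228 \\
-1 & 0.1587 \\
0 & 0.5000 \\
1 & 0.8413 \\
2 & 0.9772 \\
3 & 0.9987
\end{array}
\end{equation*}
Every value lies between zero and one. Moreover, a one-unit increase in $Z_{i}^{\ast }$ raises the probability by 0.3413 when starting from $Z_{i}^{\ast }=0$, by 0.1359 when starting from $Z_{i}^{\ast }=1$, and by only 0.0215 when starting from $Z_{i}^{\ast }=2$.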
\FRAME{ftbpFU}{5.233in}{1.7461in}{0pt}{\Qcb{Probability of observing $Z_{i}\geq 0$}}{\Qlb{probit2_g_lim}}{fig14-4.gif}{\special{language "Scientific Word";type "GRAPHIC";maintain-aspect-ratio TRUE;display "USEDEF";valid_file "F";width 5.233in;height 1.7461in;depth 0pt;original-width 5.1768in;original-height 1.708in;cropleft "0";croptop "1";cropright "1";cropbottom "0";filename 'GRAPHS/Fig14-4.gif';file-properties "XNPEU";}}
Figure~\ref{probit2_g_lim} provides an alternative representation of the cumulative density function for the standard normal variate $Z_{i}^{\ast }$.
In this diagram, the shaded area to the left of $Z_{i}^{\ast }$ is equal to the probability of observing a value of $Y_{i}$ equal to one. The probability of observing a value of $Y_{i}$ equal to zero must equal: \begin{equation*}
\text{Prob}(Y_{i}=0)=1-\Phi (Z_{i}^{\ast })
\end{equation*}%
This probability is equal to the area under the curve that lies to the right of $Z_{i}^{\ast }$. An inspection of either Figure~\ref{probit_g_lim} or Figure~\ref{probit2_g_lim} indicates that the probability of an outcome at which $Y_{i}=1$ increases when $Z_{i}^{\ast }$ increases. Equation \ref{Z.i.lc} indicates that an increase in the level of one of the independent variables, $X_{ji}$, will result in an increase in the probability of the event occurring if $\beta _{j}$ is positive. A negative coefficient on a variable, however, indicates that an increase in the level of this variable lowers the probability of observing an outcome at which $Y_{i}=1$.
In the probit model, there are $k+2$ unknown parameters ($\beta _o$, $\beta _1$, $\ldots $, $\beta _k$, $\sigma ^2$). It is not possible, however, to generate unique estimates of each of these parameters. An inspection of equation \ref{Z.i.lc} suggests the reason for this. This equation may be restated as:
\begin{equation}
Z_i^{*}=\frac{\beta _o}\sigma +\frac{\beta _1}\sigma X_{1i}+\frac{\beta _2}\sigma X_{2i}+\cdots +\frac{\beta _k}\sigma X_{ki} \label{Z.i.2.lc}
\end{equation}
Under the probit estimation procedure, it is possible to estimate only the $k+1$ ratios: $\beta _o/\sigma ,\beta _1/\sigma ,\ldots ,\beta _k/\sigma $.
Fortunately, however, only these ratios are needed to construct estimates of $Z_i^{*}$. For this reason, econometricians generally simplify the notation by setting the value of $\sigma $ equal to one. Using this normalization, equation \ref{Z.i.2.lc} may be restated as:
\begin{equation}
Z_i^{*}=\beta _o+\beta _1X_{1i}+\beta _2X_{2i}+\cdots +\beta _kX_{ki} \label{Z.i.3.lc}
\end{equation}
Of course, the resulting parameter estimates are really estimates of $\beta _j/\sigma $. This normalization is adopted in the discussion below.
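A small numerical sketch shows why $\sigma $ cannot be estimated separately. The parameter values used here are hypothetical and chosen only for illustration. Suppose that $k=1$ and consider two sets of parameters: $(\beta _o,\beta _1,\sigma )=(-2,0.5,1)$ and $(\beta _o,\beta _1,\sigma )=(-4,1,2)$. Under the first set:
\begin{equation*}
Z_i^{*}=\frac{-2+0.5X_{1i}}{1}=-2+0.5X_{1i}
\end{equation*}
Under the second set:
\begin{equation*}
Z_i^{*}=\frac{-4+X_{1i}}{2}=-2+0.5X_{1i}
\end{equation*}
Both sets of parameters generate the same value of $Z_i^{*}$, and therefore the same probability $\Phi (Z_i^{*})$, for every value of $X_{1i}$. No sample of observed outcomes can distinguish between them.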
The estimation of the coefficients $\beta _o,\beta _1,\ldots ,\beta _k$ requires the use of a maximum likelihood estimation procedure.
Unfortunately, a full discussion of maximum likelihood estimation requires the use of mathematical tools beyond the scope of this text. (The mathematical appendix at the end of this chapter provides a brief overview of maximum likelihood estimation techniques.) There are a number of desirable properties that are satisfied by maximum likelihood estimators. In particular, maximum likelihood estimators:\footnote{%
It should be noted, of course, that these properties hold only if the model is correctly specified.}
\begin{itemize}
\item are consistent;
\item are asymptotically efficient; and
\item provide asymptotically correct $t$-ratios that may be used for hypothesis testing.
\end{itemize}
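To make the estimation idea concrete, the sketch below evaluates the standard binary choice log-likelihood, $\sum_i \left[ Y_i\ln \Phi (Z_i^{*})+(1-Y_i)\ln (1-\Phi (Z_i^{*}))\right] $, for a probit model with a single regressor. The data and the candidate coefficient values are hypothetical, and the function names are assumptions introduced for illustration. A maximum likelihood routine searches for the coefficients that make this sum as large as possible.

```python
import math

def norm_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_loglik(betas, xs, ys):
    """Log-likelihood of a one-regressor probit: Z* = betas[0] + betas[1]*x."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = norm_cdf(betas[0] + betas[1] * x)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

# Hypothetical toy sample (not from the text).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 1, 0, 1, 1]

flat = probit_loglik([0.0, 0.0], xs, ys)     # every probability equals 0.5
better = probit_loglik([-1.5, 0.6], xs, ys)  # a candidate that fits better

print(round(flat, 4), round(better, 4))
```

The second candidate yields a log-likelihood closer to zero, so it would be preferred to the first under the maximum likelihood criterion.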
Notice that the probit model is not subject to the problems that are associated with the linear probability model. In particular, in the probit model the error terms are normally distributed and homoskedastic by assumption. As noted above, the $t$-ratios generated by the estimation procedure may be used for testing hypotheses concerning individual coefficients. Since the value of $\Phi (Z_i^{*})$ is bounded between zero and one, predicted probabilities under the probit model will never be less than zero or exceed one.
It is important to note, however, that the interpretation of the slope coefficients under the probit model is quite different from their interpretation under the multiple regression model. The slope coefficient, $\beta _j$, is a measure of the change in the value of $Z_i^{*}$ that is associated with a one-unit change in $X_{ji}$. In mathematical terms:
\begin{equation*}
\beta _j=\frac{\Delta Z_i^{*}}{\Delta X_{ji}}
\end{equation*}
This change in $Z_i^{*}$ is not itself a measure of the change in the probability of observing $Y_i=1$. The change in that probability depends upon the initial level of $Z_i^{*}$.
A simple example can be used to illustrate this effect. Suppose that a simple probit model is given by:
\begin{equation*}
Z_{i}^{\ast }=-1.0+0.5X_{i}
\end{equation*}%
In this case, a one-unit change in $X_{i}$ results in a change in $Z_{i}^{\ast }$ equal to 0.5. If the level of $X_{i}$ changes from 2 to 3, the value of $Z_{i}^{\ast }$ changes from 0 to 0.5. This causes the probability of observing $Y_{i}=1$ to increase from 0.5 to 0.6915.\footnote{To see this, note that $\Phi (0)=0.5$ and $\Phi (0.5)=0.6915$ (as indicated by the standard normal CDF contained in Appendix \ref{stat.tab.app} at the end of this text).} In Figure~\ref{cind_g_lim}, this change in probability is indicated by the larger shaded region. If $X_{i}$ increases from 6 to 7, however, the probability of observing $Y_{i}=1$ changes from 0.9772 to 0.9938 (since the value of $Z_{i}^{\ast }$ changes from 2.0 to 2.5 in this case).
Thus, in the first case, the probability of observing $Y_{i}=1$ increases by 0.1915 while it increases by only 0.0166 in the second case. The smaller shaded region in Figure~\ref{cind_g_lim} illustrates the effect of a change in the level of $X_{i}$ from 6 to 7. As this simple example illustrates, the effect of a one-unit change in the level of an independent variable varies with the level of the independent variable. For this reason, when econometricians report the effect of a change in the level of one of the $X_{i}$’s on the probability of observing $Y_{i}=1$, they generally evaluate this at the sample means for all of the variables.\footnote{In the probit model, the marginal effect associated with a one-unit change in the level of an independent variable can be measured as:
\begin{equation*}
\frac{\Delta \text{ in Prob(}Y_{i}=1\text{)}}{\Delta X_{j}}=\beta _{j}\phi \left( Z^{\ast }\right)
\end{equation*}%
where $\phi (Z^{\ast })$ is the probability density function for a standard normal variate evaluated at the mean values for all of the $X_{j}$.
\par
In the case of quantitative independent variables, this effect is also often reported in the form of an elasticity measured as: \begin{equation*}
\frac{\%\Delta \text{ in Prob(}Y_{i}=1)}{\%\Delta \text{ in }X_{ji}} \end{equation*}%
In the case of dummy independent variables, however, it is unreasonable to compute an elasticity of this form (since the dummy variable can only take on the two values of zero or one). Instead, the marginal effect of a qualitative variable is measured by computing the change in the probability when the dummy variable changes from zero to one, holding the level of all other variables constant at their respective sample means.} This approach makes it possible to evaluate the effect of a change in a particular independent variable on a \textquotedblleft typical\textquotedblright\ observation.
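The example model $Z_{i}^{\ast }=-1.0+0.5X_{i}$ can be reproduced with a few lines of Python. This is a minimal sketch: the function names are assumptions introduced here, and the normal CDF is computed from the error function rather than read from a table, so the results agree with the tabulated figures only up to rounding. It reports both the discrete change in probability from a one-unit increase in $X_{i}$ and the derivative-based marginal effect $\beta _{j}\phi (Z^{\ast })$.

```python
import math

def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def index(x):
    """Z* = -1.0 + 0.5 X, the example model from the text."""
    return -1.0 + 0.5 * x

def discrete_change(x):
    """Change in Prob(Y=1) when X rises from x to x + 1."""
    return norm_cdf(index(x + 1)) - norm_cdf(index(x))

def marginal_effect(x, beta=0.5):
    """Derivative-based effect: beta * phi(Z*) evaluated at x."""
    return beta * norm_pdf(index(x))

for x in (2, 6):
    print(x, round(discrete_change(x), 4), round(marginal_effect(x), 4))
```

Both measures are much larger at $X_{i}=2$ (where $Z_{i}^{\ast }=0$) than at $X_{i}=6$ (where $Z_{i}^{\ast }=2$), because the normal density is highest at the center of the distribution.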
\begin{center}
\FRAME{ftbpFU}{5.0963in}{1.5783in}{0pt}{\Qcb{Effect of a change in an independent variable under the probit model}}{\Qlb{cind_g_lim}}{fig14-5.gif}{\special{language “Scientific Word”;type “GRAPHIC”;maintain-aspect-ratio TRUE;display “USEDEF”;valid_file “F”;width 5.0963in;height 1.5783in;depth 0pt;original-width 5.0419in;original-height 1.542in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename ‘GRAPHS/Fig14-5.gif’;file-properties “XNPEU”;}} \end{center}
\subsubsection{Example: College attendance decision} For comparison purposes, it will be convenient to use the model of educational choice discussed above. Under the probit specification, the model is:
\begin{equation*}
\text{Attend college if: }Z_{i}\geq 0,\text{ where:} \end{equation*}%
\begin{equation}
Z_{i}=\beta _{o}+\beta _{1}\text{HSRANK}_{i}+\beta _{2}\text{HSLEAD}_{i}+\beta _{3}\text{SAT}_{i} \label{lpm2.lc}
\end{equation}%
\begin{equation*}
+\beta _{4}\text{FEMALE}_{i}+\beta _{5}\text{MLHS}_{i}+\beta _{6}\text{MCOL}_{i}
\end{equation*}%
\begin{equation*}
+\beta _{7}\text{FLHS}_{i}+\beta _{8}\text{FCOL}_{i}+\beta _{9}\text{NSIB}_{i}-u_{i}
\end{equation*}%
When the parameters of equation \ref{lpm2.lc} are estimated using a maximum likelihood procedure, the following results are obtained:
\begin{equation}
\hat{Z}_{i}^{\ast }=-\underset{(-16.23\ast \ast )}{1.7290}+\underset{(10.26\ast \ast )}{0.00975}\text{HSRANK}_{i}+\underset{(3.09\ast \ast )}{0.1360}\text{HSLEAD}_{i} \label{lpm3.lc}
\end{equation}%
\begin{equation*}
+\underset{(12.71\ast \ast )}{0.00164}\text{SAT}_{i}-\underset{(-0.007)}{0.0003}\text{FEMALE}_{i}-\underset{(-2.01\ast )}{0.108}\text{MLHS}_{i}
\end{equation*}%
\begin{equation*}
+\underset{(4.64\ast \ast )}{0.238}\text{MCOL}_{i}-\underset{(-1.38)}{0.073}\text{FLHS}_{i}+\underset{(5.59\ast \ast )}{0.289}\text{FCOL}_{i}-\underset{(-2.59\ast \ast )}{0.0238}\text{NSIB}_{i}
\end{equation*}%
\begin{equation*}
(t\text{-statistics in parentheses)}
\end{equation*}%
\begin{equation*}
^{\ast }\text{significant at the .05 level}
\end{equation*}%
\begin{equation*}
^{\ast \ast }\text{significant at the .01 level} \end{equation*}%
It is interesting to note that under the probit model all of the variables have the same sign as occurred under the linear probability model. This is a fairly common result. While, as noted above, there are serious problems with the linear probability model, the results of this model are often quite similar to the results of the probit model.
Let’s examine how this estimated equation can be used to predict the probability of college attendance. Consider a male high school graduate with the following characteristics:\label{probit.ed.start} \begin{itemize}
\item 80th percentile in his high school class; \item a student leader in one or more high school clubs or activities; \item combined SAT scores equal to 1200;
\item both parents have college degrees; and
\item only one sibling is present.
\end{itemize}
With this information, it is possible to use the estimates in equation \ref{lpm3.lc} to construct a predicted value of $Z_{i}^{\ast }$:
\begin{equation}
\hat{Z}_{i}^{\ast }=-1.7290+0.00975(80)+0.1360(1)+0.00164(1200) \label{lpm4.lc}
\end{equation}%
\begin{equation*}
-0.0003(0)-0.108(0)+0.238(1)
\end{equation*}%
\begin{equation*}
-0.073(0)+0.289(1)-0.0238(1)
\end{equation*}%
\begin{equation*}
=1.6582
\end{equation*}%
Using a table containing the cumulative distribution function for a standard normal distribution (or an econometrics software package), it can be determined that the predicted probability of this individual attending college equals:
\begin{equation*}
\text{Prob}(\text{attending college})=\Phi (\hat{Z}_{i}^{\ast }) \end{equation*}%
\begin{equation*}
=\Phi (1.6582)
\end{equation*}%
\begin{equation*}
=95.14\%
\end{equation*}
We can use the results in equation \ref{lpm3.lc} to investigate the predicted effects of changes in the levels of each of these independent variables. As compared to the case reported above:\footnote{Each of these effects captures only the effect of this particular change, holding all other variables at the level described in the example provided above.}
\begin{itemize}
\item a 10 percentile point increase in class rank changes the value of $\hat{Z}_{i}^{\ast }$ to 1.7557 and causes the probability of attending college to rise to 96.04\%,
\item a 10 percentile point decrease in class rank changes the value of $\hat{Z}_{i}^{\ast }$ to 1.5607 and causes the probability of attending college to decline to 94.07\%,
\item not being a leader in any high school clubs or activities changes the value of $\hat{Z}_{i}^{\ast }$ to 1.5222 and causes the probability of attending college to decline to 93.60\%,
\item a 100 point increase in SAT scores changes the value of $\hat{Z}_{i}^{\ast }$ to 1.8222 and raises the probability of attending college to 96.58\%,
\item a 100 point decrease in SAT scores changes the value of $\hat{Z}_{i}^{\ast }$ to 1.4942 and lowers the probability of attending college to 93.24\%,
\item being female changes the value of $\hat{Z}_{i}^{\ast }$ to 1.6579 and lowers the probability of attending college to 95.13\%,
\item if the mother has only a high school degree, the value of $\hat{Z}_{i}^{\ast }$ changes to 1.4202 and the probability of college attendance falls to 92.22\%,
\item if the father has only a high school degree, the value of $\hat{Z}_{i}^{\ast }$ changes to 1.3692 and the probability of college attendance falls to 91.45\%,
\item if the mother did not complete high school, the value of $\hat{Z}_{i}^{\ast }$ changes to 1.3122 and the probability of college attendance falls to 90.53\%,\footnote{%
Note that in this case, it is necessary to set MLHS=1 and MCOL=0 to measure the marginal impact of not completing high school since it is not possible to have a college degree if the highest level of education is less than a high school degree. A similar change must be made to measure the effect of the father not completing high school in the next case.}
\item if the father did not complete high school, the value of $\hat{Z}_{i}^{\ast }$ changes to 1.2962 and the probability of college attendance falls to 90.25\%,
\item having no siblings changes the value of $\hat{Z}_{i}^{\ast }$ to 1.6820 and raises the probability of college attendance to 95.37\%, and
\item having 2 siblings changes the value of $\hat{Z}_{i}^{\ast }$ to 1.6344 and lowers the probability of college attendance to 94.89\%.
\end{itemize}
It should be noted that, due to the nonlinearity of the normal CDF: \begin{enumerate}
\item the effect of a one-unit increase in a variable will differ from the effect of a one-unit decrease (in the example above, a 100 point increase in SAT scores results in a 1.44 percentage point increase in predicted probability while a 100 point decrease in SAT scores leads to a 1.9 percentage point decrease in predicted probability), and \item the marginal impact of any variable depends upon the levels of all of the variables.
\end{enumerate}
\label{probit.ed.end}
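The probit calculations above can be reproduced with a short Python sketch. The coefficients are the estimates reported in equation \ref{lpm3.lc}; the dictionary layout and function names are assumptions made for illustration, and the normal CDF is computed from the error function, so the probabilities match the tabulated values only up to rounding.

```python
import math

def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Estimated probit coefficients reported in the text.
COEF = {"const": -1.7290, "HSRANK": 0.00975, "HSLEAD": 0.1360,
        "SAT": 0.00164, "FEMALE": -0.0003, "MLHS": -0.108, "MCOL": 0.238,
        "FLHS": -0.073, "FCOL": 0.289, "NSIB": -0.0238}

# The hypothetical male graduate described in the text.
BASE = {"HSRANK": 80, "HSLEAD": 1, "SAT": 1200, "FEMALE": 0, "MLHS": 0,
        "MCOL": 1, "FLHS": 0, "FCOL": 1, "NSIB": 1}

def z_hat(person):
    """Predicted index: intercept plus coefficient times characteristic."""
    return COEF["const"] + sum(COEF[k] * v for k, v in person.items())

def prob_college(person):
    return norm_cdf(z_hat(person))

p_base = prob_college(BASE)
p_up = prob_college({**BASE, "SAT": 1300})
p_down = prob_college({**BASE, "SAT": 1100})

print(f"baseline  Z = {z_hat(BASE):.4f}  P = {p_base:.4f}")
print(f"SAT +100  change in P = {p_up - p_base:+.4f}")
print(f"SAT -100  change in P = {p_down - p_base:+.4f}")
```

The unequal size of the two changes in probability illustrates the first point in the list above: equal-sized increases and decreases in a variable do not have symmetric effects.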
\subsubsection{Logit analysis}
The logit model is very similar to the probit model discussed above. Once again, it is assumed that a qualitative variable is present ($Y_i=1)$ only if the net benefit associated with this choice is positive. As in the probit model, the net benefit function in the logit model is assumed to be a linear function of the variables $X_1,X_2,\ldots ,X_k$: \begin{equation}
Z_i=\beta _o+\beta _1X_{1i}+\beta _2X_{2i}+\cdots +\beta _kX_{ki}-u_i \label{z.logit.def}
\end{equation}
This specification differs from the probit specification only in that the error terms, $u_i$, are assumed to follow a logistic distribution rather than a normal distribution. As in the probit model, let $Z_i^{*}=\beta _o+\beta _1X_{1i}+\beta _2X_{2i}+\cdots +\beta _kX_{ki}$ denote the systematic (nonrandom) part of equation \ref{z.logit.def}. Under the logistic distribution, the probability of observing $Y_i=1$ is defined as:
\begin{equation}
\text{Prob(}Y_i=1)=\frac{e^{Z_i^{*}}}{1+e^{Z_i^{*}}} \label{logit.cdf.lc}
\end{equation}
and the probability of observing $Y_i=0$ equals:
\begin{equation}
\text{Prob(}Y_i=0)=\frac 1{1+e^{Z_i^{*}}} \label{logit1.cdf.lc}
\end{equation}
Notice that the level of $e^{Z_i^{*}}$ varies between 0 and $\infty $ as $Z_i^{*}$ varies from $-\infty $ to $+\infty $. In consequence, the probabilities given by equations \ref{logit.cdf.lc} and \ref{logit1.cdf.lc} will always be between zero and one for all possible values of the independent variables.
Discussions of the logit model often refer to the \textbf{odds ratio}, defined as:
\begin{equation*}
\text{odds ratio = }\frac{\text{Prob(}Y_i=1)}{\text{Prob(}Y_i=0)}
\end{equation*}
\begin{equation*}
=\frac{\frac{e^{Z_i^{*}}}{1+e^{Z_i^{*}}}}{\frac 1{1+e^{Z_i^{*}}}}
\end{equation*}
\begin{equation*}
=e^{Z_i^{*}}
\end{equation*}
or, writing out $Z_i^{*}$:
\begin{equation}
\text{odds ratio = }e^{\beta _o+\beta _1X_{1i}+\beta _2X_{2i}+\cdots +\beta _kX_{ki}} \label{log.odds.lc}
\end{equation}
In a logit model of the decision to attend college, for example, the odds ratio serves as a measure of the likelihood of a decision to attend college (as a function of the individual characteristics measured by the variables $X_1$, $X_2$, $\ldots $, $X_k$). An odds ratio equal to 2 indicates that there are two to one odds that the individual will choose college attendance. (The concept of an odds ratio should be familiar to anyone who has ever placed bets on horse races or discussed the possibility of accepting wagers at, for example, 2-1 odds.) A log-odds ratio is computed by taking the natural log of the odds ratio appearing in equation \ref{log.odds.lc}:
\begin{equation*}
\ln (\text{odds ratio})=\beta _o+\beta _1X_{1i}+\beta _2X_{2i}+\cdots +\beta _kX_{ki}
\end{equation*}
Thus, under the logit model, the log-odds ratio is assumed to be a linear function of the observable variables. Each of the slope coefficients, $\beta _j$, is a measure of the change in the log-odds ratio resulting from a one-unit change in the level of $X_j$.
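The relationship between the logistic probability, the odds, and the log-odds can be verified numerically. In the sketch below, \texttt{z} stands for the value of the linear index $\beta _o+\beta _1X_{1i}+\cdots +\beta _kX_{ki}$; the function names and the index values are illustrative assumptions.

```python
import math

def logit_prob(z):
    """Prob(Y=1) = e^z / (1 + e^z), written in the equivalent form 1/(1+e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(z):
    """Prob(Y=1) / Prob(Y=0)."""
    p = logit_prob(z)
    return p / (1.0 - p)

# Hypothetical index values; ln(2) is included to reproduce 2-to-1 odds.
for z in (-2.0, 0.0, math.log(2.0), 3.0):
    print(f"z = {z:6.3f}  P(Y=1) = {logit_prob(z):.4f}  "
          f"odds = {odds(z):.4f}  ln(odds) = {math.log(odds(z)):.4f}")
```

In each row the log of the odds equals the index value itself, which is the sense in which the logit model is linear in the log-odds.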
\begin{center}
\FRAME{ftbpFU}{5.6412in}{1.7461in}{0pt}{\Qcb{Comparison of logistic and standard normal probability density functions}}{\Qlb{log_n01_g_lim}}{fig14-6.gif}{\special{language “Scientific Word”;type “GRAPHIC”;maintain-aspect-ratio TRUE;display “USEDEF”;valid_file “F”;width 5.6412in;height 1.7461in;depth 0pt;original-width 5.5832in;original-height 1.708in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename ‘GRAPHS/Fig14-6.gif’;file-properties “XNPEU”;}} \end{center}
The probability density function for a logistic distribution has a shape that is very similar to a normal distribution. As Figure~\ref{log_n01_g_lim} illustrates, the logistic distribution has somewhat thicker tails than the normal distribution. Since the two distributions are so similar, the choice between a logit and a probit specification has often been made on the basis of mathematical convenience.\footnote{%
The logit model is somewhat simpler to estimate since the cumulative distribution function for the logistic distribution has a relatively simple functional form. The normal CDF, however, cannot be expressed in terms of a simple function and must be computed using a numerical approximation procedure. Since the numerical approximation methods that were initially developed to compute the normal CDF were relatively slow (particularly during the early years of computing), the logit model was often favored by econometricians in early studies on the grounds of relative computational simplicity.} The development of high-speed computers and improved software algorithms has, for all practical purposes, eliminated any distinction between these two models on this ground. Some econometricians prefer the probit model on the basis of the central limit theorems, which suggest that the sum of a large number of independent random influences is approximately normally distributed.
In practice, however, both logit and probit models tend to provide very similar results in most applications. When the same model is estimated using both probit and logit analysis, the predicted probabilities and the estimated $t$-ratios tend to be very similar. As a result of differences in the functional forms, however, the estimated coefficients in the logit model tend to be approximately 1.6 times larger than the corresponding coefficients in the probit model.\footnote{This scale factor was originally suggested by Amemiya (1981, p. 1487) on the basis of a trial and error procedure. Greene (2000, p. 817) shows that this relationship must hold for the marginal effects to be the same under each model. As Greene and Amemiya both note, this approximation works best if the average value of $Z_{i}^{\ast }$ is close to zero (at the center of each distribution).}
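The role of the scale factor can be illustrated by comparing the standard normal CDF at $z$ with the logistic CDF at $1.6z$. The sketch below is only a numerical illustration: the grid of index values is an arbitrary assumption, and the function names are introduced here for convenience.

```python
import math

def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic_cdf(z):
    """Logistic CDF."""
    return 1.0 / (1.0 + math.exp(-z))

SCALE = 1.6  # approximate logit-to-probit coefficient ratio cited in the text

for z in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(f"z = {z:.1f}  probit P = {norm_cdf(z):.4f}  "
          f"logit P at {SCALE}z = {logistic_cdf(SCALE * z):.4f}")

# Largest gap between the two curves on a grid from -3 to 3 (step 0.01).
max_gap = max(abs(norm_cdf(i / 100.0) - logistic_cdf(SCALE * i / 100.0))
              for i in range(-300, 301))
print(f"largest gap on the grid: {max_gap:.4f}")
```

On this grid the two curves differ by less than 0.02 everywhere and nearly coincide close to $z=0$, which is consistent with the remark that the approximation works best near the center of the distribution.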
\subsection{Comparison of logit and probit estimators: educational choice} The educational choice model discussed above provides a convenient basis of comparison between the logit and probit estimators. Table \ref{log.prob.lc} provides a listing of the estimated parameters and the corresponding $t$-ratios under the probit and logit models. As expected, the coefficients are all somewhat larger in absolute value under the logit model. The $t$-ratios, however, are quite similar. Since the $t$-ratios generally tend to be very similar in both the probit and logit models, hypothesis tests generally have the same outcome under either model specification.%
%TCIMACRO{\TeXButton{B}{\begin{table}[tbp] \centering} }% %BeginExpansion
\begin{table}[tbp] \centering
%EndExpansion
\begin{tabular}{|c|c|c|c|c|}
\hline
& \textbf{Probit} & & \textbf{Logit} & \\
\textbf{Variable} & \textbf{estimates} & \textbf{$t$-ratio} & \textbf{estimates} & \textbf{$t$-ratio} \\ \hline
\multicolumn{1}{|l|}{constant} & \multicolumn{1}{|r|}{-1.7290} & \multicolumn{1}{|r|}{-16.23**} & \multicolumn{1}{|r|}{-2.9410} & \multicolumn{1}{|r|}{-16.14**} \\
\multicolumn{1}{|l|}{HSRANK} & \multicolumn{1}{|r|}{0.00975} & \multicolumn{1}{|r|}{10.26**} & \multicolumn{1}{|r|}{0.0162} & \multicolumn{1}{|r|}{10.20**} \\
\multicolumn{1}{|l|}{HSLEAD} & \multicolumn{1}{|r|}{0.1360} & \multicolumn{1}{|r|}{3.09**} & \multicolumn{1}{|r|}{0.237} & \multicolumn{1}{|r|}{3.18**} \\
\multicolumn{1}{|l|}{SAT} & \multicolumn{1}{|r|}{0.00164} & \multicolumn{1}{|r|}{12.71**} & \multicolumn{1}{|r|}{0.00279} & \multicolumn{1}{|r|}{12.80**} \\
\multicolumn{1}{|l|}{FEMALE} & \multicolumn{1}{|r|}{-0.0003} & \multicolumn{1}{|r|}{-0.007} & \multicolumn{1}{|r|}{-0.0059} & \multicolumn{1}{|r|}{-0.08} \\
\multicolumn{1}{|l|}{MLHS} & \multicolumn{1}{|r|}{-0.108} & \multicolumn{1}{|r|}{-2.01*} & \multicolumn{1}{|r|}{-0.174} & \multicolumn{1}{|r|}{-1.96} \\
\multicolumn{1}{|l|}{MCOL} & \multicolumn{1}{|r|}{0.238} & \multicolumn{1}{|r|}{4.64**} & \multicolumn{1}{|r|}{0.418} & \multicolumn{1}{|r|}{4.80**} \\
\multicolumn{1}{|l|}{FLHS} & \multicolumn{1}{|r|}{-0.073} & \multicolumn{1}{|r|}{-1.38} & \multicolumn{1}{|r|}{-0.115} & \multicolumn{1}{|r|}{-1.30} \\
\multicolumn{1}{|l|}{FCOL} & \multicolumn{1}{|r|}{0.289} & \multicolumn{1}{|r|}{5.59**} & \multicolumn{1}{|r|}{0.492} & \multicolumn{1}{|r|}{5.65**} \\
\multicolumn{1}{|l|}{NSIB} & \multicolumn{1}{|r|}{-0.0238} & \multicolumn{1}{|r|}{-2.58**} & \multicolumn{1}{|r|}{-0.0386} & \multicolumn{1}{|r|}{-2.51*} \\ \hline
\multicolumn{5}{l|}{{*}significant at the .05 level} \\
\multicolumn{5}{l|}{{*}*significant at the .01 level}
\end{tabular}
\caption{Comparison of logit and probit estimates: educational attainment model\label{log.prob.lc}}%
%TCIMACRO{\TeXButton{E}{\end{table}}}%
%BeginExpansion
\end{table}%
%EndExpansion
Under the probit model reported above, the predicted probability of college attendance was equal to 95.14\% for a male high school graduate who: \begin{itemize}
\item graduated at the 80th percentile in his class, \item was a leader in a student organization, \item had combined SAT scores of 1200,
\item had two parents that completed at least 1 year of college, and \item has one sibling.\footnote{%
This example was discussed on page \pageref{probit.ed.start}.} \end{itemize}
To compute the corresponding probability estimate using the logit results in Table \ref{log.prob.lc}, it is first necessary to compute the value of $\hat{Z}_{i}$ for this hypothetical individual:
\begin{equation}
\hat{Z}_{i}=-2.9410+0.0162(80)+0.237(1)+0.00279(1200) \label{logit.1a.lc} \end{equation}%
\begin{equation*}
-0.0059(0)-0.174(0)+0.418(1)
\end{equation*}%
\begin{equation*}
-0.115(0)+0.492(1)-0.0386(1)
\end{equation*}%
\begin{equation*}
=2.8114
\end{equation*}%
Using equation \ref{logit.cdf.lc}, the predicted probability that this individual will attend college equals:
\begin{equation}
\widehat{\text{Prob}}(Y_{i}=1)=\frac{e^{\hat{Z}_{i}}}{1+e^{\hat{Z}_{i}}} \label{logit.cdf1.lc} \end{equation}%
\begin{equation*}
=\frac{e^{2.8114}}{1+e^{2.8114}}
\end{equation*}%
\begin{equation*}
=\frac{16.6332}{1+16.6332}
\end{equation*}%
\begin{equation*}
=94.33\%
\end{equation*}%
This predicted probability is quite close to that predicted under the probit model.
The effects of changes in the independent variables were discussed above for the probit version of this model. Let’s see how these results change under the logit specification:
\begin{itemize}
\item a 10 percentile point increase in class rank changes the value of $\hat{Z}_{i}$ to 2.9734 and causes the probability of attending college to rise to 95.14\%,
\item a 10 percentile point decrease in class rank changes the value of $\hat{Z}_{i}$ to 2.6494 and causes the probability of attending college to decline to 93.39\%,
\item not being a leader in any high school clubs or activities changes the value of $\hat{Z}_{i}$ to 2.5744 and causes the probability of attending college to decline to 92.91\%,
\item a 100 point increase in SAT scores changes the value of $\hat{Z}_{i}$ to 3.0904 and raises the probability of attending college to 95.65\%,
\item a 100 point decrease in SAT scores changes the value of $\hat{Z}_{i}$ to 2.5324 and lowers the probability of attending college to 92.64\%,
\item being female changes the value of $\hat{Z}_{i}$ to 2.8055 and lowers the probability of attending college to 94.30\%,
\item if the mother has only a high school degree, the value of $\hat{Z}_{i}$ changes to 2.3934 and the probability of college attendance falls to 91.63\%,
\item if the father has only a high school degree, the value of $\hat{Z}_{i}$ changes to 2.3194 and the probability of college attendance falls to 91.05\%,
\item if the mother did not complete high school, the value of $\hat{Z}_{i}$ changes to 2.2194 and the probability of college attendance falls to 90.20\%,\footnote{%
Note that in this case, it is necessary to set MLHS=1 and MCOL=0 to measure the marginal impact of not completing high school since it is not possible to have a college degree if the highest level of education is less than a high school degree. A similar change must be made to measure the effect of the father not completing high school in the next case.}
\item if the father did not complete high school, the value of $\hat{Z}_{i}$ changes to 2.2044 and the probability of college attendance falls to 90.06\%,
\item having no siblings changes the value of $\hat{Z}_{i}$ to 2.8500 and raises the probability of college attendance to 94.53\%, and
\item having 2 siblings changes the value of $\hat{Z}_{i}$ to 2.7728 and lowers the probability of college attendance to 94.12\%.
\end{itemize}
Once again, it should be noted that, as in the case of a probit model: \begin{enumerate}
\item the effect of a one-unit increase in a variable will differ from the effect of a one-unit decrease, and
\item the marginal impact of any variable depends upon the levels of all of the variables.
\end{enumerate}
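The logit predictions can be checked in the same way. The sketch below uses the logit estimates from Table \ref{log.prob.lc}; the dictionary layout, the function names, and the choice of scenarios are assumptions made for illustration.

```python
import math

# Estimated logit coefficients from the comparison table in the text.
LOGIT = {"const": -2.9410, "HSRANK": 0.0162, "HSLEAD": 0.237,
         "SAT": 0.00279, "FEMALE": -0.0059, "MLHS": -0.174,
         "MCOL": 0.418, "FLHS": -0.115, "FCOL": 0.492, "NSIB": -0.0386}

# The hypothetical male graduate described in the text.
BASE = {"HSRANK": 80, "HSLEAD": 1, "SAT": 1200, "FEMALE": 0, "MLHS": 0,
        "MCOL": 1, "FLHS": 0, "FCOL": 1, "NSIB": 1}

def z_hat(person):
    """Predicted index: intercept plus coefficient times characteristic."""
    return LOGIT["const"] + sum(LOGIT[k] * v for k, v in person.items())

def logit_prob(z):
    """Prob(Y=1) = e^z / (1 + e^z)."""
    return math.exp(z) / (1.0 + math.exp(z))

scenarios = {
    "baseline": {},
    "mother has only a HS degree": {"MCOL": 0},
    "mother did not finish HS": {"MCOL": 0, "MLHS": 1},
    "father did not finish HS": {"FCOL": 0, "FLHS": 1},
}
for label, changes in scenarios.items():
    z = z_hat({**BASE, **changes})
    print(f"{label:30s} Z = {z:.4f}  P = {100 * logit_prob(z):.2f}%")
```

Each scenario changes only the listed characteristics and holds the remaining ones at their baseline values, mirroring the way the effects in the list above are computed.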
\section{Measures of goodness of fit}
In regression models, R$^{2}$ serves as a commonly reported measure of the \textquotedblleft goodness of fit\textquotedblright\ for a regression relationship. Unfortunately, this measure does not provide a very meaningful measure when the dependent variable is a binary variable. The basic problem is that the observed variable takes on only the two values of zero and one.
The linear probability, probit, and logit models predict the probabilities of each of these outcomes. Since we observe only the realized values of zero or one (and not the actual probability of these outcomes), the standard measure of R$^{2}$ cannot be interpreted in the usual manner.
As initially discussed in Chapters \ref{biv.reg.chap} and \ref{mult.chap}, R$^{2}$ is defined as:
\begin{equation*}
\text{R}^{2}=\frac{\text{RSS}}{\text{TSS}}=\frac{\sum \left( \hat{Y}_{i}-\overline{Y}\right) ^{2}}{\sum \left( Y_{i}-\overline{Y}\right) ^{2}}
\end{equation*}%
In a multiple regression model with a continuous dependent variable, the value of R$^{2}$ lies between zero and one, and a value close to one indicates a close fit. When the dependent variable is a dummy variable, however, the value of R$^{2}$ will always be strictly less than one, even when the model is correctly specified.
Figure~\ref{r2lpm_g_lim} illustrates this problem in the case of the linear probability model. Even if the model provides a correct description of the process generating the outcomes, the predicted probabilities are never equal to the observed values of zero and one.
\begin{center}
\FRAME{ftbpFU}{5.4639in}{3.0018in}{0pt}{\Qcb{R$^{2}$ and linear probability model}}{\Qlb{r2lpm_g_lim}}{fig14-7.gif}{\special{language “Scientific Word”;type “GRAPHIC”;maintain-aspect-ratio TRUE;display “USEDEF”;valid_file “F”;width 5.4639in;height 3.0018in;depth 0pt;original-width 5.4059in;original-height 2.9585in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename ‘GRAPHS/Fig14-7.gif’;file-properties “XNPEU”;}} \end{center}
Because of this problem, econometricians generally do not report an R$^{2}$ for linear probability, probit, or logit models.\footnote{A number of alternative measures for R$^{2}$ have been proposed. A full discussion of these measures may be found in a more advanced text. The interested reader may find a good discussion in Maddala (1983, pp. 37-41), or Amemiya (1981, pp. 1502-1507).} In qualitative dependent variable models of this sort, one commonly used measure of goodness of fit is simply the proportion of total outcomes \textquotedblleft predicted\textquotedblright\ correctly. In this case, the predicted value for each observation is set equal to the outcome that has the highest predicted probability:
\begin{equation*}
\hat{Y}_{i}=1\text{ if }\widehat{\text{Prob}}(Y_{i}=1)>0.5 \end{equation*}%
\begin{equation*}
\hat{Y}_{i}=0\text{ if }\widehat{\text{Prob}}(Y_{i}=0)>0.5 \end{equation*}%
While this measure gives an indication of the reliability of the predictions associated with the estimated model, it is not clear that this is necessarily a good measure of the \textquotedblleft quality\textquotedblright\ of the model. To see this, consider the following two cases:
\begin{itemize}
\item Case I: The dependent variable equals one 90\% of the time. An econometrician sets the predicted value equal to one in each case and achieves a 90\% accuracy in predicting the value of the dependent variable.
\item Case II: An economist estimates a probit model of the decision to attend college that accurately captures the decision process for individuals. Many individuals in the sample, however, have (accurately) predicted probabilities that are close to 50\%. Using the criterion of a \textquotedblleft correct\textquotedblright\ prediction, many of these individuals will have outcomes that differ from the predicted value. This model, while accurately reflecting the choice process, may result in, for example, only 80\% \textquotedblleft correct\textquotedblright\ predictions.
\end{itemize}
As the first case suggests, a high probability of success in predictions does not necessarily mean that the model selected is a good model. A high proportion of successful predictions will always occur when one of the outcomes has a relatively high probability of occurring. The second case, however, indicates a more fundamental flaw with using predictive accuracy as a measure of goodness of fit. Since models of this sort are designed to estimate the probability of alternative outcomes occurring, it is somewhat inappropriate to rely on “correct” predictions as a measure of the quality of the model. To see this, suppose that there is a 51\% predicted probability of observing the outcome given by $Y_i=1$. In this case, is an observed outcome of $Y_i=0$ an incorrect forecast when the model predicts that this outcome will occur 49\% of the time?
Because of these problems, it is inappropriate to use either R$^{2}$ or the proportion of outcomes \textquotedblleft correctly\textquotedblright\ explained as a measure of the quality of a model when the observed dependent variable is a dummy variable. These measures, however, may be used to compare alternative models that explain the same binary dependent variable.
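The two cases can be mimicked with a small Python sketch. All data here are hypothetical and the function names are assumptions: the first sample reproduces Case I (a naive rule that always predicts one), and the second is a stylized version of Case II in which every observation has a true probability of 0.6 and exactly 60\% of the observed outcomes equal one.

```python
def predict(prob, cutoff=0.5):
    """Predicted outcome: 1 if the predicted probability exceeds the cutoff."""
    return 1 if prob > cutoff else 0

def share_correct(probs, outcomes, cutoff=0.5):
    """Proportion of observations whose predicted outcome matches the data."""
    hits = sum(1 for p, y in zip(probs, outcomes) if predict(p, cutoff) == y)
    return hits / len(outcomes)

# Case I (hypothetical data): 90% of outcomes equal one, and the
# "model" simply assigns probability one to every observation.
outcomes_1 = [1] * 9 + [0]
naive_probs = [1.0] * 10
case_1 = share_correct(naive_probs, outcomes_1)

# Case II (hypothetical data): the model is exactly right that each
# observation has a 0.6 chance of Y=1, and 6 of 10 outcomes equal one.
outcomes_2 = [1] * 6 + [0] * 4
true_probs = [0.6] * 10
case_2 = share_correct(true_probs, outcomes_2)

print(case_1, case_2)
```

The uninformative rule scores higher than the correctly specified model, which is the reason the proportion of correct predictions is a weak guide to model quality.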
An alternative measure of \textquotedblleft goodness of fit\textquotedblright\ is provided by the likelihood ratio test discussed below.
\section{Tests of linear restrictions}
As discussed in previous chapters, econometricians often wish to test hypotheses involving two or more parameters. In multiple regression models, tests of this sort are usually conducted using either a Wald or Lagrange multiplier test (as discussed in Chapter \ref{spec.chap}). While Wald and Lagrange multiplier tests can be constructed for probit and logit models, they are a bit more complicated under these models. Fortunately, an asymptotically equivalent test is available in the form of the \textbf{likelihood ratio test}.\footnote{%
The likelihood ratio test described below may also be used to test linear restrictions in multiple regression models. In practice, however, the Wald and Lagrange multiplier tests are more commonly used in multiple regression models.} Let’s examine this test.
As in the case of the multiple regression model, tests of linear restrictions under the logit and probit specifications involve the formulation and estimation of two versions of the model: \begin{itemize}
\item an unrestricted version in which no restrictions are placed on parameter values; and
\item a restricted version that embodies the restriction imposed by the null hypothesis.
\end{itemize}
To conduct a likelihood ratio test, it is necessary to construct a likelihood ratio statistic, defined as:
\begin{equation*}
\text{likelihood ratio statistic = }-2\left( \ln (L_{R})-\ln (L_{U})\right) \end{equation*}%
\begin{equation*}
\begin{array}{ll}
\text{where:} & \ln (L_{R})\text{ = log of the likelihood function for the restricted model} \\
& \ln (L_{U})\text{ = log of the likelihood function for the unrestricted model}%
\end{array}%
\end{equation*}%
Since all major econometric software packages report the log of the likelihood function at the final parameter estimates, this statistic can be readily computed.\footnote{%
The likelihood ratio test relies on the use of the likelihood function. This function provides a measure of the joint probability of observing the realized sample outcomes as a function of the unknown model parameters.
Since the likelihood function is a joint probability density function, it will always take on a value between zero and one.
\par
Note that the values of the likelihood function for the restricted and unrestricted models should be essentially the same if the restrictions embodied in the null hypothesis are correct. In this case, the likelihood ratio statistic will be close to zero and the null hypothesis will not be rejected. If the null hypothesis is false, however, the unrestricted model will provide a better explanation of the observed outcomes and the likelihood function for the unrestricted model will be larger in magnitude, leading to a larger value of the likelihood ratio statistic.
\par
A brief discussion of the likelihood function appears in the mathematical appendix at the end of this chapter. A more complete discussion of the likelihood ratio test may be found in Engle (1984) or Amemiya (1985), pp.
141-145.} If the restrictions imposed by the null hypothesis are satisfied, then the likelihood ratio statistic is asymptotically distributed as a $\chi ^{2}$ random variable with $m$ degrees of freedom (where $m$ equals the number of restrictions imposed by the null hypothesis).
\subsection{The likelihood ratio test as a measure of “goodness of fit”} In Chapter \ref{hyp.mult.chap}, it was noted that a Wald test could be used to test for the joint significance of all of the slope coefficients in the model. In logit and probit models, an equivalent test can be performed using the likelihood ratio test described above. The null hypothesis in this case is given by:
\begin{equation*}
\text{H}_o\text{: }\beta _1=\beta _2=\cdots =\beta _k=0 \end{equation*}
The alternative hypothesis is:
\begin{equation*}
\text{H}_1\text{: At least one of the slope coefficients (}\beta _1,\ldots ,\beta _k\text{) is not equal to zero}
\end{equation*}
To conduct a test of this hypothesis, the likelihood ratio statistic is formulated as:
\begin{equation}
\text{likelihood ratio statistic = }-2\left( \ln (L_R)-\ln (L_U)\right) \label{lik.rat2.lc}
\end{equation}
\begin{equation*}
\begin{array}{ll}
\text{where:} & \ln (L_R)\text{ = log of the likelihood function for the model} \\
& \text{ in which }\beta _1=\beta _2=\cdots =\beta _k=0 \\ & \\
& \ln (L_U)\text{ = log of the likelihood function for the unrestricted model}%
\end{array}%
\end{equation*}
If the estimated likelihood ratio statistic exceeds the critical value for a $\chi ^2$ distribution with $k$ degrees of freedom, then the null hypothesis is rejected.
Most econometric software packages automatically report this likelihood ratio statistic for logit and probit models. If not, however, it can be easily computed by estimating the model twice. In the unrestricted version of the model, the intercept and all slope parameters are estimated. In the restricted version of the model, only an intercept term is included (since all slope parameters are assumed to equal zero). The estimated values of the log-likelihood function for each of these model specifications can then be substituted into equation \ref{lik.rat2.lc} to compute the likelihood ratio statistic.
As noted above, it has become standard practice to report the likelihood ratio test whenever the results from probit or logit models are presented.
\subsection{Tests of other restrictions on parameter values} As in the case of the Wald test, the likelihood ratio test may also be used to test for a wide variety of other types of restrictions involving model parameters. Examples of these restrictions include: \begin{itemize}
\item tests for the joint significance of a subset of variables. In this case, the null hypothesis would take the general form: \begin{equation*}
\text{H}_{o}\text{: }\beta _{i}=\beta _{i+1}=\cdots =\beta _{i+m}=0\text{ } \end{equation*}
\item tests of the equality of two or more coefficients. The null hypothesis in this case may be stated in the form:
\begin{equation*}
\text{H}_{o}\text{: }\beta _{i}=\beta _{i+1}=\cdots =\beta _{i+m} \end{equation*}
\item tests involving linear combinations of the coefficients. Possible examples of null hypotheses of this type include: \begin{equation*}
\text{H}_{o}\text{: }\beta _{1}+\beta _{2}=1 \end{equation*}
or,
\begin{equation*}
\text{H}_{o}\text{: }\beta _{1}-\beta _{2}=0 \end{equation*}
or
\begin{equation*}
\text{H}_{o}\text{: }2\beta _{1}+5\beta _{2}+2\beta _{3}=12 \end{equation*}
\end{itemize}
In each of these cases, the construction of the restricted and unrestricted versions of the model is equivalent to the procedure discussed extensively in Section \ref{sets.of.rest.6}. Once the restricted and unrestricted models are formulated, the null hypothesis is tested using the likelihood ratio statistic defined above.
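As a brief illustration (the three-regressor index used here is an assumption made for concreteness and is not taken from the examples in this chapter), consider the null hypothesis $\text{H}_{o}$: $\beta _{1}-\beta _{2}=0$. Substituting $\beta _{2}=\beta _{1}$ into the unrestricted model gives the restricted model:
\begin{equation*}
Z_{i}^{\ast }=\beta _{o}+\beta _{1}\left( X_{1i}+X_{2i}\right) +\beta _{3}X_{3i}-u_{i}
\end{equation*}
The restricted model can therefore be estimated by replacing $X_{1}$ and $X_{2}$ with their sum. Since a single restriction is imposed, the resulting likelihood ratio statistic is compared with the critical value for a $\chi ^{2}$ distribution with one degree of freedom.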
\subsection{Example: College attendance}
Let’s reconsider the probit version of the educational attainment model. The probit equation is:
\begin{equation}
Z_{i}^{\ast }=\beta _{o}+\beta _{1}\text{HSRANK}_{i}+\beta _{2}\text{HSLEAD}_{i}+\beta _{3}\text{SAT}_{i} \label{nlpm.lc}
\end{equation}%
\begin{equation*}
+\beta _{4}\text{FEMALE}_{i}+\beta _{5}\text{MLHS}_{i}+\beta _{6}\text{MCOL}_{i}
\end{equation*}%
\begin{equation*}
+\beta _{7}\text{FLHS}_{i}+\beta _{8}\text{FCOL}_{i}+\beta _{9}\text{NSIB}_{i}-u_{i}
\end{equation*}%
Suppose that we wish to test for the joint significance of all of the slope coefficients in this model. In this case, the null hypothesis is:
\begin{equation*}
\text{H}_{o}\text{: }\beta _{1}=\beta _{2}=\cdots =\beta _{9}=0 \end{equation*}%
The restricted version of the model corresponding to this null hypothesis is:
\begin{equation*}
Z_{i}^{\ast }=\beta _{o}-u_{i}
\end{equation*}%
The unrestricted version of this model is given by equation \ref{nlpm.lc}.
When the parameters of these models are estimated using a maximum likelihood technique, the log-likelihood function equals -2596.4 for the unrestricted model and -3196.1 for the restricted model.\footnote{The value of the log-likelihood function is computed and reported by virtually all econometric packages that perform maximum likelihood estimation.} Thus, the likelihood ratio statistic is computed as: \begin{equation*}
\text{likelihood ratio statistic = }-2\left[ -3196.1-\left( -2596.4\right) \right] =1199.4
\end{equation*}
At a 1\% significance level, the critical value for a $\chi ^2$ distribution with 9 degrees of freedom is 21.67. Thus, the null hypothesis is rejected.
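The calculation above can be checked numerically. The following Python sketch is an illustration rather than part of the original example: it computes the likelihood ratio statistic and its upper-tail $\chi ^{2}$ probability using only the standard library. The function names \texttt{chi2\_sf} and \texttt{lr\_test} are hypothetical, and the closed-form tail formula used here assumes an integer number of restrictions.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variate.

    Assumes df is a positive integer and uses the closed-form series
    for the regularized upper incomplete gamma function Q(df/2, x/2).
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: Q(n, y) = exp(-y) * sum_{j=0}^{n-1} y^j / j!
        n = df // 2
        term, total = 1.0, 1.0
        for j in range(1, n):
            term *= y / j
            total += term
        return math.exp(-y) * total
    # Odd df: Q(n + 1/2, y) = erfc(sqrt(y))
    #         + exp(-y) * sum_{j=0}^{n-1} y^(j + 1/2) / Gamma(j + 3/2)
    n = (df - 1) // 2
    total = sum(y ** (j + 0.5) / math.gamma(j + 1.5) for j in range(n))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * total


def lr_test(lnl_restricted, lnl_unrestricted, n_restrictions):
    """Return the likelihood ratio statistic and its chi-square p-value."""
    stat = -2.0 * (lnl_restricted - lnl_unrestricted)
    return stat, chi2_sf(stat, n_restrictions)


# Log-likelihood values reported in the college attendance example
stat, p_value = lr_test(-3196.1, -2596.4, 9)
print(round(stat, 1), p_value < 0.01)
```

Working with the tail probability instead of a tabulated critical value leads to the same decision: the null hypothesis is rejected at the 1\% level whenever the reported probability is below 0.01.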
\section{Polychotomous choices\label{poly.lc}} Quite often, econometricians are interested in examining choice processes in which individuals select among more than two outcomes. For example, the choice of the optimal level of education is not simply a decision of whether or not to attend college. There are, in fact, many different levels of educational attainment that may be selected. Individuals also face choices among many different alternative occupations. While the choices still involve qualitative outcomes, the choice is often among three or more alternatives. Models such as these are referred to as \textbf{polychotomous choice models}.\footnote{%
A full discussion of these models is beyond the scope of the current text.
The interested reader may find a good discussion of these (and other) limited dependent variable models in Greene (2000) and Maddala (1983).} Let’s briefly examine two extensions of the probit and logit models that make it possible to examine such choice processes.
\subsection{Ordered probit models}
The \textbf{ordered probit model} is often used by economists who are analyzing the choice among a set of alternatives that can be ranked from lowest to highest according to a commonly agreed upon scale.\footnote{An ordered logit model may also be used to represent these choice processes.
The ordered logit model differs from the ordered probit model by assuming that the underlying distribution of the error terms follows a logistic rather than a normal probability density function. In practice, however, the ordered probit model is more commonly used by applied econometricians.} Levels of educational attainment, for example, can be ranked from elementary school, 1-3 years of high school, high school graduate, 1-3 years of college, to advanced degrees. All branches of the military use a hierarchical system in which personnel are sorted into ranks (such as private, lieutenant, and general). College faculty are similarly sorted into a range of categories from lecturers to full professors. Credit card companies evaluating the creditworthiness of potential customers sort them into several categories: credit rejection, limited credit (and a high interest rate), and more extensive credit (and a lower interest rate).
In situations such as these, the choice is not between two outcomes, but among a set of potential outcomes that can be ranked from highest to lowest.
As noted above, the probit model suggests that individuals faced with a binary choice will choose the option that offers the highest level of net benefit. The ordered probit model is a generalization of this model in which there is a series of thresholds that must be passed for individuals to move up to a higher level in some hierarchical ordering. An example should help to illustrate the use of this model.
Suppose that an ordered probit model is used to explain an individual’s level of educational attainment.\footnote{%
A model of this general form was used by Spizman and Kane (1992) in an attempt to use family background characteristics to forecast an individual’s educational attainment.} The basic model is: \begin{equation}
Z_{i}=\beta _{o}+\beta _{1}X_{1i}+\beta _{2}X_{2i}+\cdots +\beta _{k}X_{ki}-u_{i} \label{order_prob_eq_lc}
\end{equation}%
where $Z_{i}$ is an unobserved variable representing a measure of the net benefit associated with education. The variables $X_{1},$ $X_{2},$ $\ldots ,X_{k}$ include factors that affect either the cost or benefit associated with educational attainment. Thus far, the model looks very much like the probit model discussed above. The main difference is that this model extends the choice to multiple levels of educational attainment. The decision rule is:
Individual $i$ acquires:
\begin{itemize}
\item less than a high school degree if $Z_{i}\leq 0$;
\item a high school (or GED) degree if $0<Z_{i}\leq \mu _{1}$;
\item 1-3 years of college if $\mu _{1}<Z_{i}\leq \mu _{2}$;
\item a Bachelor’s degree if $\mu _{2}<Z_{i}\leq \mu _{3}$;
\item a Master’s degree if $\mu _{3}<Z_{i}\leq \mu _{4}$; or
\item a Ph.D. (or equivalent advanced degree) if $Z_{i}>\mu _{4}$.
\end{itemize}
The threshold values $\mu _{1}$, $\mu _{2}$, $\mu _{3}$, and $\mu _{4}$ are unknown parameters that are estimated by a maximum likelihood estimation procedure along with the parameters $\beta _{1},$ $\beta _{2}$, $\ldots $, $\beta _{k}$.\footnote{%
The LIMDEP econometrics package provides an easy-to-use procedure for estimating ordered probit models.} This model suggests that individuals with a positive level of $Z_{i}$ acquire at least a high school degree (or GED). Those with a level of $Z_{i}$ greater than $\mu _{1}$ will acquire at least some college education, and those with a level of $Z_{i}$ greater than $\mu _{2}$ will acquire at least a Bachelor’s degree. Individuals will receive a Master’s or Ph.D. (or equivalent) degree if the level of $Z_{i}$ exceeds $\mu _{3}$ or $\mu _{4}$, respectively. The probabilities of the alternative levels of educational attainment under this model are given in Table \ref{ord_ks_lc}.%
%TCIMACRO{\TeXButton{B}{\begin{table}[tbp] \centering}}%
%BeginExpansion
\begin{table}[tbp] \centering%
%EndExpansion
\begin{tabular}{|cc|}
\hline
\textbf{Outcome} & \textbf{Probability} \\ \hline
\multicolumn{1}{|l}{Less than a high school degree} & $\Phi (-Z_{i}^{\ast })$ \\
\multicolumn{1}{|l}{High school degree (or GED)} & $\Phi (\mu _{1}-Z_{i}^{\ast })-\Phi (-Z_{i}^{\ast })$ \\
\multicolumn{1}{|l}{1-3 years of college} & $\Phi (\mu _{2}-Z_{i}^{\ast })-\Phi (\mu _{1}-Z_{i}^{\ast })$ \\
\multicolumn{1}{|l}{Bachelor’s degree} & $\Phi (\mu _{3}-Z_{i}^{\ast })-\Phi (\mu _{2}-Z_{i}^{\ast })$ \\
\multicolumn{1}{|l}{Master’s degree} & $\Phi (\mu _{4}-Z_{i}^{\ast })-\Phi (\mu _{3}-Z_{i}^{\ast })$ \\
\multicolumn{1}{|l}{Ph.D. (or equivalent advanced degree)} & $1-\Phi (\mu _{4}-Z_{i}^{\ast })$ \\ \hline
\end{tabular}%
\caption{Probabilities of alternative educational outcomes under the ordered probit model\label{ord_ks_lc}}%
%TCIMACRO{\TeXButton{E}{\end{table}}}%
%BeginExpansion
\end{table}%
%EndExpansion
The parameters of equation \ref{order_prob_eq_lc} and the threshold parameters $\mu _{1}$, $\mu _{2}$, $\mu _{3}$, and $\mu _{4}$ may be estimated using a maximum likelihood estimation procedure. A growing number of econometric software packages contain ordered probit estimation procedures. Once the model has been estimated, the predicted probabilities of alternative outcomes can be determined by substituting the estimated values of $\hat{Z}_{i}^{\ast }$, $\hat{\mu}_{1}$, $\hat{\mu}_{2}$, $\hat{\mu}_{3}$, and $\hat{\mu}_{4}$ into the equations appearing in Table \ref{ord_ks_lc}.
\subsection{Example:\ Educational attainment} Kane and Spizman (2001) estimated several ordered probit models of educational attainment in which the alternative outcomes correspond to those listed above in Table \ref{ord_ks_lc}. One of the equations estimated using a sample of males is given by:%
\begin{equation}
\hat{Z}_{i}^{\ast }=\underset{(14.204)}{0.802}-\underset{(-1.827)}{0.106}\text{Hispanic}_{i}-\underset{(-1.281)}{0.063}\text{Black}_{i}-\underset{(-0.178)}{0.009}\text{Urban}_{i} \label{kane_spiz_op_lc}
\end{equation}%
\begin{equation*}
+\underset{(8.212)}{0.410}\text{MHS}_{i}+\underset{(8.630)}{0.687}\text{MSC}_{i}+\underset{(11.814)}{1.022}\text{MCD}_{i}
\end{equation*}%
\begin{equation*}
+\underset{(5.302)}{0.277}\text{FHS}_{i}+\underset{(7.879)}{0.575}\text{FSC}_{i}+\underset{(13.402)}{0.968}\text{FCD}_{i}
\end{equation*}%
\begin{equation*}
\text{(}t\text{-statistics in parentheses)}
\end{equation*}%
\begin{equation*}
\chi ^{2}\text{ = 828.903}
\end{equation*}%
\begin{equation*}
\begin{array}{llll}
\text{where:\ } & \text{Hispanic}_{i} & =1 & \text{if respondent }i\text{ is Hispanic, = 0 otherwise} \\
& \text{Black}_{i} & =1 & \text{if respondent }i\text{ is Black, = 0 otherwise} \\
& \text{Urban}_{i} & =1 & \text{if respondent }i\text{ lived in an urban area at age 14, = 0 otherwise} \\
& \text{MHS}_{i} & =1 & \text{if the highest level of education for the respondent’s mother is } \\
& & & \text{a high school degree (or a GED), = 0 otherwise} \\
& \text{MSC}_{i} & =1 & \text{if the highest level of education for the respondent’s mother is } \\
& & & \text{1-3 years of college, = 0 otherwise} \\
& \text{MCD}_{i} & =1 & \text{if the respondent’s mother completed 16 or more years of } \\
& & & \text{schooling, = 0 otherwise} \\
& \text{FHS}_{i} & =1 & \text{if the highest level of education for the respondent’s father is } \\
& & & \text{a high school degree (or a GED), = 0 otherwise} \\
& \text{FSC}_{i} & =1 & \text{if the highest level of education for the respondent’s father is } \\
& & & \text{1-3 years of college, = 0 otherwise} \\
& \text{FCD}_{i} & =1 & \text{if the respondent’s father completed 16 or more years of } \\
& & & \text{schooling, = 0 otherwise}%
\end{array}%
\end{equation*}
The estimated threshold values for this ordered probit model appear in Table \ref{threshold_table_lc}.
\begin{center}
%TCIMACRO{\TeXButton{B}{\begin{table}[tbp] \centering}}% %BeginExpansion
\begin{table}[tbp] \centering%
%EndExpansion
\begin{tabular}{|c|c|c|}
\hline
\textbf{Threshold value} & \textbf{Estimated value} & $\boldmath{t}$\textbf{-statistic} \\ \hline
$\hat{\mu}_{1}$ & 2.005 & 54.237 \\
$\hat{\mu}_{2}$ & 2.273 & 56.890 \\
$\hat{\mu}_{3}$ & 3.252 & 64.712 \\
$\hat{\mu}_{4}$ & 3.884 & 56.550 \\ \hline
\end{tabular}%
\caption{Estimated threshold values for educational attainment ordered probit model\label{threshold_table_lc}}%
%TCIMACRO{\TeXButton{E}{\end{table}}}%
%BeginExpansion
\end{table}%
%EndExpansion
\end{center}
By substituting these values into Table \ref{ord_ks_lc}, the probabilities of each alternative level of educational attainment may be predicted for a given combination of independent variables. Consider, for example, a male child with the following characteristics:
\begin{itemize}
\item Hispanic,
\item lived in an urban area, and
\item both parents had a high school degree.
\end{itemize}
For a person with these characteristics, the predicted value of $Z_{i}^{\ast }$ is given by:%
\begin{equation}
\hat{Z}_{i}^{\ast }=0.802-0.106(1)-0.063(0)-0.009(1) \label{ord_prob_fit_ex1.lc}
\end{equation}%
\begin{equation*}
+0.410(1)+0.687(0)+1.022(0)
\end{equation*}%
\begin{equation*}
+0.277(1)+0.575(0)+0.968(0)
\end{equation*}%
\begin{equation*}
=1.374
\end{equation*}%
Using this predicted value of $Z_{i}^{\ast }$ and the estimated threshold values appearing in Table \ref{threshold_table_lc}, the estimated probabilities of alternative levels of educational attainment for this child may be computed as:
\begin{center}
%TCIMACRO{\TeXButton{B}{\begin{table}[tbp] \centering}}% %BeginExpansion
\begin{table}[tbp] \centering%
%EndExpansion
\begin{tabular}{|cc|}
\hline
\textbf{Outcome} & \textbf{Probability} \\ \hline
\multicolumn{1}{|l}{Less than a high school degree} & \multicolumn{1}{|l|}{$\Phi (-1.374)=0.085$} \\
& \multicolumn{1}{|l|}{} \\ \hline
\multicolumn{1}{|l}{High school degree (or GED)} & \multicolumn{1}{|l|}{$\Phi (2.005-1.374)-\Phi (-1.374)$} \\
& \multicolumn{1}{|l|}{$=0.736-0.085=0.651$} \\ \hline
\multicolumn{1}{|l}{1-3 years of college} & \multicolumn{1}{|l|}{$\Phi (2.273-1.374)-\Phi (2.005-1.374)$} \\
& \multicolumn{1}{|l|}{$=0.816-0.736=0.080$} \\ \hline
\multicolumn{1}{|l}{Bachelor’s degree} & \multicolumn{1}{|l|}{$\Phi (3.252-1.374)-\Phi (2.273-1.374)$} \\
& \multicolumn{1}{|l|}{$=0.970-0.816=0.154$} \\ \hline
\multicolumn{1}{|l}{Master’s degree} & \multicolumn{1}{|l|}{$\Phi (3.884-1.374)-\Phi (3.252-1.374)$} \\
& \multicolumn{1}{|l|}{$=0.994-0.970=0.024$} \\ \hline
\multicolumn{1}{|l}{Ph.D. (or equivalent advanced degree)} & \multicolumn{1}{|l|}{$1-\Phi (3.884-1.374)$} \\
\multicolumn{1}{|l}{} & \multicolumn{1}{|l|}{$=1-0.994=0.006$} \\ \hline
\end{tabular}%
\caption{Predicted probabilities of alternative educational outcomes for the example respondent\label{ord_ks_lc_est}}%
%TCIMACRO{\TeXButton{E}{\end{table}}}%
%BeginExpansion
\end{table}%
%EndExpansion
\end{center}
These results indicate that a male Hispanic child living in an urban area with two parents with high school degrees will have: \begin{itemize}
\item an 8.5\% probability of not completing high school,
\item a 65.1\% probability of completing a high school degree,
\item an 8.0\% probability of completing 1-3 years of college,
\item a 15.4\% probability of completing a bachelor’s degree,
\item a 2.4\% probability of completing a master’s degree, and
\item a 0.6\% probability of completing a Ph.D. (or equivalent) degree.
\end{itemize}
Note that, as expected, the sum of the predicted probabilities of alternative levels of educational attainment equals one.
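The predicted probabilities in this example can be reproduced with a few lines of code. The following Python sketch is illustrative only: the function name \texttt{ordered\_probit\_probs} is hypothetical, the first cutoff is assumed to be normalized to zero (as in Table \ref{ord_ks_lc}), and the standard normal CDF is taken from the standard library.

```python
from statistics import NormalDist


def ordered_probit_probs(z_hat, thresholds):
    """Outcome probabilities for an ordered probit model.

    z_hat      -- fitted value of the index for one individual
    thresholds -- increasing cutoffs mu_1, ..., mu_J; the lowest cutoff
                  is assumed to be normalized to zero
    """
    Phi = NormalDist().cdf
    cutoffs = [0.0] + list(thresholds)
    # CDF evaluated at each cutoff, with 1.0 appended for the top category
    cdf_vals = [Phi(c - z_hat) for c in cutoffs] + [1.0]
    probs = [cdf_vals[0]]
    probs += [cdf_vals[j] - cdf_vals[j - 1] for j in range(1, len(cdf_vals))]
    return probs


# Hispanic male, urban residence at age 14, both parents high school graduates
z_hat = 0.802 - 0.106 - 0.009 + 0.410 + 0.277
probs = ordered_probit_probs(z_hat, [2.005, 2.273, 3.252, 3.884])
print([round(p, 3) for p in probs])
```

Because each probability is a difference of adjacent CDF values, the probabilities telescope and necessarily sum to one, whatever the values of the fitted index and the thresholds.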
\subsection{Multinomial logit model}
A major limitation of the ordered probit model is that it can only be used to analyze choices that fall into a commonly agreed upon hierarchy. Many choices among qualitative alternatives do not fit this pattern.
While everyone might agree that a Master’s degree represents a higher level of educational attainment than a Bachelor’s degree, the choice among an automotive repair training program, a tractor trailer driving school, and a refrigeration repair institute cannot be as easily sorted into a well-defined hierarchy. Similarly, the alternatives of commuting to work by car, bus, subway, taxi, bicycle, or carpool cannot be easily ranked. The multinomial logit model is often used to analyze choices among alternatives such as these.
Under the multinomial logit model, it is assumed that person $i$ faces a choice among $M+1$ mutually exclusive alternatives. The outcome can be summarized by a variable $Y_{i}$ that takes on the values $0$, $1$, $\ldots $, $M$. Assuming that the probabilities of the alternative outcomes follow a multinomial logit distribution, the probability of observing any outcome can be expressed as:\footnote{%
Note that the logit model discussed above is a special case of this more general model that occurs when $M=1$.}
\begin{equation}
\text{Prob(}Y_i=m\text{) = }\frac{e^{\beta _o^m+\beta _1^mX_{1i}+\cdots +\beta _k^mX_{ki}}}{1+\sum_{j=1}^Me^{\beta _o^j+\beta _1^jX_{1i}+\cdots +\beta _k^jX_{ki}}}\text{ for }m=1,2,\ldots ,M \label{multi.logit.1} \end{equation}
and
\begin{equation}
\text{Prob(}Y_i=0\text{) = }\frac 1{1+\sum_{m=1}^Me^{\beta _o^m+\beta _1^mX_{1i}+\cdots +\beta _k^mX_{ki}}} \label{multi.logit.2} \end{equation}
Dividing equation \ref{multi.logit.1} by equation \ref{multi.logit.2} and taking the natural log of both sides yields: \begin{equation}
\ln \left( \frac{\text{Prob(}Y_i=m\text{)}}{\text{Prob(}Y_i=0\text{)}}\right) =\beta _o^m+\beta _1^mX_{1i}+\cdots +\beta _k^mX_{ki} \label{logodds}
\end{equation}
Thus, under the multinomial logit model, the log-odds ratio for any choice (the term on the left-hand side of equation \ref{logodds}) is a linear function of the individual’s characteristics. Separate equations are estimated for choices $1$ through $M$. Even though there are $M+1$ choices, only $M$ log-odds ratios must be computed since the probabilities must sum to 1.
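The multinomial logit probabilities are straightforward to evaluate once the coefficients are known. The Python sketch below is illustrative only: the function name \texttt{multinomial\_logit\_probs} and the coefficient values are hypothetical, and alternative 0 is treated as the base category with all of its coefficients normalized to zero.

```python
import math


def multinomial_logit_probs(x, betas):
    """Choice probabilities for alternatives 0, 1, ..., M.

    x     -- regressors [X_1, ..., X_k] for one individual
    betas -- M coefficient vectors [b_0, b_1, ..., b_k], one for each
             alternative m = 1, ..., M (alternative 0 is the base category)
    """
    index = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], x)) for b in betas]
    denom = 1.0 + sum(math.exp(v) for v in index)
    return [1.0 / denom] + [math.exp(v) / denom for v in index]


# Hypothetical coefficients for M = 2 alternatives and k = 1 regressor
betas = [[0.5, -0.2], [-1.0, 0.3]]
probs = multinomial_logit_probs([2.0], betas)

# The log-odds relative to the base category recover the linear index
log_odds_1 = math.log(probs[1] / probs[0])   # 0.5 - 0.2 * 2.0
log_odds_2 = math.log(probs[2] / probs[0])   # -1.0 + 0.3 * 2.0
```

With a single non-base alternative ($M=1$), the same function returns the binary logit probabilities, which illustrates the special case noted in the footnote above.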
\section{Sample selectivity models: the Heckman procedure} In many econometric applications, the dependent variable is not observed for all sample respondents. This does not present a problem for regression analysis as long as this is a random occurrence. Quite often, however, the data is missing for a particular reason. For example, consider an econometrician wishing to estimate the determinants of an individual’s SAT score. Data on SAT scores will be available only for individuals who choose to take the SAT exam. It is quite likely that the individuals who choose to take the SAT will be those who, on average, expect to perform relatively well on the test. In other words, the individuals who choose to take the SAT may differ in a systematic manner from the individuals who choose not to take the SAT exam.
%TCIMACRO{%
%\TeXButton{Dewey vs Truman}{\exbox{Dewey Defeats Truman?}{At some point, you have probably seen a picture of Harry Truman holding a copy of %the {\it Chicago Tribune} that contained a headline proclaiming “Dewey Defeats %Truman.” This prediction was primarily based upon polls that had indicated that %Dewey would carry the election. These polls, however, were not based upon a %random sample of the appropriate population. In particular, each of the major polls was based %on samples in which individuals with a grade school education were %underrepresented. Since these voters tended to favor Truman, these polls tended %to understate Truman’s support. In addition, the major polls did not ask individuals %whether they intended to vote. Thus, the population sampled did not necessarily %reflect the relevant population of voters. (A further complication is that a %significant portion of the electorate made that decision after the last polls were %conducted.)
%
%For a good discussion of this case, see Manchester (1974), %pp. 467-471.}}}%
%BeginExpansion
\exbox{Dewey Defeats Truman?}{At some point, you have probably seen a picture of Harry Truman holding a copy of the {\it Chicago Tribune} that contained a headline proclaiming “Dewey Defeats Truman.” This prediction was primarily based upon polls that had indicated that Dewey would carry the election. These polls, however, were not based upon a random sample of the appropriate population. In particular, each of the major polls was based on samples in which individuals with a grade school education were underrepresented. Since these voters tended to favor Truman, these polls tended to understate Truman’s support. In addition, the major polls did not ask individuals whether they intended to vote. Thus, the population sampled did not necessarily reflect the relevant population of voters. (A further complication is that a significant portion of the electorate made that decision after the last polls were conducted.)
For a good discussion of this case, see Manchester (1974), pp. 467-471.}%
%EndExpansion
Suppose an OLS estimation technique is used to generate estimates of an SAT equation using only the observations for individuals who choose to take the SAT exam. This equation would provide the best fit for the observed sample.
It is quite likely, however, that it would overstate the SAT scores that would have been received by those individuals who initially chose not to take the SAT. In this case, a \textbf{sample selectivity bias} is said to occur. The potential for a sample selectivity bias problem exists whenever the sample used for estimation purposes is not a random sample of the population.
A simple example can serve to illustrate the problem of sample selectivity bias. Suppose that the relationship between SAT scores and high school GPA in the population is given by:
\begin{equation}
\text{SAT}_{i}=\beta _{o}+\beta _{1}\text{GPA}_{i}+u_{i} \label{select.ex.1.lc}
\end{equation}%
Not everyone, however, chooses to take the SAT exam. In particular, it is likely that individuals with positive error terms ($u_{i}$) will be more likely to take the SAT than individuals with negative error terms.\footnote{Note that the error term in the SAT equation (equation \ref{select.ex.1.lc}) is partly the result of differences in ability that are not fully captured by the observable ability variable (GPA). Students who believe that they will perform better on the SAT than other students with the same GPA are expected to be more likely to take the SAT exam.} In this case, the observed relationship between SAT scores and high school GPAs will lie above the population relationship (since those who expect low scores will be less likely to be included in the observed sample). Figure~\ref{selectb_g_lim} illustrates this relationship: the estimated relationship between high school GPA and SAT scores lies above the population relationship.
\begin{center}
\FRAME{ftbpFU}{5.1283in}{3.1808in}{0pt}{\Qcb{Selectivity bias in SAT equation}}{\Qlb{selectb_g_lim}}{fig14-8.gif}{\special{language “Scientific Word”;type “GRAPHIC”;maintain-aspect-ratio TRUE;display “USEDEF”;valid_file “F”;width 5.1283in;height 3.1808in;depth 0pt;original-width 5.073in;original-height 3.1358in;cropleft “0”;croptop “1”;cropright “1”;cropbottom “0”;filename ‘GRAPHS/Fig14-8.gif’;file-properties “XNPEU”;}} \end{center}
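The SAT example can also be illustrated with simulated data. The Python sketch below is a hypothetical illustration, not an analysis of actual SAT data: the population parameters, the error variances, and the selection rule are all assumptions chosen so that students with favorable error terms are more likely to take the exam.

```python
import random


def ols(x, y):
    """Intercept and slope from a one-regressor OLS fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


random.seed(0)
beta0, beta1 = 400.0, 150.0          # hypothetical population parameters
gpa, sat, taken = [], [], []
for _ in range(20000):
    g = random.uniform(2.0, 4.0)     # high school GPA
    u = random.gauss(0.0, 80.0)      # error term in the SAT equation
    gpa.append(g)
    sat.append(beta0 + beta1 * g + u)
    # Assumed selection rule: a favorable u or a high GPA makes a student
    # more likely to take the exam; the last term is unrelated noise.
    taken.append(u / 80.0 + (g - 3.0) + random.gauss(0.0, 1.0) > 0)

full = ols(gpa, sat)                 # fit using the entire population
selected = ols([g for g, t in zip(gpa, taken) if t],
               [s for s, t in zip(sat, taken) if t])

print("population fit:     ", full)
print("selected-sample fit:", selected)
```

Under these assumptions, the line fitted to the test takers lies above the line fitted to the full population over the observed range of GPAs, which is the pattern depicted in Figure~\ref{selectb_g_lim}.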
Let’s examine a general model that can be used to represent a wide variety of sample selectivity problems.
\subsection{Sample selectivity bias model}
Suppose that the relationship between the dependent variable, $Y$, and a set of $k$ independent variables, $X_{1},\ldots ,X_{k}$, is given by: \begin{equation}
Y_{i}=\beta _{o}+\beta _{1}X_{1i}+\beta _{2}X_{2i}+\cdots +\beta _{k}X_{ki}+u_{i} \label{select.lc}
\end{equation}%
In a sample selection model, the true value of $Y_{i}$ is not observed for all observations. Define a qualitative variable, $D_{i}$, as: \begin{equation*}
\begin{array}{l}
D_{i}=1\text{ if }Y_{i}\text{ is observed} \\ D_{i}=0\text{ if }Y_{i}\text{ is not observed}%
\end{array}%
\end{equation*}%
In the most commonly used form of sample selectivity model, it is assumed that the selection process can be described by a probit model:\footnote{A logit specification is also often used to describe the selection process.} \begin{equation*}
\begin{array}{l}
D_{i}=1\text{ if }Z_{i}\geq 0 \\
D_{i}=0\text{ if }Z_{i}<0%
\end{array}%
\end{equation*}%
\begin{equation}
\begin{array}{ll}
\text{where:} & Z_{i}=\gamma _{o}+\gamma _{1}W_{1i}+\gamma _{2}W_{2i}+\cdots +\gamma _{m}W_{mi}-v_{i}%
\end{array}
\label{select2.lc}
\end{equation}%
In this case, the unobserved variable $Z_{i}$ is assumed to be a linear function of a set of $m$ independent variables, $W_{1},W_{2},\ldots ,W_{m}$, and a random error term, $v_{i}$. In this model, it is generally assumed that the error terms $u_{i}$ and $v_{i}$ follow a bivariate normal distribution. Specifically, it is assumed that:\footnote{As noted above, in a probit model the estimated coefficients are equal to $\beta _{i}/\sigma _{v}$. Since it is not possible to obtain unique estimates of each of the parameters in the model, the variance of the error term in the probit equation is often normalized to one ($\sigma _{v}=1$).
\par
It should be noted, however, that under the Tobit version of this model, it is possible to estimate the variance parameter $\sigma _{v}$ since there is only one error term under this specification (and $\sigma _{u}$ = $\sigma _{v}$). The interested reader is referred to the discussion in Tobin (1958) and Olsen (1978).}
\begin{equation*}
E(u_{i})=0
\end{equation*}%
\begin{equation*}
E(v_{i})=0
\end{equation*}%
\begin{equation*}
var(u_{i})=\sigma _{u}^{2}
\end{equation*}%
\begin{equation*}
var(v_{i})=1
\end{equation*}%
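\begin{equation*}
cov(u_{i},-v_{i})=\sigma _{uv}
\end{equation*}
(The covariance is stated in terms of $-v_{i}$ because the error term enters equation \ref{select2.lc} with a negative sign. Thus, $\sigma _{uv}$ is positive when the unobserved factors that raise $Y_{i}$ also raise the likelihood that $Y_{i}$ is observed.)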
Under these assumptions, in the observed sample, the conditional expectation of the dependent variable in equation \ref{select.lc} equals:\footnote{A proof of this proposition is beyond the scope of this text. The interested (and mathematically sophisticated) reader may find a more complete discussion in Heckman (1976, 1979); Maddala (1983), Chapter 6; or Greene (2000), Chapter 20.}
\begin{equation}
E(Y_{i}|Z_{i}\geq 0)=\beta _{o}+\beta _{1}X_{1i}+\cdots +\beta _{k}X_{ki} \label{select3.lc}
\end{equation}%
\begin{equation*}
+\sigma _{uv}\left( \frac{\phi (\gamma _{o}+\gamma _{1}W_{1i}+\cdots +\gamma _{m}W_{mi})}{\Phi (\gamma _{o}+\gamma _{1}W_{1i}+\cdots +\gamma _{m}W_{mi})}\right)
\end{equation*}%
where $\phi (\cdot )$ and $\Phi (\cdot )$ are the PDF and CDF, respectively, of the standard normal distribution. In the entire population, however, the expected value of the dependent variable is given by: \begin{equation}
E(Y_{i})=\beta _{o}+\beta _{1}X_{1i}+\cdots +\beta _{k}X_{ki} \label{select4.lc}
\end{equation}%
Note that the conditional expectation of the dependent variable in the observed sample will differ from its expected value in the population whenever $\sigma _{uv}$ is not equal to zero. This relationship is a more general form of the sample selectivity bias problem represented in Figure~\ref{selectb_g_lim}.
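Subtracting equation \ref{select4.lc} from equation \ref{select3.lc} makes the size of this discrepancy explicit:
\begin{equation*}
E(Y_{i}|Z_{i}\geq 0)-E(Y_{i})=\sigma _{uv}\left( \frac{\phi (\gamma _{o}+\gamma _{1}W_{1i}+\cdots +\gamma _{m}W_{mi})}{\Phi (\gamma _{o}+\gamma _{1}W_{1i}+\cdots +\gamma _{m}W_{mi})}\right)
\end{equation*}
Since the ratio $\phi (\cdot )/\Phi (\cdot )$ is strictly positive, the sign of the discrepancy is determined by the sign of $\sigma _{uv}$. This ratio declines toward zero as its argument increases, so the discrepancy is largest for those observations that were least likely to be selected into the sample.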
A comparison of equations \ref{select3.lc} and \ref{select4.lc} illustrates the nature of the sample selectivity bias problem. While it is assumed that the standard regression relationship represented by equation \ref{select4.lc} can be used to explain a dependent variable in the entire population, a different relationship (equation \ref{select3.lc}) characterizes the relationship existing for the selected sample observed by the researcher.
To simplify the notation, it is convenient to define: \begin{equation*}
\lambda _{i}=\frac{\phi (\gamma _{o}+\gamma _{1}W_{1i}+\cdots +\gamma _{m}W_{mi})}{\Phi (\gamma _{o}+\gamma _{1}W_{1i}+\cdots +\gamma _{m}W_{mi})} \end{equation*}%
and:
\begin{equation*}
\beta _{k+1}=\sigma _{uv}
\end{equation*}%
Using this notation, equation \ref{select3.lc} may be restated as: \begin{equation}
E(Y_{i}|Z_{i}\geq 0)=\beta _{o}+\beta _{1}X_{1i}+\cdots +\beta _{k}X_{ki}+\beta _{k+1}\lambda _{i} \label{select5.lc} \end{equation}
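The term $\lambda _{i}$ is simple to evaluate for any value of the selection index. The Python sketch below is illustrative only: the function names \texttt{inverse\_mills} and \texttt{selected\_mean} are hypothetical, and the standard normal PDF and CDF are taken from the standard library.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()   # standard normal: mean 0, standard deviation 1


def inverse_mills(z):
    """lambda = phi(z) / Phi(z), evaluated at the selection index z."""
    return _STD_NORMAL.pdf(z) / _STD_NORMAL.cdf(z)


def selected_mean(xb, sigma_uv, z):
    """Conditional mean of Y in the observed sample.

    xb       -- value of beta_o + beta_1 X_1 + ... + beta_k X_k
    sigma_uv -- coefficient on lambda (beta_{k+1} in the text)
    z        -- value of gamma_o + gamma_1 W_1 + ... + gamma_m W_m
    """
    return xb + sigma_uv * inverse_mills(z)


for z in (-1.0, 0.0, 1.0, 2.0):
    print(z, round(inverse_mills(z), 4))
```

The printed values decline as the selection index rises, so the adjustment matters most for observations with a low probability of being observed. This naive ratio is numerically unreliable for strongly negative values of the index (where $\Phi $ is close to zero), so a more careful implementation would be needed in that range.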
Suppose the following regression equation is estimated for the observed sample:
\begin{equation}
Y_i=\beta _o+\beta _1X_{1i}+\cdots +\beta _kX_{ki}+\epsilon _i \label{select6.lc}
\end{equation}
Since this equation omits the $\lambda _i$ term that appears in equation \ref{select5.lc}, the resulting estimates of the intercept and slope terms will be biased and inconsistent.\footnote{%
Biased and inconsistent estimates will always be expected to result unless the error term in the sample selection equation is independent of the error term in the population regression function. If these error terms are independent, then $\sigma_{uv}= \beta_{k+1} =0$. In this case, an OLS estimation procedure will provide unbiased and consistent estimates using only the observed sample.} Thus, the problem of sample selectivity bias can be thought of as a form of omitted variable bias. This insight provides the basis for Heckman’s sample selectivity bias correction technique discussed below.
\subsection{The Heckman two-stage estimation procedure} Heckman’s two-stage estimator relies on the following procedure: \begin{enumerate}
\item[Step 1:] Use a probit estimation technique to estimate the parameters of the selection equation:
\begin{equation*}
Z_{i}=\gamma _{o}+\gamma _{1}W_{1i}+\gamma _{2}W_{2i}+\cdots +\gamma _{m}W_{mi}-v_{i}
\end{equation*}
\item[Step 2:] These estimated parameters are then used to construct a predicted value of $Z_{i}^{*}$ for each observation using the relationship: \begin{equation*}
\hat{Z}_{i}^{*}=\hat{\gamma}_{o}+\hat{\gamma}_{1}W_{1i}+\hat{\gamma}_{2}W_{2i}+\cdots +\hat{\gamma}_{m}W_{mi} \end{equation*}
(Most econometric software packages contain an option that allows the user to automatically save the fitted values of $\hat{Z}_{i}^{*}$ when a probit equation is estimated.) Use these estimated “Z-scores” to construct estimated values of the selectivity bias adjustment term, $\lambda _{i}$, for each observation using the formula:
\begin{equation*}
\hat{\lambda}_{i}=\frac{\phi (\hat{Z}_{i}^{*})}{\Phi (\hat{Z}_{i}^{*})} \end{equation*}
Once again, most econometrics software packages contain functions to evaluate the PDF and CDF for a standard normal variate. These functions can be invoked to create the variable $\hat{\lambda}_{i}$.
\item[Step 3:] Include the $\hat{\lambda}_{i}$ term as an additional regressor in the original regression equation to form: \begin{equation*}
Y_{i}=\beta _{o}+\beta _{1}X_{1i}+\beta _{2}X_{2i}+\cdots +\beta _{k}X_{ki}+\beta _{k+1}\hat{\lambda}_{i}+u_{i} \end{equation*}
Estimate the parameters of this equation using an OLS estimation technique.
Since $\hat{\lambda}_{i}$ is a consistent estimator of $\lambda _{i}$, the resultant estimates of the intercept and slope parameters are consistent estimates of the corresponding population parameters.\footnote{%
It should be noted, however, that the standard errors from the OLS estimates must be corrected to take into account the use of the generated regressor $\hat \lambda _i$. The econometric package LIMDEP, available from William Greene, automatically provides corrected standard errors.} A simple test for the presence of sample selectivity bias involves a test of the null hypothesis:
\begin{equation*}
\text{H}_{o}\text{: }\beta _{k+1}=0
\end{equation*}
(Note that if $\beta _{k+1}=0$ then an omitted variable problem does not occur.) This test can be conducted using the estimated $t$-ratio on this parameter. If this hypothesis can be rejected, it is safest to assume that a sample selectivity bias problem is present.
\end{enumerate}
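As a rough numerical illustration of Step 2, the adjustment term can be evaluated at a few values of $\hat{Z}_{i}^{*}$. The three values below are hypothetical, chosen only for illustration, and the entries come from standard normal tables:
\begin{equation*}
\begin{array}{rccc}
\hat{Z}_{i}^{*} & \phi (\hat{Z}_{i}^{*}) & \Phi (\hat{Z}_{i}^{*}) & \hat{\lambda}_{i} \\
-1 & 0.2420 & 0.1587 & 1.525 \\
0 & 0.3989 & 0.5000 & 0.798 \\
1 & 0.2420 & 0.8413 & 0.288
\end{array}
\end{equation*}
Observations with a low predicted probability of selection therefore receive a large adjustment term, while $\hat{\lambda}_{i}$ approaches zero for observations that are almost certain to be selected.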
The Heckman procedure has become widely adopted during the past 20 years. It provides a relatively simple correction method for a very common problem that occurs with many cross-sectional and longitudinal data sets. There are, however, a few problems associated with the use of this procedure: \begin{itemize}
\item The results of these models are very sensitive to model specification.
Suppose, for example, that a variable that belongs in the regression equation is instead included as the only variable in the probit equation. In this case, the $\hat{\lambda}_{i}$ term will be a monotonic transformation of this variable.\footnote{%
A transformation is monotonic if the transformed value always moves in the same direction as the original variable (a monotonically increasing transformation) or always moves in the opposite direction (a monotonically decreasing transformation), although generally by different amounts. Since $\phi (Z)/\Phi (Z)$ falls as $Z$ rises, $\hat{\lambda}_{i}$ is a monotonically decreasing function of $\hat{Z}_{i}^{*}$.}\label{monotonic} In this case, it is likely that the estimate of $\beta _{k+1}$ will be significant even if no sample selectivity bias is present. More generally, if the regression model is not properly specified, the estimate of $\beta _{k+1}$ may end up capturing nonlinear effects of variables that appear in both the regression and the probit equations.
\item In most applications, the probit equation will include some or all of the variables that appear in the regression equation. While this does not necessarily present a problem, under certain circumstances it may result in a multicollinearity problem. Suppose, for example, that the same variables appear in both the regression and probit equations. The estimated value of $\hat{Z}_{i}^{*}$ is a linear combination of the variables in the probit equation. If these values fall within a narrow range in a particular sample, the $\hat{\lambda}_{i}$ terms will involve a transformation of these variables that may be approximately linear. In this case, the $\hat{\lambda}_{i}$ terms may be approximately equal to a linear combination of the other variables included in the regression equation.
\item It is possible that the estimated correlation between the disturbance terms in the regression and probit equations may exceed one in absolute value.\footnote{%
In this case, the adjusted $t$-ratios may become negative.} In this case, the standard practice is to constrain the estimate (setting the estimated correlation equal to 1 or -1, as appropriate). This problem occurs relatively frequently in applied work.
\end{itemize}
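The near-linearity described in the second item of the list above can be seen from a first-order expansion of the adjustment term. The sketch below relies on two standard properties of the normal distribution, $\phi ^{\prime }(Z)=-Z\phi (Z)$ and $\Phi ^{\prime }(Z)=\phi (Z)$; the expansion point $Z_{0}$ is a hypothetical value introduced only for illustration:
\begin{equation*}
\frac{d\lambda }{dZ}=\frac{-Z\phi (Z)\Phi (Z)-\phi (Z)^{2}}{\Phi (Z)^{2}}=-\lambda (Z)\left[ Z+\lambda (Z)\right]
\end{equation*}
\begin{equation*}
\lambda (Z)\approx \lambda (Z_{0})-\lambda (Z_{0})\left[ Z_{0}+\lambda (Z_{0})\right] (Z-Z_{0})
\end{equation*}
At $Z_{0}=0$, for example, $\lambda (0)\approx 0.798$ and the slope is approximately $-0.637$, so that $\hat{\lambda}_{i}\approx 0.798-0.637\hat{Z}_{i}^{*}$ for values of $\hat{Z}_{i}^{*}$ near zero. When the same variables appear in both equations and $\hat{Z}_{i}^{*}$ varies little, $\hat{\lambda}_{i}$ is then close to a linear function of the regressors already included in the regression equation.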
\subsection{Example: Sample selectivity bias and SAT scores} Let’s examine a simple model of SAT scores for individuals who participated in the \textit{National Longitudinal Study of the High School Class of 1972}. A possible SAT equation is given by: \begin{equation}
\text{SAT}_{i}=\beta _{o}+\beta _{1}\text{HSRANK}_{i}+\beta _{2}\text{MLHS}_{i}+\beta _{3}\text{MCOL}_{i} \label{sat.eq.lc} \end{equation}%
\begin{equation*}
+\beta _{4}\text{FLHS}_{i}+\beta _{5}\text{FCOL}_{i}+\beta _{6}\text{FEMALE}_{i}
\end{equation*}%
\begin{equation*}
+\beta _{7}\text{READING}_{i}+\beta _{8}\text{VOCAB}_{i}+\beta _{9}\text{MATH}_{i}+u_{i}
\end{equation*}%
In this equation, the SAT variable includes scores only for those individuals who actually reported an SAT score. (In this particular sample, only 2275 of 6370 individuals reported SAT scores.) The variables READING, VOCAB, and MATH are scores on reading, vocabulary, and math exams that were given to all participants in this study. (Each of these tests is scaled so that the mean is 50 and the standard deviation is 10.) All other variables are defined as in Table \ref{vdef.lc}.
Since SAT scores are not observed for all participants, an OLS\ estimation process using only the observed sample would result in a potential sample selectivity bias. To avoid this, a probit selection model is specified as: \begin{equation}
Z_{i}=\gamma _{o}+\gamma _{1}\text{HSRANK}_{i}+\gamma _{2}\text{MLHS}_{i}+\gamma _{3}\text{MCOL}_{i} \label{sam.sel.eq.lc} \end{equation}%
\begin{equation*}
+\gamma _{4}\text{FLHS}_{i}+\gamma _{5}\text{FCOL}_{i}+\gamma _{6}\text{FEMALE}_{i}
\end{equation*}%
\begin{equation*}
+\gamma _{7}\text{NSIB}_{i}-v_{i}
\end{equation*}%
The parameters of this probit equation were estimated using a sample of 6370 respondents. The resulting equation is:
\begin{equation}
\hat{Z}_{i}^{\ast }=-\underset{(-19.12)}{1.122}+\underset{(22.17)}{0.015}\text{HSRANK}_{i}-\underset{(-3.71)}{0.180}\text{MLHS}_{i} \label{est.sel.eq.lc}
\end{equation}%
\begin{equation*}
+\underset{(5.92)}{0.247}\text{MCOL}_{i}-\underset{(-2.95)}{0.143}\text{FLHS}_{i}+\underset{(8.58)}{0.368}\text{FCOL}_{i} \end{equation*}%
\begin{equation*}
-\underset{(-6.45)}{0.228}\text{FEMALE}_{i}-\underset{(-7.37)}{0.062}\text{NSIB}_{i}
\end{equation*}%
\begin{equation*}
(t\text{-statistics in parentheses})
\end{equation*}
Each of the variables in equation \ref{est.sel.eq.lc} is significant at the .01 level. Individuals with higher high school class rank and/or parents who have attended college are more likely to take the SAT. In this sample, females were significantly less likely to take the SAT.\footnote{%
It is likely that this result would be different in more recent cohorts of high school seniors. In 1972, females comprised a smaller proportion of college enrollment than has been true in recent years.} The presence of siblings significantly reduces the probability of taking the SAT exam.
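To give a sense of the magnitudes implied by equation \ref{est.sel.eq.lc}, consider a purely hypothetical male respondent with HSRANK $=80$ (assuming, for illustration, that HSRANK is measured as a percentile), no siblings, and all four parental education dummy variables equal to zero. The predicted index and the implied probability of reporting an SAT score (using standard normal tables) are approximately:
\begin{equation*}
\hat{Z}_{i}^{\ast }=-1.122+0.015(80)=0.078,\qquad \Phi (0.078)\approx 0.53
\end{equation*}
If MCOL and FCOL are instead both equal to one, with everything else unchanged:
\begin{equation*}
\hat{Z}_{i}^{\ast }=0.078+0.247+0.368=0.693,\qquad \Phi (0.693)\approx 0.76
\end{equation*}
Under these assumptions, having two college-educated parents raises the estimated probability of reporting an SAT score by roughly 23 percentage points for this hypothetical individual.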
The estimated results from equation \ref{est.sel.eq.lc} were then used to construct estimated values of $\lambda _{i}$ using the formula:\footnote{%
These variables are automatically created by the econometric software packages LIMDEP and Stata when sample selectivity models are estimated.} \begin{equation*}
\hat{\lambda}_{i}=\frac{\phi (\hat{Z}_{i}^{\ast })}{\Phi (\hat{Z}_{i}^{\ast })}
\end{equation*}%
After including $\hat{\lambda}_{i}$ as an additional regressor, the parameters of the original regression equation (equation \ref{sat.eq.lc}) were estimated using an OLS procedure. The resulting estimates are: \begin{equation}
\widehat{\text{SAT}}_{i}=-\underset{(-12.70)}{566.44}+\underset{(8.45)}{2.17}\text{HSRANK}_{i}-\underset{(-1.097)}{7.99}\text{MLHS}_{i}+\underset{(4.82)}{28.20}\text{MCOL}_{i} \label{sat.est.eq.lc} \end{equation}%
\begin{equation*}
-\underset{(-1.04)}{6.94}\text{FLHS}_{i}+\underset{(5.18)}{37.15}\text{FCOL}_{i}-\underset{(-8.40)}{46.84}\text{FEMALE}_{i} \end{equation*}%
\begin{equation*}
+\underset{(18.11)}{6.19}\text{READING}_{i}+\underset{(25.06)}{7.14}\text{VOCAB}_{i}+\underset{(26.63)}{9.39}\text{MATH}_{i}+\underset{(3.34)}{76.79}\hat{\lambda}_{i}
\end{equation*}
For our purposes, the most interesting result here is the coefficient on $\hat{\lambda}_{i}$. Since $\hat{\lambda}_{i}$ is guaranteed to be positive,\footnote{%
To see that $\hat{\lambda}_{i}$ is always positive, note that the numerator and denominator of this expression are, respectively, the PDF and CDF of a standard normal random variable evaluated at the point $\hat{Z}_{i}^{\ast }$. This ratio must be positive since the PDF and CDF are positive for all finite values of $\hat{Z}_{i}^{\ast }$.} the positive coefficient on this variable indicates that there is positive selectivity bias in this sample. In particular, since the mean value of $\hat{\lambda}_{i}$ is 0.88 in this sample, these results suggest that the average person who reported an SAT score would have a score that is 67.6 ($76.79\times 0.88$) points higher than a random person in the population with identical observable characteristics.
Further examination of equation \ref{sat.est.eq.lc} indicates that the SAT scores of high school students are substantially higher in households in which both parents have attended college. The negative (and highly significant) coefficient on the gender dummy variable may provide evidence of a gender bias in the SAT exam.
\section{Sample selectivity models: The Tobit Model} A special case of the general sample selectivity model discussed above was examined by Tobin (1958). In Tobin’s model, the selection criterion is the level of the dependent variable itself. In this model, which quickly came to be called the \textbf{Tobit model}, the dependent variable is censored at an upper or lower bound. No values beyond this point are ever observed.
Examples of such situations include:
\begin{itemize}
\item labor supply equations in which the lower bound of zero is reported for everyone who chooses to not work,
\item earnings equations in which only working individuals report earnings, and
\item equations explaining the receipt of welfare benefits, college aid, pension benefits, or any other form of income that equals zero for a nontrivial portion of the population.
\end{itemize}
The Tobit model assumes that the dependent variable may be expressed as a linear function of the independent variables:
\begin{equation}
Y_{i}=\beta _{o}+\beta _{1}X_{1i}+\beta _{2}X_{2i}+\cdots +\beta _{k}X_{ki}+u_{i} \label{tobit}
\end{equation}%
In the most common form of the Tobit specification, $Y_{i}$ is observed only when it is greater than or equal to zero. Thus, the observed dependent variable, $Y_{i}^{\ast }$, is reported as:\footnote{%
While the most common application of the Tobit model is to cases in which the dependent variable is observed only when it is greater than or equal to zero, the Tobit estimator can easily be applied to any model in which the dependent variable is observed only below a specific upper bound or above a specific lower bound.}
\begin{center}
$%
\begin{array}{l}
Y_{i}^{\ast }=Y_{i}\text{ if }Y_{i}\text{ }>0 \\ Y_{i}^{\ast }=0\text{ if }Y_{i}\leq 0%
\end{array}%
$
\end{center}
Since the Tobit model predates the more general sample selectivity model appearing above, many econometric packages contain Tobit estimators. These estimators provide maximum likelihood estimates of the parameters of equation \ref{tobit}. Since the Tobit model is a special case of the general sample selectivity model appearing above, it may also be estimated using the Heckman procedure described above (in which case the variables in the probit selection rule equation coincide with those in equation \ref{tobit}).\footnote{%
A more detailed discussion of the Tobit model may be found in Tobin (1958), Maddala (1983), or Greene (2000).}
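The link between the Tobit model and the selectivity correction can be sketched using standard results for the censored normal regression model. The expressions below assume that $u_{i}$ is normally distributed with mean zero and standard deviation $\sigma $ (a symbol introduced here for illustration), and use the shorthand $X_{i}\beta =\beta _{o}+\beta _{1}X_{1i}+\cdots +\beta _{k}X_{ki}$:
\begin{equation*}
E(Y_{i}|Y_{i}>0)=X_{i}\beta +\sigma \frac{\phi (X_{i}\beta /\sigma )}{\Phi (X_{i}\beta /\sigma )}
\end{equation*}
\begin{equation*}
E(Y_{i}^{\ast })=\Phi (X_{i}\beta /\sigma )X_{i}\beta +\sigma \phi (X_{i}\beta /\sigma )
\end{equation*}
The first expression has the same form as equation \ref{select5.lc}: the conditional mean of the uncensored observations equals the linear index plus a multiple of a ratio of the standard normal PDF to the standard normal CDF. This suggests why an OLS regression using only the positive observations would be expected to give biased estimates of the parameters of equation \ref{tobit}.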
\exbox{Extramarital Affairs}{
Ray Fair (1978) conducted an interesting econometric analysis of extramarital affairs.
The data for this study was derived from responses to surveys conducted by {\it Redbook} and {\it Psychology Today}. The dependent variable in this study is the number of times the respondent had engaged in extramarital sexual intercourse during the past year. Since this dependent variable has a lower bound of zero, Fair’s study relied on Tobit analysis.
Not surprisingly, Fair found that extramarital affairs are less frequent for individuals who report a higher level of marital happiness. Fair also found, however, that extramarital affairs decline with the age of the individual but increase with the length of marriage.
Individuals who report a higher degree of religiosity appear to have fewer affairs.
}%
\subsection{Example:\ College financial aid awards} Suppose that an econometrician wishes to examine the determinants of financial aid awards to college students.\footnote{%
This discussion is derived from the analysis appearing in Kane and Spizman (1994).} It is expected that the level of financial aid awarded by a college will be affected by a student’s observable ability and family background characteristics. The data file \textquotedblleft finaid.dat\textquotedblright\ contains data for 3347 participants in the \textit{National Longitudinal Study of the High School Class of 1972} (NLS72) who attended one or more years of college. (All participants in this survey were high school seniors in 1972.) In this data set, only 476 of the 1681 male college attendees in the sample received financial aid grants. Since the dependent variable takes on a value of zero for a substantial share of observations, a Tobit analysis is appropriate.
The parameters of an estimated Tobit equation for financial aid grants for a sample of male college students appear in Table \ref{Tobit_est}.
(Definitions of the variables appear in Table \ref{Tobit_est_def}.) \begin{center}
%TCIMACRO{\TeXButton{B}{\begin{table}[tbp] \centering}}% %BeginExpansion
\begin{table}[tbp] \centering%
%EndExpansion
\begin{tabular}{|lr|}
\hline
\multicolumn{1}{|c}{\textbf{Variable}} & \multicolumn{1}{c|}{$\underset{\text{(t-ratio)}}{\text{\textbf{Coefficient}}}$} \\ \hline
constant & $\underset{(4.21)}{3384.80}$ \\
SAT & $\underset{(4.34)}{1.232}$ \\
Rank & $\underset{(6.60)}{15.139}$ \\
Black & $\underset{(2.07)}{384.092}$ \\
Hispanic & $\underset{(-0.52)}{-123.301}$ \\
Asian & $\underset{(-0.64)}{-205.187}$ \\
Some college (mother) & $\underset{(-1.59)}{-147.556}$ \\
Some college (father) & $\underset{(-3.08)}{-290.937}$ \\
Rural 72 & $\underset{(-0.22)}{-24.002}$ \\
Urb72 & $\underset{(-0.19)}{-23.477}$ \\
HSATH & $\underset{(1.80)}{164.109}$ \\
HSLEAD & $\underset{(6.53)}{561.660}$ \\
South & $\underset{(-1.41)}{-160.022}$ \\
Northeast & $\underset{(5.57)}{546.243}$ \\
Parents’ income & $\underset{(-8.43)}{-752.261}$ \\
\# of siblings & $\underset{(7.01)}{137.520}$ \\
sibling in college & $\underset{(0.41)}{35.696}$ \\ \hline
\end{tabular}%
\caption{Tobit estimates of grant award equation\label{Tobit_est}}% %TCIMACRO{\TeXButton{E}{\end{table}}}%
%BeginExpansion
\end{table}%
%EndExpansion
%TCIMACRO{\TeXButton{B}{\begin{table}[tbp] \centering}}% %BeginExpansion
\begin{table}[tbp] \centering%
%EndExpansion
\begin{tabular}{|lll|}
\hline
\multicolumn{3}{|c|}{\textbf{Variable definitions}} \\ \hline
Grant & = & amount of financial aid grant awarded to the \\
& & respondent \\
SAT & = & actual or imputed SAT score\QQfnmark{%
Missing SAT scores were imputed from ACT scores (where available) or from a battery of tests that were administered to all participants in the \textit{NLS72} survey.} \\
Rank & = & percentile rank in high school class \\
Black & = & 1 if the respondent is African-American \\
& & (=0 otherwise) \\
Hispanic & = & 1 if the respondent is Hispanic (=0 otherwise) \\
Asian & = & 1 if the respondent is Asian-American (=0 otherwise) \\
Some college (mother) & = & 1 if the respondent’s mother completed one \\
& & or more years of college (=0 otherwise) \\
Some college (father) & = & 1 if the respondent’s father completed one \\
& & or more years of college (=0 otherwise) \\
Rural 72 & = & 1 if the respondent lived in a rural area in 1972 \\
& & (=0 otherwise) \\
Urb72 & = & 1 if the respondent lived in an urban area in 1972 \\
& & (=0 otherwise) \\
HSATH & = & 1 if the individual participated in high school athletics \\
& & (=0 otherwise) \\
HSLEAD & = & 1 if the individual was a leader in one or more \\
& & high school clubs or activities (=0 otherwise) \\
South & = & 1 if the respondent lived in the south in 1972 \\
& & (=0 otherwise) \\
Northeast & = & 1 if the respondent lived in the northeast in 1972 \\
& & (=0 otherwise) \\
Parents’ income & = & natural log of parents’ income in 1972 \\
\# of siblings & = & number of siblings for the respondent in 1972 \\
sibling in college & = & 1 if the respondent had one or more siblings in college \\
& & in 1972 (=0 otherwise) \\ \hline
\end{tabular}%
\QQfntext{0}{
Missing SAT scores were imputed from ACT scores (where available) or from a battery of tests that were administered to all participants in the \textit{NLS72} survey.}
\caption{Variable definitions\label{Tobit_est_def}}%
%TCIMACRO{\TeXButton{E}{\end{table}}}%
%BeginExpansion
\end{table}%
%EndExpansion
\end{center}
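When interpreting Tobit coefficients such as those in Table \ref{Tobit_est}, it may help to recall a standard result for the censored normal regression model: the effect of a regressor on the expected value of the observed dependent variable equals the coefficient scaled by the probability that an observation is uncensored. In the sketch below, $\sigma $ denotes the standard deviation of $u_{i}$ (assumed to be normally distributed) and $X_{i}\beta $ is shorthand for the right-hand side of equation \ref{tobit} without the error term; both are notational assumptions introduced for illustration:
\begin{equation*}
\frac{\partial E(Y_{i}^{\ast })}{\partial X_{ji}}=\beta _{j}\Phi (X_{i}\beta /\sigma )
\end{equation*}
As a rough illustration only, suppose that $\Phi (X_{i}\beta /\sigma )$ is approximated by the share of male college attendees who received grants, $476/1681\approx 0.283$. (This approximation is an assumption made here and is not a value reported with the estimates.) A one-point increase in Rank would then be associated with an increase in the expected observed grant of about:
\begin{equation*}
15.139\times 0.283\approx 4.3
\end{equation*}
compared with the coefficient of 15.139, which measures the effect on the uncensored variable $Y_{i}$.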
\section{Summary}
In this chapter, three models of qualitative choice have been examined: the linear probability, logit, and probit models. While the linear probability model can be easily estimated using standard regression software packages, it is subject to a heteroskedasticity problem and may result in estimated probabilities that are either less than zero or greater than one. In recent years, the logit and probit models have become increasingly important econometric tools. The ordered probit and multinomial logit models are extensions of the probit and logit models that can be used to analyze choices among three or more alternatives.
Since many econometric models are estimated using censored or truncated data for the dependent variable, sample selectivity models have been widely adopted in recent years. Direct OLS estimation using the observed samples may result in biased estimates of the population parameters if the sample selectivity is not taken into account. If it is possible to estimate the probability of observing the dependent variable, however, then the two-stage estimation technique developed by Heckman makes it possible to obtain consistent estimates of population parameters.
\section{Key Concepts}
\begin{itemize}
\item limited dependent variable
\item binary choice model
\item polychotomous choice model
\item sample selectivity model
\item linear probability model
\item constrained linear probability model
\item probit model
\item logit model
\item odds ratio
\item ordered probit model
\item multinomial logit model
\item sample selectivity bias
\item Heckman estimator
\item Tobit model
\end{itemize}
\newpage\
\section{Exercises and problems}
\begin{enumerate}
\item An econometrician wishes to determine whether female educational attainment has an impact on the decision to have children. He has data on a large sample of females (who are now beyond childbearing age). This data includes information on educational attainment, marital status, the number of children raised by each respondent, religion, race, \# of siblings, and related demographic and economic variables.
\begin{enumerate}
\item What type of model specification should be used to investigate this issue: OLS or a limited dependent variable model (such as probit or logit)?
\item Specify an equation that may be used to investigate this issue.
\end{enumerate}
\item An econometrician estimates a probit equation designed to explain the decision to live off-campus by college students and obtains the following results:
\begin{equation*}
\hat{Z}_{i}^{\ast }=-2.6+1.6\text{Car}_{i}+0.0002\text{Income}_{i} \end{equation*}%
where Car$_{i}$ is a dummy variable that equals one if the individual owns a car and Income$_{i}$ equals the student’s annual income in dollars.
\begin{enumerate}
\item Determine the probability of living off campus for a student who does not own a car and earns \$3,000 a year.
\item Determine the probability of living off campus for a student who does not own a car and earns \$10,000 a year.
\item Determine the answers for (a) and (b) for an individual who owns a car. Does a \$7,000 increase in income affect the probability by the same amount in each case? Explain.
\end{enumerate}
\item Consider the case of a female high school senior who is at the 90th percentile in her high school class. Her combined SAT scores are 1280.
Neither of her parents attended college (although both completed high school). She has no siblings.
\begin{enumerate}
\item Determine her estimated probability of attending college using the linear probability model appearing in equation \ref{lpm1.lc}.
\item Determine her estimated probability of attending college using the estimated probit model (equation \ref{lpm3.lc}).
\item Determine her estimated probability of attending college using the logit model (results reported in Table \ref{log.prob.lc}).
\end{enumerate}
\item Consider the model given by:
\begin{equation*}
\text{Attend college if: }Z_{i}\geq 0,\text{ where:} \end{equation*}%
\begin{equation}
Z_{i}=\beta _{o}+\beta _{1}\text{HSRANK}_{i}+\beta _{2}\text{HSLEAD}_{i}+\beta _{3}\text{SAT}_{i} \label{collegeprob.lc} \end{equation}%
\begin{equation*}
+\beta _{4}\text{FEMALE}_{i}+\beta _{5}\text{MLHS}_{i}+\beta _{6}\text{MCOL}_{i}
\end{equation*}%
\begin{equation*}
+\beta _{7}\text{FLHS}_{i}+\beta _{8}\text{FCOL}_{i}+\beta _{9}\text{NSIB}_{i}-u_{i}
\end{equation*}%
(the variable definitions appear in Table \ref{vdef.lc}) \begin{enumerate}
\item Estimate the parameters of a probit model based on equation \ref{collegeprob.lc} using the data in the file \textquotedblleft nls72.dat\textquotedblright . Do your results agree with those reported in Table \ref{log.prob.lc}?
\item At a 1\% significance level, use a likelihood ratio test to test the hypothesis given by:
\begin{equation*}
\text{H}_{o}\text{: }\beta _{5}=\beta _{6}=\beta _{7}=\beta _{8}=0 \end{equation*}
What do you conclude?
\item At a 1\% significance level, use a likelihood ratio test to test the hypothesis given by:
\begin{equation*}
\text{H}_{o}\text{: }\beta _{6}=\beta _{7} \end{equation*}
What do you conclude?
\end{enumerate}
\item Spector and Mazzeo (1980) used probit analysis to investigate the long-run impact resulting from the use of a personalized system of instruction (PSI) in the introductory macroeconomics course. Data was collected on student performance in intermediate macroeconomics courses and the parameters of the following probit model were estimated (using a sample of 32 students):\footnote{%
Since Spector and Mazzeo use a slightly different formulation of the probit model, the estimated coefficients in their paper have signs that are the opposite of those appearing above.}
\begin{equation*}
Y_{i}=1\text{ if student }i\text{ receives a grade of \textquotedblleft A\textquotedblright\ in intermediate macro. course (=0 otherwise)} \end{equation*}%
\begin{equation*}
Y_{i}=1\text{ if }Z_{i}^{\ast }\geq 0
\end{equation*}%
\begin{equation*}
\hat{Z}_{i}^{\ast }=-12.026+\underset{(3.586)}{2.6647}\text{GPA}_{i}+\underset{(0.666)}{0.2665}\text{PRIN}_{i}+\underset{(1.86)}{1.07037}\text{PSI}_{i}
\end{equation*}%
\begin{equation*}
+\underset{(3.00)}{2.18751}\text{MAJ}_{i}-\underset{(-0.935)}{0.41429}\text{MB}_{i}
\end{equation*}%
\begin{equation*}
\begin{array}{lll}
\text{where:} & \text{GPA}_{i} & \text{= cumulative macroeconomics grade for student }i \\
& \text{PRIN}_{i} & \text{= principles of macroeconomics grade (A=4, B=3, }\ldots \text{)} \\
& \text{PSI}_{i}\text{ } & \text{= 1 if the student used PSI in his or her introductory macro course} \\
& & \text{= 0 if the student was enrolled in a traditional lecture course} \\
& \text{MAJ}_{i} & \text{= 1 if the student is an economics major} \\ & & \text{= 0 if the student is not an economics major} \\ & \text{MB}_{i} & \text{= 1 if the student had completed a prior money and banking course} \\
& & \text{= 0 if the student had not completed a prior money and banking course}%
\end{array}%
\end{equation*}
\begin{enumerate}
\item Which of the coefficients in this equation are statistically significant at the 5\% significance level?
\item What does this equation suggest about the impact of the student’s performance in introductory macroeconomics on their success in intermediate macroeconomics? Does completing a money and banking class appear to improve student performance in intermediate macroeconomics?
\item Spector and Mazzeo cite evidence that suggests that the use of PSI improves student performance in introductory economics. What does this study suggest about the long-run effect of PSI?
\end{enumerate}
\item Michael Zimmer (2001) provides an analysis of the determinants of divorce probabilities for a sample of married couples. The estimated parameters of one of the probit equations estimated in this study appear in Table \ref{zimmer_est}. The dependent variable in this model is a dummy variable that equals one if a couple divorced during the years 1980-1988 and 0 if the marriage remained intact. A definition of the variables used in this study appears in Table \ref{zimmer_def}. Interpret these results. What information do these results provide about factors influencing divorce probabilities?
%TCIMACRO{\TeXButton{B}{\begin{table}[tbp] \centering}}% %BeginExpansion
\begin{table}[tbp] \centering%
%EndExpansion
\begin{tabular}{|lr|}
\hline
\multicolumn{1}{|c}{\textbf{Variable}} & \multicolumn{1}{c|}{$\underset{\text{(t-ratio)}}{\text{\textbf{Coefficient}}}$} \\ \hline
constant & $\underset{(1.38)}{1.708}$ \\
HAGEFM & $\underset{(-2.69)}{-0.063}$ \\
WAGEFM & $\underset{(0.16)}{0.003}$ \\
HPREVM & $\underset{(0.14)}{0.024}$ \\
WPREVM & $\underset{(3.09)}{0.489}$ \\
HEDUC & $\underset{(0.46)}{0.010}$ \\
WEDUC & $\underset{(-0.28)}{-0.008}$ \\
HBLACK & $\underset{(-1.79)}{-0.596}$ \\
HGHEALTH & $\underset{(1.76)}{0.341}$ \\
WGHEALTH & $\underset{(-0.61)}{-0.077}$ \\
MARHAP & $\underset{(-9.96)}{-0.117}$ \\
COHAB & $\underset{(1.25)}{0.164}$ \\
CHILDU18 & $\underset{(1.34)}{0.057}$ \\
DURATION & $\underset{(-3.78)}{-0.026}$ \\
HPCTINC & $\underset{(-2.66)}{-0.006}$ \\
LOG(FAM\ INC) & $\underset{(1.73)}{0.209}$ \\
LOG(FAM\ ASSETS) & $\underset{(-2.99)}{-0.041}$ \\
$\chi ^{2}$ & $207.40$ \\
d.f. & $16$ \\
Number of obs. & $1389$ \\ \hline
\end{tabular}%
\caption{Marital dissolution probit equation (Zimmer (2001))\label{zimmer_est}}%
%TCIMACRO{\TeXButton{E}{\end{table}}}%
%BeginExpansion
\end{table}%
%EndExpansion
%TCIMACRO{\TeXButton{B}{\begin{table}[tbp] \centering}}% %BeginExpansion
\begin{table}[tbp] \centering%
%EndExpansion
\begin{tabular}{|lll|}
\hline
\multicolumn{3}{|c|}{\textbf{Variable definitions}} \\ \hline
HAGEFM & = & husband’s age at first marriage \\
WAGEFM & = & wife’s age at first marriage \\
HPREVM & = & 1 if husband had previous marriage (=0 otherwise) \\
WPREVM & = & 1 if wife had previous marriage (=0 otherwise) \\
MARHAP & = & index of marital happiness based on survey responses \\
COHAB & = & 1 if the couple had cohabited before marriage \\
& & (=0 otherwise) \\
CHILDU18 & = & number of children under 18 years of age \\
DURATION & = & duration of the marriage in years \\
HPCTINC & = & percentage of family income earned by the husband \\
HBLACK & = & 1 if husband is black (=0 otherwise) \\
HEDUC & = & husband’s highest grade of schooling completed \\
WEDUC & = & wife’s highest grade of schooling completed \\
HGHEALTH & = & 1 if husband’s health is good (=0 otherwise) \\
WGHEALTH & = & 1 if wife’s health is good (=0 otherwise) \\
FAM\ INC & = & total family income in 1980 \\
FAM\ ASSETS & = & total value of family assets in 1980 (in thousands of dollars) \\ \hline
\end{tabular}%
\caption{Variable definitions\label{zimmer_def}}% %TCIMACRO{\TeXButton{E}{\end{table}}}%
%BeginExpansion
\end{table}%
%EndExpansion
\item Table \ref{election.dat} in Appendix \ref{data.appendix} (and the file \textquotedblleft election.dat\textquotedblright ) contains data on the 1992 Presidential election. The variable DEM$_{i}$ equals one if the Democratic candidate receives a plurality of the state’s vote and equals zero otherwise. The variable UN$_{i}$ is the statewide unemployment rate in 1992.
\begin{enumerate}
\item Estimate a linear probability model to explain whether a state supported the Democratic candidate using the model: \begin{equation*}
\text{DEM}_{i}=\beta _{o}+\beta _{1}\text{UN}_{i}+u_{i} \end{equation*}
\item Estimate a probit model to explain whether a state supported the Democratic candidate using the specification: \begin{equation*}
Z_{i}=\beta _{o}+\beta _{1}\text{UN}_{i}-v_{i} \end{equation*}
\item Repeat part (b) using a logit model.
\item Compare and contrast the parameter estimates and $t$-ratios for the estimated linear probability, logit, and probit models. Do the three models provide similar results?
\item Since data is available on the Democratic candidate’s share of the vote (DVOTE$_{i}$) a linear regression model (using DVOTE$_{i}$ as the dependent variable) could have been used instead of the probit or logit specifications. Is a linear regression model more appropriate? Does using a dummy variable as the dependent variable reduce the amount of information used to estimate model parameters? Explain.
\end{enumerate}
\item
\begin{enumerate}
\item Use the data appearing in the file \textquotedblleft ordprob.dat\textquotedblright\ (this data set is described in Table \ref{ordprob.dat} on p. \pageref{ordprob.dat})\ to estimate an ordered probit model of educational attainment based on individual and family background characteristics. (Note: The variable \textquotedblleft ed\textquotedblright\ is already constructed to serve as the dependent variable in this model. The respondent’s own educational attainment variables LHS, HS, SC, CD, MA, and PhD are included only for convenience in computing descriptive statistics; these educational attainment variables should not be included as independent variables in your ordered probit model.)
\item Interpret the results of your model.
\end{enumerate}
\item Consider the estimated ordered probit model appearing in equation \ref{kane_spiz_op_lc}. Determine how the results appearing in Table \ref{ord_ks_lc_est} would change when considering the case of a white male child living in a suburb (urban=0) who is the child of two parents who each possess 16+ years of schooling. Show your work for each of the predicted probabilities.
\item Use the data in the file \textquotedblleft sat.dat\textquotedblright\ to verify the results reported in equations \ref{est.sel.eq.lc} and \ref{sat.est.eq.lc}. (This data file is described in Table~\ref{sat.dat} on p.~\pageref{sat.dat}.)
\item Several early studies of labor supply were based on an OLS estimation procedure in which the dependent variable was the number of hours worked as a function of the wage and other variables. Is an OLS estimation procedure appropriate in this case? Why or why not?
\item When the SAT exam was first introduced, the mean score on each section was assigned a score of 500; each 100-point difference represented a one-standard-deviation change in the scores in this original sample. In subsequent years, these scores dropped substantially. Does this result necessarily indicate a decline in the average verbal and mathematical abilities of high school seniors? Explain. (Hint: the proportion of high school students taking the exam increased dramatically during this period.)
\item Chiswick (1978) finds that, holding other characteristics constant, immigrants into the United States (after 14 or more years of living in the U.S.) have incomes that are significantly higher than the incomes of domestically born residents. Does this suggest that any foreign citizen could have earnings in the U.S. that are above those of natural-born U.S. citizens with similar observable characteristics? (Hint: Consider the decision to emigrate.)
\item Consider the relationship between high school GPA and SAT scores captured in Figure~\ref{selectb_g_lim}.
\begin{enumerate}
\item Which relationship provides a better estimate of the SAT scores that would be received by a person randomly selected from the population with known high school GPA?
\item What would happen to the estimated relationship if the expected benefits associated with a college degree increased? Use a diagram to indicate the effect of this change on the estimated regression relationship.
\end{enumerate}
\item
\begin{enumerate}
\item Estimate a Tobit model of grant awards for the female subsample of the data contained in the file \textquotedblleft finaid.dat\textquotedblright\ (this data set is described in Table~\ref{finaid.dat}\ on p.~\pageref{finaid.dat}). Use the same independent variables that were used for the male equation described in Table \ref{Tobit_est}. Be sure to select only observations for females when estimating this model.
\item Discuss your results. Were ability variables important?\ Did affirmative action programs appear to affect grant awards during this time period? Do grant awards vary with family background characteristics such as income and family size?
\end{enumerate}
\item
\begin{enumerate}
\item Estimate a Tobit model of work study awards for the male subsample of the data contained in the file \textquotedblleft finaid.dat.\textquotedblright\ Use the same independent variables that were used for the male equation described in Table \ref{Tobit_est}.
\item Discuss your results. Were ability variables important?\ Did affirmative action programs appear to affect work study awards during this time period? Do work study awards vary with family background characteristics such as income and family size?
\item Repeat this analysis for the female subsample.
\end{enumerate}
\end{enumerate}
\newpage\
\section{Mathematical Appendix}
\subsection{Derivation of \textit{var(u}$_i$\textit{)} in the linear probability model}
Under the linear probability model, for a particular combination of $X_{1i},X_{2i},\ldots ,X_{ki}$, the residual may take on two values: \begin{equation*}
u_i=1-\beta _o-\beta _1X_{1i}-\cdots -\beta _kX_{ki}\text{ (when }Y_i=1\text{)}
\end{equation*}
or:
\begin{equation*}
u_i=-\beta _o-\beta _1X_{1i}-\cdots -\beta _kX_{ki}\text{ (when }Y_i=0\text{)}
\end{equation*}
The probabilities of observing these two outcomes are, respectively: \begin{equation*}
\text{Prob(}Y_i=1)=\beta _o+\beta _1X_{1i}+\beta _2X_{2i}+\cdots +\beta _kX_{ki}
\end{equation*}
and
\begin{equation*}
\text{Prob(}Y_i=0)=1-\beta _o-\beta _1X_{1i}-\beta _2X_{2i}-\cdots -\beta _kX_{ki}
\end{equation*}
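Weighting the two possible values of $u_i$ by these probabilities gives $E(u_i)=0$, so the variance of $u_i$ equals $E(u_i^2)$ and can be written as: \begin{equation*}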
E(u_i^2)=\left( \beta _o+\beta _1X_{1i}+\cdots +\beta _kX_{ki}\right) \left( 1-\beta _o-\beta _1X_{1i}-\cdots -\beta _kX_{ki}\right) ^2 \end{equation*}
\begin{equation*}
+\left( 1-\beta _o-\beta _1X_{1i}-\cdots -\beta _kX_{ki}\right) \left( -\beta _o-\beta _1X_{1i}-\cdots -\beta _kX_{ki}\right) ^2 \end{equation*}
To simplify this derivation, define $P_i$ (the probability that $Y_i=1)$ as: \begin{equation*}
P_i=\beta _o+\beta _1X_{1i}+\beta _2X_{2i}+\cdots +\beta _kX_{ki} \end{equation*}
Using this definition, the variance of $u_i$ can be expressed as: \begin{equation*}
E(u_i^2)=P_i(1-P_i)^2+(1-P_i)(-P_i)^2
\end{equation*}
\begin{equation*}
=(1-P_i)\left[ P_i(1-P_i)+P_i^2\right]
\end{equation*}
\begin{equation*}
=(1-P_i)P_i
\end{equation*}
\subsection{Maximum likelihood estimation} The likelihood function for a given sample is the joint density function expressed in terms of the unknown parameter values. In general, the likelihood function can be expressed in the form: \begin{equation}
L=\prod_{i=1}^Nf(X_{1i},X_{2i},\ldots ,X_{ki};\beta _o,\beta _1,\ldots ,\beta _k) \label{log.like.lc}
\end{equation}
\begin{equation*}
\text{where: }\beta _o,\beta _1,\ldots ,\beta _k\text{ are unknown parameters.}
\end{equation*}
In this definition, the symbol “$\prod\limits_{i=1}^N$” is used to denote the product of the $N$ terms to the right of this symbol (indexed by the variable $i$). Each of the terms in this product equals the probability of observing a particular sample outcome. The joint density function is simply the product of these $N$ terms (assuming that the outcomes are independent).
This likelihood function provides a measure of the probability of observing the realized outcomes as a function of the unknown parameter values.
In the case of the probit model discussed in the main body of this chapter, the likelihood function is:
\begin{equation}
L=\prod_{i=1}^m\Phi \left( \beta _o+\beta _1X_{1i}+\cdots +\beta _kX_{ki}\right) \prod_{i=m+1}^N\left( 1-\Phi \left( \beta _o+\beta _1X_{1i}+\cdots +\beta _kX_{ki}\right) \right) \label{likelihood.probit} \end{equation}
In writing this likelihood function, it is assumed that the observations are ordered so that $Y_i=1$ for the first $m$ observations and $Y_i=0$ for the remaining $N-m$ observations. The coefficients are normalized by assuming that $\sigma =1$ to simplify the notation.
Maximum likelihood estimation involves the use of a numerical procedure that selects values for the model parameters that maximize this likelihood function. This procedure essentially involves an attempt to find parameter values that are “most likely” given the observed outcomes. In practice, however, the procedure involves the maximization of the natural log of the likelihood function. Since the natural log function provides a monotonic transformation of the original likelihood function, the parameter values that maximize the log of the likelihood function will also maximize the likelihood function.\footnote{%
As noted on page \pageref{monotonic}, a monotonic transformation occurs when an increase in the original variable is always associated with an increase in the transformed value and a decrease in the original variable is always associated with a decrease in the transformed variable.} The log-likelihood function is used in place of the likelihood function for two primary reasons: \begin{itemize}
\item The log transformation converts the product given in equation \ref{log.like.lc} to a summation (since the log of a product equals the sum of the logs of the individual terms); and
\item The computation of the standard errors for the parameter estimates requires the computation of the partial derivatives of the log of the likelihood function.
\end{itemize}
In the case of the probit model, the log-likelihood function is formed by simply taking the natural log of the likelihood function appearing in equation \ref{likelihood.probit}:
\begin{equation}
\ln (L)=\sum_{i=1}^{m}\ln \left[ \Phi \left( \beta _{o}+\beta _{1}X_{1i}+\cdots +\beta _{k}X_{ki}\right) \right] +\sum_{i=m+1}^{N}\ln \left[ \left( 1-\Phi \left( \beta _{o}+\beta _{1}X_{1i}+\cdots +\beta _{k}X_{ki}\right) \right) \right] \label{ln.likelihood.probit}
\end{equation}
The maximization of the log-likelihood function involves an attempt to find the values of the parameters at which:\footnote{%
A much more detailed discussion of maximum likelihood estimation may be found in Goldfeld and Quandt (1972) or most advanced econometrics texts.} \begin{equation*}
\frac{\partial \ln (L)}{\partial \beta _{o}}=0
\end{equation*}%
\begin{equation*}
\frac{\partial \ln (L)}{\partial \beta _{1}}=0
\end{equation*}%
\begin{equation*}
\frac{\partial \ln (L)}{\partial \beta _{2}}=0
\end{equation*}%
\begin{equation*}
\vdots
\end{equation*}%
\begin{equation*}
\frac{\partial \ln (L)}{\partial \beta _{k}}=0
\end{equation*}%
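These conditions can be illustrated with a simple special case that is introduced here only as a sketch: a probit model that contains a constant term and no explanatory variables. The symbol $\phi$ (the standard normal density function, which is the derivative of $\Phi$) and the numerical values used below are assumptions made for this illustration. In this case, the log-likelihood function reduces to:
\begin{equation*}
\ln (L)=m\ln \left[ \Phi \left( \beta _{o}\right) \right] +(N-m)\ln \left[ 1-\Phi \left( \beta _{o}\right) \right]
\end{equation*}
and the single first-order condition is:
\begin{equation*}
\frac{\partial \ln (L)}{\partial \beta _{o}}=m\frac{\phi \left( \beta _{o}\right) }{\Phi \left( \beta _{o}\right) }-(N-m)\frac{\phi \left( \beta _{o}\right) }{1-\Phi \left( \beta _{o}\right) }=0
\end{equation*}
Since $\phi \left( \beta _{o}\right) >0$, this condition is satisfied (provided that $0<m<N$) when:
\begin{equation*}
\Phi \left( \beta _{o}\right) =\frac{m}{N}
\end{equation*}
If, for example, $Y_i=1$ for $m=50$ of $N=100$ observations, then $\Phi \left( \beta _{o}\right) =0.5$ and the maximum likelihood estimate of $\beta _{o}$ is zero. When explanatory variables are included in the model, a closed-form solution of this kind is generally not available and the conditions must be solved numerically.
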
To find the values of the parameters that satisfy these conditions, a starting set of estimates is initially determined (in the probit or logit models, the linear probability model is often used to provide these starting values). The partial derivatives of the log-likelihood function with respect to each of the parameters are then computed at these initial parameter estimates. If the partial derivative with respect to a given parameter $\beta _{j}$ is positive, then the log-likelihood function would increase if a larger value of $\beta _{j}$ is selected. In a similar manner, a negative partial derivative with respect to $\beta _{j}$ indicates that the log-likelihood function would increase if the value of $\beta _{j}$ is decreased. Thus, these partial derivatives can be used to determine a new set of parameters that may result in an increase in the value of the log-likelihood function.%
%TCIMACRO{%
%\TeXButton{footnote}{\footnote{Several methods exist for determining the amount by which the parameter %estimates should be changed. For a good, though more advanced, discussion of these %methods, see Goldfeld and Quandt (1972), pp. 1-77 or Chow (1983), pp. 232-235.}} }% %BeginExpansion
\footnote{Several methods exist for determining the amount by which the parameter estimates should be changed. For a good, though more advanced, discussion of these methods, see Goldfeld and Quandt (1972), pp. 1-77 or Chow (1983), pp. 232-235.} %EndExpansion
This process continues until the estimators converge to a set of parameter estimates at which the log-likelihood function is maximized.
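
One simple way to formalize the updating step described above is a steepest-ascent rule. This is only an illustrative sketch and is not necessarily the procedure used by any particular software package: the step size $\lambda >0$ and the iteration superscript $(t)$ are notation introduced here for the illustration, and more refined procedures are discussed in the references cited above. Under this rule, each parameter estimate is revised in the direction indicated by the sign of the corresponding partial derivative:
\begin{equation*}
\beta _{j}^{(t+1)}=\beta _{j}^{(t)}+\lambda \left. \frac{\partial \ln (L)}{\partial \beta _{j}}\right| _{\beta ^{(t)}}
\end{equation*}
\begin{equation*}
\text{where: }\beta ^{(t)}\text{ denotes the full set of parameter estimates at iteration }t\text{.}
\end{equation*}
A positive partial derivative raises the estimate of $\beta _{j}$ and a negative partial derivative lowers it. When all of the partial derivatives equal zero, the estimates no longer change from one iteration to the next, which corresponds to the first-order conditions listed above.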