Notation: $y_i: T\times 1$, $Y_{nT\times 1}=(y_1',\ldots,y_n')'$, $x_i: T\times k$, $X_{nT\times k}=(x_1',\ldots,x_n')'$.

$$
\begin{aligned}
y_{it}&=x_{it}'\beta+e_{it}\\
Y&=X\beta+e\\
e_{it}&=\alpha_i+\varepsilon_{it}\quad\text{(individual effect + idiosyncratic error)}
\end{aligned}
$$

## 1. Random Effects-GLS

$$y_{it}=x_{it}'\beta+\alpha_i+\varepsilon_{it}$$
**RE-Identification**
$$
\begin{aligned}
\mathbb{E}(\varepsilon_{it}\mid X_i)&=0\\
\mathbb{E}(\varepsilon_{it}^2\mid X_i)&=\sigma_\varepsilon^2\\
\mathbb{E}(\varepsilon_{ij}\varepsilon_{it}\mid X_i)&=0\\
\mathbb{E}(\alpha_i\mid X_i)&=0\\
\mathbb{E}(\alpha_i^2\mid X_i)&=\sigma_\alpha^2\\
\mathbb{E}(\alpha_i\varepsilon_{it}\mid X_i)&=0
\end{aligned}
$$

These assumptions imply an error-components covariance structure:
$$
\begin{aligned}
\mathbb{E}(e_i\mid X_i)&=0\\
\mathbb{E}(e_ie_i'\mid X_i)&=\mathbf{1}_i\mathbf{1}_i'\sigma_\alpha^2+I_i\sigma_\varepsilon^2=\sigma_\varepsilon^2\Omega_i.
\end{aligned}
$$

Since $\Omega_i\neq I$, we use GLS to obtain the BLUE (under the Gauss-Markov conditions):
$$
\widehat{\beta}_{\mathrm{gls}}=\left(\sum_{i=1}^{N}X_i'\Omega_i^{-1}X_i\right)^{-1}\left(\sum_{i=1}^{N}X_i'\Omega_i^{-1}y_i\right)
$$
**Feasible GLS.** $\Omega_i$ is unknown, but OLS is still consistent, so we use the OLS residuals $\hat{e}$ to estimate the variance components and hence the covariance matrix:

$$\widehat{e}_i=y_i-X_i\widehat{\beta}_{\mathrm{OLS}}$$
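As a concrete illustration, here is a minimal numpy sketch of the FGLS steps on a simulated balanced panel; the dimensions, data-generating values, and names (`n`, `T`, `beta_true`) are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, k = 200, 5, 2                     # hypothetical panel dimensions
beta_true = np.array([1.0, -0.5])

# simulate a balanced error-components panel
X = rng.normal(size=(n, T, k))
alpha = rng.normal(size=n)              # individual effect
eps = 0.5 * rng.normal(size=(n, T))     # idiosyncratic error
y = X @ beta_true + alpha[:, None] + eps

# step 1: pooled OLS (still consistent) for the residuals e_hat
Xs, ys = X.reshape(-1, k), y.reshape(-1)
b_ols = np.linalg.lstsq(Xs, ys, rcond=None)[0]
e_hat = (ys - Xs @ b_ols).reshape(n, T)

# step 2: variance components from within and between residual variation
e_dm = e_hat - e_hat.mean(axis=1, keepdims=True)
sig2_eps = (e_dm ** 2).sum() / (n * (T - 1))
sig2_alp = max((e_hat.mean(axis=1) ** 2).mean() - sig2_eps / T, 0.0)

# step 3: GLS with Omega_i = (sig2_alp/sig2_eps) * 11' + I (balanced => same for all i)
Omega_inv = np.linalg.inv((sig2_alp / sig2_eps) * np.ones((T, T)) + np.eye(T))
A = sum(X[i].T @ Omega_inv @ X[i] for i in range(n))
b = sum(X[i].T @ Omega_inv @ y[i] for i in range(n))
beta_fgls = np.linalg.solve(A, b)
print(beta_fgls)                        # close to beta_true
```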
## 2. Fixed Effects

In the econometrics literature, if the stochastic structure of $\alpha_i$ is treated as unknown and possibly correlated with $x_{it}$, then $\alpha_i$ is called a fixed effect.
**FE-Identification**
$$\mathbb{E}(\varepsilon_{it}\mid x_i,\alpha_i)=0\text{ for all }t,\qquad \mathbb{E}(\alpha_i\mid x_i)\neq 0$$

Check the latter by the ==Hausman-Wu test==; if the null of no correlation is not rejected, use random effects to achieve more efficiency.

$$
\begin{aligned}
H&=\left(\widehat{\beta}_{\mathrm{fe}}-\widehat{\beta}_{\mathrm{re}}\right)'\widehat{\operatorname{var}}\left(\widehat{\beta}_{\mathrm{fe}}-\widehat{\beta}_{\mathrm{re}}\right)^{-1}\left(\widehat{\beta}_{\mathrm{fe}}-\widehat{\beta}_{\mathrm{re}}\right)\\
&=\left(\widehat{\beta}_{\mathrm{fe}}-\widehat{\beta}_{\mathrm{re}}\right)'\left(\widehat{V}_{\mathrm{fe}}-\widehat{V}_{\mathrm{re}}\right)^{-1}\left(\widehat{\beta}_{\mathrm{fe}}-\widehat{\beta}_{\mathrm{re}}\right)
\end{aligned}
$$
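A small helper for the statistic above, a sketch assuming you already have the two coefficient vectors and their covariance estimates (the names `b_fe`, `b_re`, `V_fe`, `V_re` are placeholders):

```python
import numpy as np
from scipy.stats import chi2

def hausman(b_fe, b_re, V_fe, V_re):
    """H = (b_fe - b_re)' (V_fe - V_re)^{-1} (b_fe - b_re), chi^2 with dim(beta) dof."""
    d = np.asarray(b_fe) - np.asarray(b_re)
    H = float(d @ np.linalg.solve(np.asarray(V_fe) - np.asarray(V_re), d))
    return H, chi2.sf(H, df=d.size)  # large H (small p-value): reject RE, use FE
```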
**Within-transformation**

$$MY=MX\beta+M\alpha+M\varepsilon=MX\beta+M\varepsilon,$$

where $M$ demeans each individual's observations, so $M\alpha=0$ because $\alpha_i$ is constant over $t$.
$$
\begin{aligned}
\widehat{\beta}_{\mathrm{fe}}&=\left(\sum_{i=1}^{N}\sum_{t\in S_i}\dot{x}_{it}\dot{x}_{it}'\right)^{-1}\left(\sum_{i=1}^{N}\sum_{t\in S_i}\dot{x}_{it}\dot{y}_{it}\right)\\
&=\left(\sum_{i=1}^{N}\dot{X}_i'\dot{X}_i\right)^{-1}\left(\sum_{i=1}^{N}\dot{X}_i'\dot{y}_i\right)\\
&=\left(\sum_{i=1}^{N}X_i'M_iX_i\right)^{-1}\left(\sum_{i=1}^{N}X_i'M_iy_i\right)
\end{aligned}
$$

### Dummy Regression

Take $\alpha_i$ as the coefficient of a dummy variable:
$$
\begin{aligned}
&d_i=(d_{i1},\cdots,d_{ii},\cdots,d_{in})'=(0,\cdots,1,\cdots,0)'\\
&\alpha=(\alpha_1,\cdots,\alpha_i,\cdots,\alpha_n)'
\end{aligned}
$$

$$y_{it}=x_{it}'\beta+d_i'\alpha+\varepsilon_{it}$$
Then by OLS and the Frisch-Waugh theorem, the resulting $\hat{\beta}$ is the same as the fixed-effects (within) estimator; a numerical check follows.
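A quick simulated check that the within estimator and the dummy (LSDV) regression give the identical $\hat{\beta}$; all simulation values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, k = 50, 4, 2
ids = np.repeat(np.arange(n), T)                 # individual index per row
X = rng.normal(size=(n * T, k))
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)[ids] + rng.normal(size=n * T)

# within estimator: demean y and X within each individual
def within(a):
    gm = np.array([a[ids == i].mean(axis=0) for i in range(n)])
    return a - gm[ids]

b_fe = np.linalg.lstsq(within(X), within(y), rcond=None)[0]

# LSDV: regress on X plus n individual dummies, keep the slope coefficients
D = np.eye(n)[ids]
b_lsdv = np.linalg.lstsq(np.hstack([X, D]), y, rcond=None)[0][:k]

print(np.allclose(b_fe, b_lsdv))   # True, by the Frisch-Waugh theorem
```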
### RE vs. FE

The RE estimator is a linear combination of the between estimator $\left(\sum_{i=1}^{N}X_i'P_iX_i\right)^{-1}\left(\sum_{i=1}^{N}X_i'P_iy_i\right)$ and the fixed-effects estimator $\left(\sum_{i=1}^{N}X_i'M_iX_i\right)^{-1}\left(\sum_{i=1}^{N}X_i'M_iy_i\right)$.
$$\bar{y}_i=\bar{x}_i'\beta+\alpha_i+\bar{\varepsilon}_i,\qquad \widehat{\beta}_{\mathrm{be}}=\left(\sum_{i=1}^{N}\bar{x}_i\bar{x}_i'\right)^{-1}\left(\sum_{i=1}^{N}\bar{x}_i\bar{y}_i\right)$$

As $T\rightarrow\infty$, RE converges to FE. Under the RE assumptions, $\operatorname{Var}(\hat{\beta}_{RE})\leq\operatorname{Var}(\hat{\beta}_{FE})$.

## 3. First-difference

Another way to eliminate the individual effect:
$$\Delta y_{it}=\Delta x_{it}'\beta+\Delta\varepsilon_{it}$$

$$
\begin{aligned}
\widehat{\beta}_{\Delta}&=\left(\sum_{i=1}^{N}\sum_{t\geq 2}\Delta x_{it}\Delta x_{it}'\right)^{-1}\left(\sum_{i=1}^{N}\sum_{t\geq 2}\Delta x_{it}\Delta y_{it}\right)\\
&=\left(\sum_{i=1}^{N}\Delta X_i'\Delta X_i\right)^{-1}\left(\sum_{i=1}^{N}\Delta X_i'\Delta y_i\right)\\
&=\left(\sum_{i=1}^{N}X_i'D_i'D_iX_i\right)^{-1}\left(\sum_{i=1}^{N}X_i'D_i'D_iy_i\right)
\end{aligned}
$$

When $T=2$, $\widehat{\beta}_{\Delta}$ equals the fixed-effects estimator; when $T>2$, it does not.
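A minimal sketch of $\hat{\beta}_{\Delta}$ for a balanced panel stored as arrays of shape `(n, T)` and `(n, T, k)` (shapes assumed for illustration):

```python
import numpy as np

def fd_estimator(y, X):
    """First-difference estimator: OLS of Delta y_it on Delta x_it, t >= 2."""
    dy = np.diff(y, axis=1).reshape(-1)
    dX = np.diff(X, axis=1).reshape(-1, X.shape[-1])
    return np.linalg.lstsq(dX, dy, rcond=None)[0]
```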
## 4. Dynamic Panel

Weaker identification assumption than the previous FE/RE (sequential exogeneity), with $x_{it}=y_{it-1}$:

$$y_{it}=y_{it-1}\beta+\alpha_i+\varepsilon_{it}$$

$$\mathbb{E}(\varepsilon_{it}\mid x_{it},\cdots,x_{i1},\alpha_i)=0$$
In applications it will often be useful to include time effects $f_t$ to eliminate spurious serial correlation.
### Inconsistency of FE

The within operator induces correlation between the AR(1) lag and the error, so the within estimator is inconsistent for the coefficients when $T$ is fixed. A thorough explanation appears in Nickell (1981); this is an instance of the incidental parameters problem.
$$\hat{\beta}_{FE}\xrightarrow{p}\beta+(\cdots)\,\mathbb{E}(x_{it}\varepsilon_{it}-\bar{x}_i\varepsilon_{it})=\beta+B^{-1}O(1/T)$$
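A simulation sketch of the Nickell bias in the AR(1) panel with fixed effects (all design values hypothetical); the bias is negative and shrinks roughly like $1/T$:

```python
import numpy as np

rng = np.random.default_rng(2)
beta, n = 0.5, 2000

def fe_ar1_bias(T):
    """Within estimator of beta in y_it = beta*y_{it-1} + alpha_i + eps_it."""
    alpha = rng.normal(size=n)
    y = np.zeros((n, T + 1))
    for t in range(1, T + 1):
        y[:, t] = beta * y[:, t - 1] + alpha + rng.normal(size=n)
    ylag, ycur = y[:, :-1], y[:, 1:]
    yl = ylag - ylag.mean(axis=1, keepdims=True)   # within transformation
    yc = ycur - ycur.mean(axis=1, keepdims=True)
    return (yl * yc).sum() / (yl * yl).sum() - beta

for T in (5, 10, 40):
    print(T, fe_ar1_bias(T))   # negative bias, shrinking as T grows
```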
**Solutions**
- Let $T\rightarrow\infty$
- Anderson-Hsiao estimator (just-identified)
- Arellano-Bond estimator (over-identified)

### Anderson-Hsiao Estimator

Anderson and Hsiao (1982) made an important breakthrough by showing that a simple instrumental variables estimator is consistent for the parameters.
First-difference to eliminate the fixed effects. The lagged difference is then endogenous:

$$\mathbb{E}\left(\Delta y_{it-1}\Delta\varepsilon_{it}\right)=\mathbb{E}\left(\left(y_{it-1}-y_{it-2}\right)\left(\varepsilon_{it}-\varepsilon_{it-1}\right)\right)=-\sigma_\varepsilon^2$$

Use IV for this endogeneity problem, with $y_{it-2}$ instrumenting $\Delta y_{it-1}$:

$$\left(y_{it-2},\ldots,y_{it-p-1}\right)\text{ for }\left(\Delta y_{it-1},\ldots,\Delta y_{it-p}\right)$$

Given the assumption of no serial correlation in $\varepsilon_{it}$, as $N\rightarrow\infty$,

$$\widehat{\beta}_{iv}\overset{p}{\longrightarrow}\beta-\frac{\mathbb{E}(y_{i1}\Delta\varepsilon_{i3})}{\mathbb{E}(y_{i1}\Delta y_{i2})}=\beta$$

### Arellano-Bond Estimator

Use $(y_{it-2},y_{it-3},\cdots)$ as IVs for $\Delta y_{it-1}$. Using more valid IVs increases ==efficiency== (smaller asymptotic variance).
Using these extra instruments introduces a complication: there is a different number of instruments for each time period. The solution is to view the model as a system of $T$ equations.
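A just-identified Anderson-Hsiao sketch on simulated data (all values hypothetical), using $y_{it-2}$ as the instrument for $\Delta y_{it-1}$:

```python
import numpy as np

rng = np.random.default_rng(3)
beta, n, T = 0.5, 5000, 6

# AR(1) panel with fixed effects
alpha = rng.normal(size=n)
y = np.zeros((n, T))
for t in range(1, T):
    y[:, t] = beta * y[:, t - 1] + alpha + rng.normal(size=n)

# regress Delta y_it on Delta y_{it-1} with instrument y_{it-2} (t = 3,...,T)
dy    = (y[:, 2:] - y[:, 1:-1]).ravel()    # Delta y_it
dylag = (y[:, 1:-1] - y[:, :-2]).ravel()   # Delta y_{it-1} (endogenous)
z     = y[:, :-2].ravel()                  # instrument y_{it-2}

beta_iv = (z @ dy) / (z @ dylag)           # just-identified IV estimate
print(beta_iv)                             # consistent for beta as n grows
```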
### Weak IV Issue

The Anderson-Hsiao instrument is weak if the first-stage coefficient $\gamma$ is small (Blundell and Bond, 1998):
$$\gamma=(\beta-1)\left(\frac{k}{k+\sigma_\alpha^2/\sigma_\varepsilon^2}\right),\qquad k=\frac{1-\beta}{1+\beta}$$

$\gamma$ is small when:

- $\beta=1$ (unit root);
- the idiosyncratic effect $\varepsilon$ is small relative to the individual-specific effect $\alpha$.

Arellano and Bover (1995) and Blundell and Bond (1998) introduced a set of ==orthogonality conditions== which reduce the weak instrument problem.
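Evaluating the first-stage coefficient above makes the weakness visible; a tiny sketch (the variance ratio is a hypothetical input):

```python
def gamma_first_stage(beta, var_ratio):
    """gamma = (beta - 1) * k / (k + sigma_alpha^2/sigma_eps^2), k = (1-beta)/(1+beta)."""
    k = (1 - beta) / (1 + beta)
    return (beta - 1) * k / (k + var_ratio)

for b in (0.5, 0.9, 0.99):
    print(b, gamma_first_stage(b, var_ratio=1.0))   # gamma -> 0 as beta -> 1
```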
## 5. Probit with Fixed Effects

$$y_{it}^{*}=x_{it}'\beta+\alpha_i+\varepsilon_{it},\qquad y_{it}=\mathbf{1}\{y_{it}^{*}>0\}$$
**Log-likelihood function**

$$\ell(\beta,\alpha)=\sum_i\sum_t y_{it}\log\Phi(x_{it}'\beta+\alpha_i)+(1-y_{it})\log\left[1-\Phi(x_{it}'\beta+\alpha_i)\right]$$
We cannot difference out the fixed effects from the likelihood function, so joint estimation of $\beta$ and the $\alpha_i$ is required.
**2-step**
$$\hat{\alpha}_i(\beta)=\arg\max_{\alpha}\ell(\beta,\alpha),\qquad \hat{\beta}=\arg\max_{\beta}\ell(\beta,\hat{\alpha}(\beta))$$

### Incidental Parameter Problem

Each $\hat{\alpha}_i$ uses only $T$ observations, so it is not consistent as $N\rightarrow\infty$, and the fixed-$T$ noise in $\hat{\alpha}_i$ contaminates $\hat{\beta}$, making it inconsistent:

$$\hat{\beta}_{mle}=\beta+\frac{\beta}{T}+O_p(1/T^2).$$

When $T\rightarrow\infty$, we require $\frac{N}{T}\rightarrow\lambda$, so that

$$\sqrt{NT}(\hat{\beta}_{mle}-\beta)=\sqrt{\frac{N}{T}}\beta+o_p(1)$$
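A brute-force simulation sketch of the two-step estimator and its fixed-$T$ bias (design values hypothetical; a grid search over $\beta$ replaces a proper optimizer for transparency):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(4)
n, T, beta = 500, 4, 1.0
x = rng.normal(size=(n, T))
alpha = 0.5 * rng.normal(size=n)
y = (x * beta + alpha[:, None] + rng.normal(size=(n, T)) > 0).astype(float)

# drop individuals with no within variation (their alpha_i MLE diverges)
keep = (y.sum(axis=1) > 0) & (y.sum(axis=1) < T)
x, y = x[keep], y[keep]

def nll_i(a, b, xi, yi):
    """Negative log-likelihood of individual i given alpha_i = a, beta = b."""
    p = norm.cdf(xi * b + a).clip(1e-10, 1 - 1e-10)
    return -(yi * np.log(p) + (1 - yi) * np.log(1 - p)).sum()

def profile_nll(b):
    # step 1: alpha_hat_i(beta), one scalar optimization per individual
    return sum(minimize_scalar(nll_i, args=(b, x[i], y[i]),
                               bounds=(-10, 10), method="bounded").fun
               for i in range(len(x)))

# step 2: beta_hat maximizes the profiled likelihood
grid = np.linspace(0.5, 2.0, 31)
beta_hat = grid[np.argmin([profile_nll(b) for b in grid])]
print(beta_hat)   # noticeably above 1.0 at T = 4: the incidental parameter bias
```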
## 6. Logit with Fixed Effects

## 7. Multinomial Response Model

$$y_{ij}^{*}=x_{ij}'\beta+z_i'\gamma_j+a_{ij}=\text{observed}+\text{choice coefficients}+\text{unobserved factors}$$

- non-ordinal (unordered) choice;
- ordinal choice, e.g., bond ratings.

Let $V_{ij}=x_{ij}'\beta+z_i'\gamma_j$ (mixed logit model).

### Unordered Choice

We model the choice behavior using a ==utility maximization== argument, following McFadden (1973):
$$y_i=\arg\max_{j}\{y_{i1}^{*},\cdots,y_{iJ}^{*}\}$$

Assume $a_{ij}\sim F(a)=e^{-e^{-a}}$, the ==Type 1 extreme value distribution==, with density

$$f(a)=e^{-a-e^{-a}}$$

Assume $a_{ij}$ is independent of $x,z$. Then
$$
\begin{aligned}
P(y_i=j\mid x,z)&=\frac{e^{V_{ij}}}{\sum_{k=1}^{J}e^{V_{ik}}}\\
&=\frac{e^{x_{ij}'\beta+z_i'\gamma_j}}{\sum_{k=1}^{J}e^{x_{ik}'\beta+z_i'\gamma_k}}
\end{aligned}
$$

Then estimate by MLE.
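The choice probabilities are a softmax of the $V_{ij}$; a minimal numerically stable sketch:

```python
import numpy as np

def mnl_prob(V):
    """P(y_i = j) = exp(V_ij) / sum_k exp(V_ik), computed row-wise."""
    V = V - V.max(axis=-1, keepdims=True)   # stabilize before exponentiating
    eV = np.exp(V)
    return eV / eV.sum(axis=-1, keepdims=True)

# one individual facing J = 3 alternatives with hypothetical utilities V_ij
print(mnl_prob(np.array([[1.0, 0.5, -0.2]])))
```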
**Limitations**
IIA: the choice between two alternatives is independent of irrelevant alternatives; this arises from $a_{ij}$ being iid across $1,\ldots,J$:
$$\frac{P(y_i=j\mid x)}{P(y_i=k\mid x)}=e^{(x_{ij}-x_{ik})'\beta}$$

## 8. Interactive Effects

$$Y=\sum_{k=1}^{K}\beta_k^0X_k+\varepsilon,\qquad \varepsilon=\lambda^0f^{0\prime}+e$$

$$\widehat{\beta}_R=\underset{\beta\in\mathbb{R}^K}{\operatorname{argmin}}\ \mathcal{L}_{NT}^{R}(\beta)$$

$$
\begin{aligned}
\mathcal{L}_{NT}^{R}(\beta)&=\min_{\{\Lambda\in\mathbb{R}^{N\times R},\,F\in\mathbb{R}^{T\times R}\}}\frac{1}{NT}\left\|Y-\beta\cdot X-\Lambda F'\right\|_{\mathrm{HS}}^2\\
&=\min_{F\in\mathbb{R}^{T\times R}}\frac{1}{NT}\operatorname{Tr}\left[(Y-\beta\cdot X)'(Y-\beta\cdot X)M_F\right]\\
&=\frac{1}{NT}\sum_{r=R+1}^{T}\mu_r\left[(Y-\beta\cdot X)'(Y-\beta\cdot X)\right],
\end{aligned}
$$

The inner optimization over $F$ is a principal components problem, so the optimal $F$ is given by the $R$ largest principal components. At the optimum, the projector $M_F$ exactly projects out the $R$ largest eigenvalues of this matrix, which gives rise to the final formulation of the profile objective function as the sum over its $T-R$ smallest eigenvalues.
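A direct sketch of the profiled objective via the eigenvalue formula above (array shapes are assumptions for illustration: `Y` is $N\times T$, `X` stacks the $K$ regressor matrices):

```python
import numpy as np

def profile_objective(beta, Y, X, R):
    """L_NT^R(beta): sum of the T - R smallest eigenvalues of U'U, scaled by 1/(NT),
    where U = Y - sum_k beta_k X_k and X has shape (K, N, T)."""
    U = Y - np.tensordot(beta, X, axes=1)      # N x T residual matrix
    N, T = U.shape
    mu = np.linalg.eigvalsh(U.T @ U)           # eigenvalues in ascending order
    return mu[: T - R].sum() / (N * T)

# beta_hat_R is then the argmin of this function over beta,
# e.g. via scipy.optimize.minimize on a starting value.
```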
### Strong Factor and Strong IV

**Assumption SF (Strong Factor):**
(i) $0<\operatorname{plim}_{N,T\rightarrow\infty}\frac{1}{N}\lambda^{0\prime}\lambda^{0}<\infty$;

(ii) $0<\operatorname{plim}_{N,T\rightarrow\infty}\frac{1}{T}f^{0\prime}f^{0}<\infty$.
## 9. Spurious Regression

In time series, a spurious regression gives bad results because the estimator converges to a random variable rather than a constant.
$$
\begin{aligned}
x_{it}&=x_{i,t-1}+\varepsilon_{it}\\
y_{it}&=y_{i,t-1}+e_{it}
\end{aligned}
$$

In the panel model:
$$
\begin{gathered}
y_{it}=\alpha+x_{it}\beta+u_{it}\\
u_{it}=\mu_i+\nu_{it}
\end{gathered}
$$

Kao (1999) showed, by sequential asymptotics, that
$$\sqrt{n}\,\widehat{\beta}_{FE}\xrightarrow{d}N\left(0,\frac{2\sigma_e^2}{5\sigma_\varepsilon^2}\right)$$

In panels, by contrast, the spurious-regression estimator still permits valid inference, since it converges to a zero-mean distribution.
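A simulation sketch of this result (all values hypothetical): two independent random-walk panels, yet $\sqrt{n}\,\hat{\beta}_{FE}$ behaves like a draw from a zero-mean normal.

```python
import numpy as np

rng = np.random.default_rng(5)
n, T = 1000, 50

# independent random walks: any estimated relation between y and x is spurious
x = rng.normal(size=(n, T)).cumsum(axis=1)
y = rng.normal(size=(n, T)).cumsum(axis=1)

# within (FE) estimator of the spurious slope
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
beta_fe = (xd * yd).sum() / (xd * xd).sum()

print(np.sqrt(n) * beta_fe)   # near zero: centered, unlike the time-series case
```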