Notation: $y_i: T\times 1$, $Y_{nT\times 1}=(y_1',\ldots,y_n')'$, $x_i: T\times k$, $X_{nT\times k}=(x_1',\ldots,x_n')'$.

$$
\begin{aligned}
y_{it}&=x_{it}'\beta+e_{it}\\
Y&=X\beta+e\\
e_{it}&=\alpha_i+\varepsilon_{it}\quad\text{(individual effect + idiosyncratic error)}
\end{aligned}
$$

## 1. Random Effects-GLS

$$y_{it}=x_{it}'\beta+\alpha_i+\varepsilon_{it}$$
**RE-Identification**
$$
\begin{aligned}
\mathbb{E}(\varepsilon_{it}\mid X_i)&=0\\
\mathbb{E}(\varepsilon_{it}^2\mid X_i)&=\sigma_\varepsilon^2\\
\mathbb{E}(\varepsilon_{ij}\varepsilon_{it}\mid X_i)&=0\\
\mathbb{E}(\alpha_i\mid X_i)&=0\\
\mathbb{E}(\alpha_i^2\mid X_i)&=\sigma_\alpha^2\\
\mathbb{E}(\alpha_i\varepsilon_{it}\mid X_i)&=0
\end{aligned}
$$

These assumptions imply an error-components covariance structure:
$$
\begin{aligned}
\mathbb{E}(e_i\mid X_i)&=0\\
\mathbb{E}(e_ie_i'\mid X_i)&=\mathbf{1}_i\mathbf{1}_i'\sigma_\alpha^2+I_i\sigma_\varepsilon^2=\sigma_\varepsilon^2\Omega_i.
\end{aligned}
$$

Since $\Omega_i\neq I$, we use GLS to obtain the BLUE (under the Gauss-Markov conditions):
$$
\widehat{\beta}_{\mathrm{gls}}=\left(\sum_{i=1}^{N}X_i'\Omega_i^{-1}X_i\right)^{-1}\left(\sum_{i=1}^{N}X_i'\Omega_i^{-1}y_i\right)
$$
**Feasible GLS.** $\Omega_i$ is unknown, but OLS is still consistent, so we use the OLS residuals $\hat{e}$ to estimate the variance components and hence the covariance matrix:

$$\widehat{e}_i=y_i-X_i\widehat{\beta}_{\mathrm{OLS}}$$
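As a concrete illustration, here is a minimal numpy sketch of the FGLS steps on a simulated balanced panel; the dimensions, data-generating values, and names (`n`, `T`, `beta_true`) are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, k = 200, 5, 2                     # hypothetical panel dimensions
beta_true = np.array([1.0, -0.5])

# simulate a balanced error-components panel
X = rng.normal(size=(n, T, k))
alpha = rng.normal(size=n)              # individual effect
eps = 0.5 * rng.normal(size=(n, T))     # idiosyncratic error
y = X @ beta_true + alpha[:, None] + eps

# step 1: pooled OLS (still consistent) for the residuals e_hat
Xs, ys = X.reshape(-1, k), y.reshape(-1)
b_ols = np.linalg.lstsq(Xs, ys, rcond=None)[0]
e_hat = (ys - Xs @ b_ols).reshape(n, T)

# step 2: variance components from within and between residual variation
e_dm = e_hat - e_hat.mean(axis=1, keepdims=True)
sig2_eps = (e_dm ** 2).sum() / (n * (T - 1))
sig2_alp = max((e_hat.mean(axis=1) ** 2).mean() - sig2_eps / T, 0.0)

# step 3: GLS with Omega_i = (sig2_alp/sig2_eps) * 11' + I (balanced => same for all i)
Omega_inv = np.linalg.inv((sig2_alp / sig2_eps) * np.ones((T, T)) + np.eye(T))
A = sum(X[i].T @ Omega_inv @ X[i] for i in range(n))
b = sum(X[i].T @ Omega_inv @ y[i] for i in range(n))
beta_fgls = np.linalg.solve(A, b)
print(beta_fgls)                        # close to beta_true
```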
## 2. Fixed Effects

In the econometrics literature, if the stochastic structure of $\alpha_i$ is treated as unknown and possibly correlated with $x_{it}$, then $\alpha_i$ is called a fixed effect.
**FE-Identification**
$$\mathbb{E}(\varepsilon_{it}\mid x_i,\alpha_i)=0\text{ for all }t,\qquad \mathbb{E}(\alpha_i\mid x_i)\neq 0$$

Check the latter by the ==Hausman-Wu test==; if the null of no correlation is not rejected, use random effects to achieve more efficiency.

$$
\begin{aligned}
H&=\left(\widehat{\beta}_{\mathrm{fe}}-\widehat{\beta}_{\mathrm{re}}\right)'\widehat{\operatorname{var}}\left(\widehat{\beta}_{\mathrm{fe}}-\widehat{\beta}_{\mathrm{re}}\right)^{-1}\left(\widehat{\beta}_{\mathrm{fe}}-\widehat{\beta}_{\mathrm{re}}\right)\\
&=\left(\widehat{\beta}_{\mathrm{fe}}-\widehat{\beta}_{\mathrm{re}}\right)'\left(\widehat{V}_{\mathrm{fe}}-\widehat{V}_{\mathrm{re}}\right)^{-1}\left(\widehat{\beta}_{\mathrm{fe}}-\widehat{\beta}_{\mathrm{re}}\right)
\end{aligned}
$$
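A small helper for the statistic above, a sketch assuming you already have the two coefficient vectors and their covariance estimates (the names `b_fe`, `b_re`, `V_fe`, `V_re` are placeholders):

```python
import numpy as np
from scipy.stats import chi2

def hausman(b_fe, b_re, V_fe, V_re):
    """H = (b_fe - b_re)' (V_fe - V_re)^{-1} (b_fe - b_re), chi^2 with dim(beta) dof."""
    d = np.asarray(b_fe) - np.asarray(b_re)
    H = float(d @ np.linalg.solve(np.asarray(V_fe) - np.asarray(V_re), d))
    return H, chi2.sf(H, df=d.size)  # large H (small p-value): reject RE, use FE
```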
**Within-transformation**

$$MY=MX\beta+M\alpha+M\varepsilon=MX\beta+M\varepsilon,$$

where $M$ demeans each individual's observations, so $M\alpha=0$ because $\alpha_i$ is constant over $t$.
$$
\begin{aligned}
\widehat{\beta}_{\mathrm{fe}}&=\left(\sum_{i=1}^{N}\sum_{t\in S_i}\dot{x}_{it}\dot{x}_{it}'\right)^{-1}\left(\sum_{i=1}^{N}\sum_{t\in S_i}\dot{x}_{it}\dot{y}_{it}\right)\\
&=\left(\sum_{i=1}^{N}\dot{X}_i'\dot{X}_i\right)^{-1}\left(\sum_{i=1}^{N}\dot{X}_i'\dot{y}_i\right)\\
&=\left(\sum_{i=1}^{N}X_i'M_iX_i\right)^{-1}\left(\sum_{i=1}^{N}X_i'M_iy_i\right)
\end{aligned}
$$

### Dummy Regression

Take $\alpha_i$ as the coefficient of a dummy variable:
$$
\begin{aligned}
&d_i=(d_{i1},\cdots,d_{ii},\cdots,d_{in})'=(0,\cdots,1,\cdots,0)'\\
&\alpha=(\alpha_1,\cdots,\alpha_i,\cdots,\alpha_n)'
\end{aligned}
$$

$$y_{it}=x_{it}'\beta+d_i'\alpha+\varepsilon_{it}$$
Then by OLS and the Frisch-Waugh theorem, the resulting $\hat{\beta}$ is the same as the fixed-effects (within) estimator; a numerical check follows.
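A quick simulated check that the within estimator and the dummy (LSDV) regression give the identical $\hat{\beta}$; all simulation values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, k = 50, 4, 2
ids = np.repeat(np.arange(n), T)                 # individual index per row
X = rng.normal(size=(n * T, k))
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)[ids] + rng.normal(size=n * T)

# within estimator: demean y and X within each individual
def within(a):
    gm = np.array([a[ids == i].mean(axis=0) for i in range(n)])
    return a - gm[ids]

b_fe = np.linalg.lstsq(within(X), within(y), rcond=None)[0]

# LSDV: regress on X plus n individual dummies, keep the slope coefficients
D = np.eye(n)[ids]
b_lsdv = np.linalg.lstsq(np.hstack([X, D]), y, rcond=None)[0][:k]

print(np.allclose(b_fe, b_lsdv))   # True, by the Frisch-Waugh theorem
```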
### RE vs. FE

The RE estimator is a linear combination of the between estimator $\left(\sum_{i=1}^{N}X_i'P_iX_i\right)^{-1}\left(\sum_{i=1}^{N}X_i'P_iy_i\right)$ and the fixed-effects estimator $\left(\sum_{i=1}^{N}X_i'M_iX_i\right)^{-1}\left(\sum_{i=1}^{N}X_i'M_iy_i\right)$.
$$\bar{y}_i=\bar{x}_i'\beta+\alpha_i+\bar{\varepsilon}_i,\qquad \widehat{\beta}_{\mathrm{be}}=\left(\sum_{i=1}^{N}\bar{x}_i\bar{x}_i'\right)^{-1}\left(\sum_{i=1}^{N}\bar{x}_i\bar{y}_i\right)$$

As $T\rightarrow\infty$, RE converges to FE. Under the RE assumptions, $\operatorname{Var}(\hat{\beta}_{RE})\leq\operatorname{Var}(\hat{\beta}_{FE})$.

## 3. First-difference

Another way to eliminate the individual effect:
$$\Delta y_{it}=\Delta x_{it}'\beta+\Delta\varepsilon_{it}$$

$$
\begin{aligned}
\widehat{\beta}_{\Delta}&=\left(\sum_{i=1}^{N}\sum_{t\geq 2}\Delta x_{it}\Delta x_{it}'\right)^{-1}\left(\sum_{i=1}^{N}\sum_{t\geq 2}\Delta x_{it}\Delta y_{it}\right)\\
&=\left(\sum_{i=1}^{N}\Delta X_i'\Delta X_i\right)^{-1}\left(\sum_{i=1}^{N}\Delta X_i'\Delta y_i\right)\\
&=\left(\sum_{i=1}^{N}X_i'D_i'D_iX_i\right)^{-1}\left(\sum_{i=1}^{N}X_i'D_i'D_iy_i\right)
\end{aligned}
$$

When $T=2$, $\widehat{\beta}_{\Delta}$ equals the fixed-effects estimator; when $T>2$, it does not.
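A minimal sketch of $\hat{\beta}_{\Delta}$ for a balanced panel stored as arrays of shape `(n, T)` and `(n, T, k)` (shapes assumed for illustration):

```python
import numpy as np

def fd_estimator(y, X):
    """First-difference estimator: OLS of Delta y_it on Delta x_it, t >= 2."""
    dy = np.diff(y, axis=1).reshape(-1)
    dX = np.diff(X, axis=1).reshape(-1, X.shape[-1])
    return np.linalg.lstsq(dX, dy, rcond=None)[0]
```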
## 4. Dynamic Panel

Weaker identification assumption than the previous FE/RE (sequential exogeneity), with $x_{it}=y_{it-1}$:

$$y_{it}=y_{it-1}\beta+\alpha_i+\varepsilon_{it}$$

$$\mathbb{E}(\varepsilon_{it}\mid x_{it},\cdots,x_{i1},\alpha_i)=0$$
In applications it will often be useful to include time effects $f_t$ to eliminate spurious serial correlation.
### Inconsistency of FE

The within operator induces correlation between the AR(1) lag and the error, so the within estimator is inconsistent for the coefficients when $T$ is fixed. A thorough explanation appears in Nickell (1981); this is an instance of the incidental parameters problem.
$$\hat{\beta}_{FE}\xrightarrow{p}\beta+(\cdots)\,\mathbb{E}(x_{it}\varepsilon_{it}-\bar{x}_i\varepsilon_{it})=\beta+B^{-1}O(1/T)$$
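A simulation sketch of the Nickell bias in the AR(1) panel with fixed effects (all design values hypothetical); the bias is negative and shrinks roughly like $1/T$:

```python
import numpy as np

rng = np.random.default_rng(2)
beta, n = 0.5, 2000

def fe_ar1_bias(T):
    """Within estimator of beta in y_it = beta*y_{it-1} + alpha_i + eps_it."""
    alpha = rng.normal(size=n)
    y = np.zeros((n, T + 1))
    for t in range(1, T + 1):
        y[:, t] = beta * y[:, t - 1] + alpha + rng.normal(size=n)
    ylag, ycur = y[:, :-1], y[:, 1:]
    yl = ylag - ylag.mean(axis=1, keepdims=True)   # within transformation
    yc = ycur - ycur.mean(axis=1, keepdims=True)
    return (yl * yc).sum() / (yl * yl).sum() - beta

for T in (5, 10, 40):
    print(T, fe_ar1_bias(T))   # negative bias, shrinking as T grows
```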
**Solutions**
- Let $T\rightarrow\infty$
- Anderson-Hsiao estimator (just-identified)
- Arellano-Bond estimator (over-identified)

### Anderson-Hsiao Estimator

Anderson and Hsiao (1982) made an important breakthrough by showing that a simple instrumental variables estimator is consistent for the parameters.
First-difference to eliminate the fixed effects. The lagged difference is then endogenous:

$$\mathbb{E}\left(\Delta y_{it-1}\Delta\varepsilon_{it}\right)=\mathbb{E}\left(\left(y_{it-1}-y_{it-2}\right)\left(\varepsilon_{it}-\varepsilon_{it-1}\right)\right)=-\sigma_\varepsilon^2$$

Use IV for this endogeneity problem, with $y_{it-2}$ instrumenting $\Delta y_{it-1}$:

$$\left(y_{it-2},\ldots,y_{it-p-1}\right)\text{ for }\left(\Delta y_{it-1},\ldots,\Delta y_{it-p}\right)$$

Given the assumption of no serial correlation in $\varepsilon_{it}$, as $N\rightarrow\infty$,

$$\widehat{\beta}_{iv}\overset{p}{\longrightarrow}\beta-\frac{\mathbb{E}(y_{i1}\Delta\varepsilon_{i3})}{\mathbb{E}(y_{i1}\Delta y_{i2})}=\beta$$

### Arellano-Bond Estimator

Use $(y_{it-2},y_{it-3},\cdots)$ as IVs for $\Delta y_{it-1}$. Using more valid IVs increases ==efficiency== (smaller asymptotic variance).
Using these extra instruments introduces a complication: there is a different number of instruments for each time period. The solution is to view the model as a system of $T$ equations.
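A just-identified Anderson-Hsiao sketch on simulated data (all values hypothetical), using $y_{it-2}$ as the instrument for $\Delta y_{it-1}$:

```python
import numpy as np

rng = np.random.default_rng(3)
beta, n, T = 0.5, 5000, 6

# AR(1) panel with fixed effects
alpha = rng.normal(size=n)
y = np.zeros((n, T))
for t in range(1, T):
    y[:, t] = beta * y[:, t - 1] + alpha + rng.normal(size=n)

# regress Delta y_it on Delta y_{it-1} with instrument y_{it-2} (t = 3,...,T)
dy    = (y[:, 2:] - y[:, 1:-1]).ravel()    # Delta y_it
dylag = (y[:, 1:-1] - y[:, :-2]).ravel()   # Delta y_{it-1} (endogenous)
z     = y[:, :-2].ravel()                  # instrument y_{it-2}

beta_iv = (z @ dy) / (z @ dylag)           # just-identified IV estimate
print(beta_iv)                             # consistent for beta as n grows
```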
### Weak IV Issue

The Anderson-Hsiao instrument is weak if the first-stage coefficient $\gamma$ is small (Blundell and Bond, 1998):
$$\gamma=(\beta-1)\left(\frac{k}{k+\sigma_\alpha^2/\sigma_\varepsilon^2}\right),\qquad k=\frac{1-\beta}{1+\beta}$$

$\gamma$ is small when:

- $\beta=1$ (unit root);
- the idiosyncratic effect $\varepsilon$ is small relative to the individual-specific effect $\alpha$.

Arellano and Bover (1995) and Blundell and Bond (1998) introduced a set of ==orthogonality conditions== which reduce the weak instrument problem.
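Evaluating the first-stage coefficient above makes the weakness visible; a tiny sketch (the variance ratio is a hypothetical input):

```python
def gamma_first_stage(beta, var_ratio):
    """gamma = (beta - 1) * k / (k + sigma_alpha^2/sigma_eps^2), k = (1-beta)/(1+beta)."""
    k = (1 - beta) / (1 + beta)
    return (beta - 1) * k / (k + var_ratio)

for b in (0.5, 0.9, 0.99):
    print(b, gamma_first_stage(b, var_ratio=1.0))   # gamma -> 0 as beta -> 1
```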
## 5. Probit with Fixed Effects

$$y_{it}^{*}=x_{it}'\beta+\alpha_i+\varepsilon_{it},\qquad y_{it}=\mathbf{1}\{y_{it}^{*}>0\}$$
**Log-likelihood function**

$$\ell(\beta,\alpha)=\sum_i\sum_t y_{it}\log\Phi(x_{it}'\beta+\alpha_i)+(1-y_{it})\log\left[1-\Phi(x_{it}'\beta+\alpha_i)\right]$$
We cannot difference out the fixed effects from the likelihood function, so joint estimation of $\beta$ and the $\alpha_i$ is required.
**2-step**
$$\hat{\alpha}_i(\beta)=\arg\max_{\alpha}\ell(\beta,\alpha),\qquad \hat{\beta}=\arg\max_{\beta}\ell(\beta,\hat{\alpha}(\beta))$$

### Incidental Parameter Problem

Each $\hat{\alpha}_i$ uses only $T$ observations, so it is not consistent as $N\rightarrow\infty$, and the fixed-$T$ noise in $\hat{\alpha}_i$ contaminates $\hat{\beta}$, making it inconsistent:

$$\hat{\beta}_{mle}=\beta+\frac{\beta}{T}+O_p(1/T^2).$$

When $T\rightarrow\infty$, we require $\frac{N}{T}\rightarrow\lambda$, so that

$$\sqrt{NT}(\hat{\beta}_{mle}-\beta)=\sqrt{\frac{N}{T}}\beta+o_p(1)$$
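A brute-force simulation sketch of the two-step estimator and its fixed-$T$ bias (design values hypothetical; a grid search over $\beta$ replaces a proper optimizer for transparency):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(4)
n, T, beta = 500, 4, 1.0
x = rng.normal(size=(n, T))
alpha = 0.5 * rng.normal(size=n)
y = (x * beta + alpha[:, None] + rng.normal(size=(n, T)) > 0).astype(float)

# drop individuals with no within variation (their alpha_i MLE diverges)
keep = (y.sum(axis=1) > 0) & (y.sum(axis=1) < T)
x, y = x[keep], y[keep]

def nll_i(a, b, xi, yi):
    """Negative log-likelihood of individual i given alpha_i = a, beta = b."""
    p = norm.cdf(xi * b + a).clip(1e-10, 1 - 1e-10)
    return -(yi * np.log(p) + (1 - yi) * np.log(1 - p)).sum()

def profile_nll(b):
    # step 1: alpha_hat_i(beta), one scalar optimization per individual
    return sum(minimize_scalar(nll_i, args=(b, x[i], y[i]),
                               bounds=(-10, 10), method="bounded").fun
               for i in range(len(x)))

# step 2: beta_hat maximizes the profiled likelihood
grid = np.linspace(0.5, 2.0, 31)
beta_hat = grid[np.argmin([profile_nll(b) for b in grid])]
print(beta_hat)   # noticeably above 1.0 at T = 4: the incidental parameter bias
```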
## 6. Logit with Fixed Effects

## 7. Multinomial Response Model

$$y_{ij}^{*}=x_{ij}'\beta+z_i'\gamma_j+a_{ij}=\text{observed}+\text{choice coefficients}+\text{unobserved factors}$$

- non-ordinal (unordered) choice;
- ordinal choice, e.g., bond ratings.

Let $V_{ij}=x_{ij}'\beta+z_i'\gamma_j$ (mixed logit model).

### Unordered Choice

We model the choice behavior using a ==utility maximization== argument, following McFadden (1973):
$$y_i=\arg\max_{j}\{y_{i1}^{*},\cdots,y_{iJ}^{*}\}$$

Assume $a_{ij}\sim F(a)=e^{-e^{-a}}$, the ==Type 1 extreme value distribution==, with density

$$f(a)=e^{-a-e^{-a}}$$

Assume $a_{ij}$ is independent of $x,z$. Then
$$
\begin{aligned}
P(y_i=j\mid x,z)&=\frac{e^{V_{ij}}}{\sum_{k=1}^{J}e^{V_{ik}}}\\
&=\frac{e^{x_{ij}'\beta+z_i'\gamma_j}}{\sum_{k=1}^{J}e^{x_{ik}'\beta+z_i'\gamma_k}}
\end{aligned}
$$

Then estimate by MLE.
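The choice probabilities are a softmax of the $V_{ij}$; a minimal numerically stable sketch:

```python
import numpy as np

def mnl_prob(V):
    """P(y_i = j) = exp(V_ij) / sum_k exp(V_ik), computed row-wise."""
    V = V - V.max(axis=-1, keepdims=True)   # stabilize before exponentiating
    eV = np.exp(V)
    return eV / eV.sum(axis=-1, keepdims=True)

# one individual facing J = 3 alternatives with hypothetical utilities V_ij
print(mnl_prob(np.array([[1.0, 0.5, -0.2]])))
```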
**Limitations**
IIA: the choice between two alternatives is independent of irrelevant alternatives; this arises from $a_{ij}$ being iid across $1,\ldots,J$:
$$\frac{P(y_i=j\mid x)}{P(y_i=k\mid x)}=e^{(x_{ij}-x_{ik})'\beta}$$

## 8. Interactive Effects

$$Y=\sum_{k=1}^{K}\beta_k^0X_k+\varepsilon,\qquad \varepsilon=\lambda^0f^{0\prime}+e$$

$$\widehat{\beta}_R=\underset{\beta\in\mathbb{R}^K}{\operatorname{argmin}}\ \mathcal{L}_{NT}^{R}(\beta)$$

$$
\begin{aligned}
\mathcal{L}_{NT}^{R}(\beta)&=\min_{\{\Lambda\in\mathbb{R}^{N\times R},\,F\in\mathbb{R}^{T\times R}\}}\frac{1}{NT}\left\|Y-\beta\cdot X-\Lambda F'\right\|_{\mathrm{HS}}^2\\
&=\min_{F\in\mathbb{R}^{T\times R}}\frac{1}{NT}\operatorname{Tr}\left[(Y-\beta\cdot X)'(Y-\beta\cdot X)M_F\right]\\
&=\frac{1}{NT}\sum_{r=R+1}^{T}\mu_r\left[(Y-\beta\cdot X)'(Y-\beta\cdot X)\right],
\end{aligned}
$$

The inner optimization over $F$ is a principal components problem, so the optimal $F$ is given by the $R$ largest principal components. At the optimum, the projector $M_F$ exactly projects out the $R$ largest eigenvalues of this matrix, which gives rise to the final formulation of the profile objective function as the sum over its $T-R$ smallest eigenvalues.
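A direct sketch of the profiled objective via the eigenvalue formula above (array shapes are assumptions for illustration: `Y` is $N\times T$, `X` stacks the $K$ regressor matrices):

```python
import numpy as np

def profile_objective(beta, Y, X, R):
    """L_NT^R(beta): sum of the T - R smallest eigenvalues of U'U, scaled by 1/(NT),
    where U = Y - sum_k beta_k X_k and X has shape (K, N, T)."""
    U = Y - np.tensordot(beta, X, axes=1)      # N x T residual matrix
    N, T = U.shape
    mu = np.linalg.eigvalsh(U.T @ U)           # eigenvalues in ascending order
    return mu[: T - R].sum() / (N * T)

# beta_hat_R is then the argmin of this function over beta,
# e.g. via scipy.optimize.minimize on a starting value.
```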
### Strong Factor and Strong IV

**Assumption SF (Strong Factor):**
(i) $0<\operatorname{plim}_{N,T\rightarrow\infty}\frac{1}{N}\lambda^{0\prime}\lambda^{0}<\infty$;

(ii) $0<\operatorname{plim}_{N,T\rightarrow\infty}\frac{1}{T}f^{0\prime}f^{0}<\infty$.
## 9. Spurious Regression

In time series, a spurious regression gives bad results because the estimator converges to a random variable rather than a constant.
$$
\begin{aligned}
x_{it}&=x_{i,t-1}+\varepsilon_{it}\\
y_{it}&=y_{i,t-1}+e_{it}
\end{aligned}
$$

In the panel model:
$$
\begin{gathered}
y_{it}=\alpha+x_{it}\beta+u_{it}\\
u_{it}=\mu_i+\nu_{it}
\end{gathered}
$$

Kao (1999) showed, by sequential asymptotics, that
$$\sqrt{n}\,\widehat{\beta}_{FE}\xrightarrow{d}N\left(0,\frac{2\sigma_e^2}{5\sigma_\varepsilon^2}\right)$$

In panels, by contrast, the spurious-regression estimator still permits valid inference, since it converges to a zero-mean distribution.
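A simulation sketch of this result (all values hypothetical): two independent random-walk panels, yet $\sqrt{n}\,\hat{\beta}_{FE}$ behaves like a draw from a zero-mean normal.

```python
import numpy as np

rng = np.random.default_rng(5)
n, T = 1000, 50

# independent random walks: any estimated relation between y and x is spurious
x = rng.normal(size=(n, T)).cumsum(axis=1)
y = rng.normal(size=(n, T)).cumsum(axis=1)

# within (FE) estimator of the spurious slope
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
beta_fe = (xd * yd).sum() / (xd * xd).sum()

print(np.sqrt(n) * beta_fe)   # near zero: centered, unlike the time-series case
```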