## Thursday, September 20, 2012

### Robust Variance Estimate

When the variance-covariance structure of a model is mis-specified, it needs adjusting.  Mis-specification may arise when observations are clustered, when the variance is over- or under-dispersed relative to the model-specified variation, when the data do not come from a simple random sample, or when the pairs $(x_i, \epsilon_i)$ are not independent and identically distributed (i.i.d.) (Lachin 2000, Stata 11 Reference Manual).  In these situations, a robust estimate of the variance must be obtained.  I'm not sure this variance estimate has an official, undisputed name; its longest name is the Huber/White/Sandwich estimate, but in the interest of brevity I'll call it the robust variance estimate.

Under normal circumstances, when $E(y_i) = Var(y_i)$, the maximum likelihood estimate (MLE) of $\beta$ satisfies
$\sqrt{n}(\hat \beta_{ML} - \beta_0) \:\: \overset{d}{\rightarrow} \:\: N(0, I(\beta_0)^{-1})$,
where $I(\beta_0)$ is the Fisher information matrix, equal to minus the expected value of the Hessian matrix:
$I(\beta_0) = -E\Bigl[\frac{\partial^2 \ell(\beta; y_i, x_i)}{\partial \beta \partial \beta'}\Bigr]_{\beta = \beta_0}$
Per the previous post, the Hessian is the second derivative of the log likelihood with respect to $\beta$:
$\frac{\partial^2 \ell(\beta; y_i, x_i)}{\partial \beta \partial \beta'} = -\sum_{i=1}^n exp(x'_i \beta)x_i x'_i$
which is then negated and inverted per the properties of the MLE and Fisher information:
$I(\beta_0) = -E[H(\beta_0)] \: \rightarrow \: Var(\hat \beta) = [n I(\beta_0)]^{-1} = n^{-1} E[exp(x' \beta_0)xx']^{-1}$
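To make the model-based variance concrete, here is a minimal numpy sketch that fits a Poisson regression by Newton-Raphson and inverts the observed information $X' diag(\hat\mu) X$.  The simulated data, variable names, and true coefficients are all illustrative assumptions, not taken from the cited references.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (illustrative): intercept plus one covariate.
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 0.3])
y = rng.poisson(np.exp(X @ beta_true))

# Newton-Raphson for the Poisson MLE:
# score = X'(y - mu), Hessian = -X' diag(mu) X, with mu = exp(X beta).
beta = np.zeros(2)
for _ in range(25):
    mu = np.exp(X @ beta)
    beta += np.linalg.solve((X * mu[:, None]).T @ X, X.T @ (y - mu))

# Model-based variance: the inverse of the observed information matrix.
mu = np.exp(X @ beta)
var_ml = np.linalg.inv((X * mu[:, None]).T @ X)
se_ml = np.sqrt(np.diag(var_ml))
print(beta, se_ml)
```

Because the data here really are Poisson, this model-based variance is the appropriate one; the sandwich adjustment below only matters when that assumption fails.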

When the variance is not equal to the mean --- that is, when under- or over-dispersion is present --- a pseudo-maximum likelihood (PML) estimator must be used.  The only difference between the PML variance matrix and the ordinary ML variance matrix is that an empirical estimate of the observed variation is sandwiched between two copies of the model-based variance of the estimates (Lachin 2000, Winkelmann 2008).  Symbolically, we can denote the asymptotic ML variance matrix as
$n^{-1} I^{-1} = n^{-1} E[exp(x' \beta_0)xx']^{-1}$
and the asymptotic PML variance matrix as
$n^{-1} I^{-1} J I^{-1} = n^{-1} E[exp(x' \beta_0) xx']^{-1} \: E[(y - exp(x'\beta_0))^2 xx'] \: E[exp(x'\beta_0)xx']^{-1}$

The distribution of the Poisson PMLE (assuming a large but finite sample), then, is:
$\hat \beta \; \sim \; N \: (\beta_0, \widehat{Var}(\hat \beta))$
where $\widehat{Var}(\hat \beta) = \Bigl[\sum_{i=1}^n x_i x'_i exp(x'_i \hat \beta)\Bigr]^{-1} \; \sum_{i=1}^n x_i x'_i \widehat{Var}(y_i \vert x_i) \; \Bigl[\sum_{i=1}^n x_i x'_i exp(x'_i \hat \beta)\Bigr]^{-1}$
The estimate of $\widehat{Var}(y_i \vert x_i)$ follows from the outer product of the score:
$\widehat{Var}(y_i \vert x_i) = (y_i - exp(x'_i \hat \beta))^2$.
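The sandwich computation can be sketched directly in numpy.  This is a minimal illustration on deliberately over-dispersed simulated data --- the gamma mixing setup and all variable names are my own assumptions, not taken from the cited references:

```python
import numpy as np

rng = np.random.default_rng(1)

# Over-dispersed counts (illustrative): a gamma mixing term makes
# Var(y|x) > E(y|x), so the model-based Poisson variance is too small.
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.2, 0.4])
mu_true = np.exp(X @ beta_true)
y = rng.poisson(mu_true * rng.gamma(shape=0.5, scale=2.0, size=n))

# Poisson PML estimate of beta via Newton-Raphson (still consistent here).
beta = np.zeros(2)
for _ in range(25):
    mu = np.exp(X @ beta)
    beta += np.linalg.solve((X * mu[:, None]).T @ X, X.T @ (y - mu))

# Sandwich: bread = [sum x x' exp(x'b)]^{-1},
#           meat  =  sum (y - exp(x'b))^2 x x'.
mu = np.exp(X @ beta)
bread = np.linalg.inv((X * mu[:, None]).T @ X)
meat = (X * ((y - mu) ** 2)[:, None]).T @ X
var_robust = bread @ meat @ bread
se_robust = np.sqrt(np.diag(var_robust))
se_naive = np.sqrt(np.diag(bread))
print(se_naive, se_robust)
```

With over-dispersion present, the robust standard errors come out larger than the naive model-based ones, which is exactly the correction the next paragraph describes.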

Estimating the Poisson variance matrix in this way yields consistent estimates not only of the parameters but also of their standard errors.  In the presence of over-dispersion, for example, the model-based variance (and hence the standard errors) would be underestimated, artificially inflating the t-statistics and leading to spurious rejection of null hypotheses.  Adjusting the variance, as in a robust Poisson regression, corrects this problem such that valid inferences are obtained (Winkelmann 2008).
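The spurious-rejection point can be illustrated with a small Monte Carlo sketch.  The setup below is entirely hypothetical (not from the cited references): the slope is truly zero, but a gamma mixing term over-disperses the counts, so naive Poisson t-tests over-reject while robust ones hold close to the nominal 5% level.

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_poisson(X, y, iters=25):
    """Poisson (P)ML estimate via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = np.exp(X @ beta)
        beta += np.linalg.solve((X * mu[:, None]).T @ X, X.T @ (y - mu))
    return beta

n, reps = 500, 400
reject_naive = reject_robust = 0
for _ in range(reps):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    # H0 is true: the rate ignores x entirely; gamma mixing adds dispersion.
    y = rng.poisson(np.exp(0.5) * rng.gamma(0.4, 2.5, size=n))
    beta = fit_poisson(X, y)
    mu = np.exp(X @ beta)
    bread = np.linalg.inv((X * mu[:, None]).T @ X)
    meat = (X * ((y - mu) ** 2)[:, None]).T @ X
    t_naive = beta[1] / np.sqrt(bread[1, 1])
    t_robust = beta[1] / np.sqrt((bread @ meat @ bread)[1, 1])
    reject_naive += abs(t_naive) > 1.96
    reject_robust += abs(t_robust) > 1.96

print(reject_naive / reps, reject_robust / reps)
```

In runs of this sketch the naive test rejects the true null far more often than 5%, while the robust test stays near the nominal level --- the practical payoff of the sandwich adjustment.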