Well sure, Angrist and Pischke and their legion of devotees, but that barn has already been burned by my main man, Edward Leamer (see the section entitled "White-washing").
But now Cameron & Miller take up the cudgel to argue for OLS plus "cluster-robust" standard errors.
1. Under heteroskedasticity, OLS is unbiased but inefficient (has a variance larger than the minimum variance unbiased estimator) and the conventional OLS standard errors are biased.
2. The robust standard errors crowd ignores the first issue to focus on the second, but does not produce such a great answer even to the problem of the OLS standard errors, because robust standard errors have only an asymptotic justification (i.e. the only property that can be shown for them is consistency).
3. Leamer has argued that the first problem is the more important problem. Since some forms of heteroskedasticity make OLS extremely inefficient, we need to find a better estimator. These are often called FGLS (feasible generalized least squares) estimators, where the researcher estimates a model of the conditional variance along with the model for the conditional mean.
4. Here is where the robust standard errors folks raise the bogeyman of bias in their argument against looking for a better estimator.
Here's the money quote from Cameron & Miller:
"One way to control for clustered errors in a linear regression model is to additionally specify a model for the within-cluster error correlation, consistently estimate the parameters of this error correlation model, and then estimate the original model by feasible generalized least squares (FGLS) rather than ordinary least squares (OLS). ... If all goes well this provides valid statistical inference, as well as estimates of the parameters of the original regression model that are more efficient than OLS. However, these desirable properties hold only under the very strong assumption that the model for within-cluster error correlation is correctly specified."
Look people, we have two enemies when we try to get a point estimate of an unknown parameter, variance and bias.
Suppose you don't have the exactly correct functional form of the conditional variance but you do FGLS anyway. Say you create 10% bias by doing so. The reduction in variance may still be large enough to more than offset that bias, in which case the mis-specified FGLS estimator beats the least squares estimator on mean squared error (variance plus squared bias).
In my own research work on GARCH models with Rodolfo Cermeno, we show that even incorrectly specifying the conditional variance often produces coefficient estimates superior on mean squared error grounds to OLS.
This happens because of the extreme inefficiency of OLS in the face of some forms of heteroskedasticity and because small mis-specifications of the conditional variance model do not seem to lead to large biases in the estimates of the conditional mean parameters.
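For anyone who wants to poke at this claim, here is a minimal Monte Carlo sketch. It is not the setup from the GARCH work mentioned above. It is a toy cross-section in which everything is an assumption made up for illustration: the true error standard deviation is exp(0.4*x), the researcher wrongly models the variance as a power of x, and the names `wls` and `simulate` are hypothetical helpers.

```python
import math
import random


def wls(x, y, w):
    """Weighted least squares for y = b0 + b1*x. Returns (b0, b1)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def simulate(n=100, reps=2000, beta0=1.0, beta1=2.0, seed=0):
    """Return (MSE of OLS slope, MSE of mis-specified FGLS slope)."""
    rng = random.Random(seed)
    ones = [1.0] * n
    sq_err_ols = 0.0
    sq_err_fgls = 0.0
    for _ in range(reps):
        x = [rng.uniform(1.0, 10.0) for _ in range(n)]
        # Assumed truth: error s.d. is exp(0.4*x), which is NOT a power of x.
        y = [beta0 + beta1 * xi + rng.gauss(0.0, math.exp(0.4 * xi)) for xi in x]

        # Step 1: OLS (WLS with equal weights).
        a0, a1 = wls(x, y, ones)
        resid = [yi - a0 - a1 * xi for xi, yi in zip(x, y)]

        # Step 2: wrong variance model, log(e^2) = c + g*log(x).
        logx = [math.log(xi) for xi in x]
        loge2 = [math.log(r * r + 1e-12) for r in resid]
        c, g = wls(logx, loge2, ones)

        # Step 3: reweight by the inverse of the fitted variance.
        w = [1.0 / math.exp(c + g * lx) for lx in logx]
        _, f1 = wls(x, y, w)

        sq_err_ols += (a1 - beta1) ** 2
        sq_err_fgls += (f1 - beta1) ** 2
    return sq_err_ols / reps, sq_err_fgls / reps


if __name__ == "__main__":
    mse_ols, mse_fgls = simulate()
    print("MSE of OLS slope: ", mse_ols)
    print("MSE of FGLS slope:", mse_fgls)
```

Change the true variance function or the assumed one and see how far you have to push the mis-specification before OLS wins on mean squared error.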
Furthermore, you actually can perform some statistical tests to see how well your chosen model of the conditional variance is working. Simply take the standardized FGLS residuals, test them for general forms of heteroskedasticity, and see what happens. If we fail to reject the null of no heteroskedasticity at, say, the .25 level, we are pretty confident in our functional form.
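One way to run that kind of check, sketched under assumptions: the test below is a studentized (Koenker-style) Breusch-Pagan statistic with a single variance driver `z`, which is my choice for illustration and only one of many tests you could use. The function names are hypothetical, and `std_resid` is assumed to be the FGLS residuals already divided by their fitted standard deviations.

```python
import math


def r_squared(x, y):
    """R^2 from a simple regression of y on x with an intercept."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # nothing to explain, or nothing to explain it with
    return sxy * sxy / (sxx * syy)


def koenker_bp(std_resid, z):
    """Studentized Breusch-Pagan test with one variance driver z.

    Statistic: n * R^2 from regressing squared standardized residuals on z.
    Under the null of no remaining heteroskedasticity it is asymptotically
    chi-square with 1 degree of freedom. Returns (statistic, p_value).
    """
    u2 = [r * r for r in std_resid]
    stat = len(z) * r_squared(z, u2)
    # For a chi-square(1) variable, P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def variance_model_looks_ok(std_resid, z, level=0.25):
    """True if we fail to reject 'no heteroskedasticity' at the given level."""
    _, p_value = koenker_bp(std_resid, z)
    return p_value > level
```

Using a generous level like .25 makes the test easier to fail, which is the point: you want a test that complains readily before you trust your variance model.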
Finally, since all we know about robust standard errors is that they are consistent, in a finite sample the OLS + robust errors approach can give us a very inefficient parameter estimate and a biased standard error for that parameter estimate.
This problem is even greater in the case of clustered residuals because the requirement becomes that the number of clusters goes to infinity, not just the number of observations!
Double finally, let me note (and Cameron & Miller do a good job of explaining this) that FGLS and clustered standard errors are not mutually exclusive. You can do both. My recent paper with Dan Hicks and Weici Yuan is an example.
Mrs. Angus has just informed me that this piece should be titled, "The Nerdiest Rant Ever".
But people, variance is just as big a problem as bias and consistency alone is a weak reed on which to base your estimation strategy.
Mrs. Angus has just informed me that I've just proven her point with the previous sentence.