Tuesday, September 4, 2012

Confounders: Control in Design vs. Adjust in Analysis

In my lecture and mid-term review notes from the Spring 2009 advanced epidemiology course I took, I scribbled "control in design; adjust in analysis" with respect to how confounding variables are dealt with depending on the phase of the study in which they are addressed.  In short, if the hypothesized confounder is dealt with in the design phase, it is "controlled" for, whereas if it is dealt with during the statistical analysis phase, it is "adjusted" for.  This distinction might be more of a semantic issue but it is, nevertheless, one that exists and it affects how confounding variables are handled in a study.  My introductory epidemiology textbook, "Essentials of Epidemiology in Public Health" by Aschengrau & Seage, defines confounding as the "mixing of effects between an exposure, an outcome, and a third extraneous variable known as a confounder" (p. 282) and notes that it can be addressed in the design phase of the study, during the statistical analysis phase, or through a combination of the two.  Confounding is a non-trivial issue in epidemiological research and, as such, a great deal of energy has been spent explaining what it is, how it affects associations between exposure and disease, how it is identified, and how it ought to be addressed when present.  Any attempt to discuss all of the aforementioned in this blog post would be both foolhardy and arrogant (there are numerous texts available that discuss confounding at length), so I'll touch only on the distinction highlighted above.
Reproduced from "Epidemiology:  Beyond the Basics" (Szklo & Nieto), p. 155

Control or adjustment of a confounder requires that the analyst know (or at least suspect) which variables might act as confounders.  Some disease-exposure associations have well-known confounders (e.g. age or gender) and in this situation, steps can be taken in the design phase of the research to mitigate their effect.  The three primary means of controlling for confounding variables in the design stage are (1) randomization, (2) restriction, and (3) matching.


Randomization is the defining characteristic of the 'gold standard' study --- a randomized controlled trial (RCT) --- and ensures that, on average, known and unknown confounders will be balanced between the two treatment groups.  Randomization, however, is costly and often unrealistic or unethical in the context of epidemiological studies so this is rarely the go-to strategy for dealing with confounding.  
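
As a rough illustration of the "balanced on average" idea, here is a small simulation sketch.  Everything in it is assumed for the sake of the example: the cohort size, the 30% prevalence of a binary confounder, and the fixed seed are all invented, and the confounder could just as well be unmeasured.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical cohort: True means the subject carries a binary confounder
# (say, smoking). The 30% prevalence is an invented figure.
subjects = [rng.random() < 0.3 for _ in range(10_000)]

# Random allocation: shuffle, then split into two equal arms.
rng.shuffle(subjects)
half = len(subjects) // 2
treatment, control = subjects[:half], subjects[half:]

# Prevalence of the confounder in each arm.
prev_treatment = sum(treatment) / len(treatment)
prev_control = sum(control) / len(control)

print(f"treatment arm: {prev_treatment:.3f}")
print(f"control arm:   {prev_control:.3f}")
```

The two prevalences should land close to each other (and close to 0.3) without the allocation ever looking at the confounder, which is why randomization also covers confounders nobody thought to measure.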

Restriction narrows the variability of the subjects in the study by, for example, enrolling only females (which eliminates the confounding effect of gender) or enrolling only subjects aged 21-25 years (which largely eliminates age as a confounder).  The major drawback to this approach is the loss of generalizability of the study:  if only females were analyzed then any inferences drawn from the study apply only to females.

Matching entails the selection of subjects into the study such that potential confounders are distributed evenly between the two groups (i.e. exposed and unexposed groups).  Matching can occur either on an individual level or at a categorical level (frequency matching) but regardless of the type, the goal is to have the confounder equally distributed within both the exposed and unexposed groups.  The major drawback to matching is that the matching variable cannot be analyzed statistically, i.e. the association between the confounder (matching variable) and the outcome can no longer be estimated.
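
Frequency matching can be sketched in a few lines.  This is only an illustration under assumed inputs: the age bands, the group sizes, and the helper name frequency_match are all hypothetical, not taken from any particular study or library.

```python
import random
from collections import Counter

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical exposed group: the age band of each exposed subject.
exposed = ["20-29"] * 5 + ["30-39"] * 10 + ["40-49"] * 15

# Hypothetical pool of unexposed candidates as (subject id, age band).
bands = ["20-29"] * 100 + ["30-39"] * 100 + ["40-49"] * 100
pool = list(enumerate(bands))

def frequency_match(exposed_bands, candidate_pool, rng):
    """Draw unexposed subjects so each band has the same count as the exposed group."""
    needed = Counter(exposed_bands)
    matched = []
    for band, count in needed.items():
        candidates = [s for s in candidate_pool if s[1] == band]
        # rng.sample raises ValueError if the pool has too few candidates.
        matched.extend(rng.sample(candidates, count))
    return matched

matched = frequency_match(exposed, pool, rng)
print(Counter(band for _, band in matched))
```

After matching, the age-band counts of the selected unexposed subjects equal those of the exposed group, so age can no longer differ between the two groups.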
 

The three primary means of handling confounding in the analysis phase are (1) standardization, (2) stratification, and (3) multivariate methods.  All of these approaches are employed after the study design phase and data collection.  The first strategy, standardization, is usually one of the first principles taught in an elementary epidemiology course and comes in two flavors:  direct and indirect.  Standardization is often used to adjust for standard demographics (e.g. race, gender, or age) and recasts, for example, the mortality rates of two groups as if they had the same race/gender/age distribution.
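
A minimal sketch of direct standardization, assuming two invented populations with identical age-specific death rates but very different age structures.  The counts, the choice of the combined person-years as the standard population, and the function names are all assumptions made for illustration.

```python
# Hypothetical data: age band -> (deaths, person-years). Numbers are invented.
pop_a = {"young": (10, 10_000), "old": (90, 3_000)}
pop_b = {"young": (4, 4_000), "old": (270, 9_000)}

# Assumed standard population: the combined person-years of A and B.
standard = {age: pop_a[age][1] + pop_b[age][1] for age in pop_a}

def crude_rate(pop):
    """Total deaths divided by total person-years."""
    deaths = sum(d for d, _ in pop.values())
    person_years = sum(n for _, n in pop.values())
    return deaths / person_years

def direct_standardized_rate(pop, standard):
    """Weight each age-specific rate by the standard population's age share."""
    total = sum(standard.values())
    return sum((d / n) * standard[age] / total for age, (d, n) in pop.items())

crude_a, crude_b = crude_rate(pop_a), crude_rate(pop_b)
std_a = direct_standardized_rate(pop_a, standard)
std_b = direct_standardized_rate(pop_b, standard)

print(f"crude:        A={crude_a:.4f}  B={crude_b:.4f}")   # A=0.0077  B=0.0211
print(f"standardized: A={std_a:.4f}  B={std_b:.4f}")       # A=0.0144  B=0.0144
```

The crude rates make population B look almost three times worse, but that gap comes entirely from B being older; once both are recast onto the same age distribution, the rates are identical.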

Stratification is a bit more complicated but it is, essentially, the separation of results into discrete groups.  Once the groups are determined and created (according to the suspected confounder), the measure of association (e.g. odds ratio) is estimated for each group (stratum) and then compared to the overall (crude) measure of effect.  If the stratum-specific odds ratios are close to each other but there is an appreciable difference between them and the crude odds ratio, then confounding is likely to exist, and a single, adjusted summary measure of effect can be reported (the statistically weighted Mantel-Haenszel estimate).  If the stratum-specific odds ratios differ appreciably from each other, this points to effect modification (interaction) rather than confounding alone, and the stratum-specific odds ratios should be reported instead of a summary.  If there is no appreciable difference between the stratum-specific odds ratios and the crude odds ratio, then the crude estimate can be reported.  There are, of course, exceptions to this simplistic guideline and the situation when working with real data can be considerably more complex and nuanced (confounding can be positive, negative, or qualitative), but the general thrust of the strategy remains:  stratum-specific estimates being compared to an overall estimate.  The main disadvantage of stratified analysis is that it isn't really viable when multiple confounders exist:  the number of strata multiplies with each added variable, creating a situation where not every stratum will be populated with study subjects.  Estimation of odds ratios/relative risks/etc. in this situation leads to unstable and unreliable results.  The alternative in this situation, and the one that seems to be the prominent method given the power of modern computing, is to conduct a multivariate analysis.
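
To make the comparison of stratum-specific, crude, and Mantel-Haenszel odds ratios concrete, here is a small sketch.  The two 2x2 tables are invented so that the exposure has no effect within either stratum, and the function names are my own.

```python
# Each stratum is a 2x2 table:
# (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases).
# Counts are invented for illustration.
strata = [
    (10, 90, 40, 360),    # confounder absent
    (160, 240, 40, 60),   # confounder present
]

def odds_ratio(a, b, c, d):
    """Cross-product odds ratio of a single 2x2 table."""
    return (a * d) / (b * c)

def mantel_haenszel_or(tables):
    """Mantel-Haenszel summary OR: sum(a*d/n) / sum(b*c/n) over strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator

stratum_ors = [odds_ratio(*t) for t in strata]
crude_table = tuple(sum(col) for col in zip(*strata))  # collapse over strata
crude_or = odds_ratio(*crude_table)
mh_or = mantel_haenszel_or(strata)

print(stratum_ors)          # [1.0, 1.0]
print(round(crude_or, 2))   # 2.7
print(mh_or)                # 1.0
```

Here the stratum-specific odds ratios agree with each other (both 1.0) but not with the crude odds ratio (about 2.7), which is the pattern that signals confounding; the Mantel-Haenszel estimate recovers the within-stratum value.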

In a multivariate analysis (the bulk of my dissertation analysis relies on multivariate analysis), a statistical model is created (e.g. ordinary least squares regression, Poisson regression, logistic regression) wherein the associations between the disease and the exposure and between the disease and the confounders are estimated simultaneously.  Including all of the variables in the model in this way allows for adjustment for several confounding variables at once.  As powerful as multivariate analysis (i.e. regression modeling) is, it shouldn't be applied willy-nilly.  The power and ease with which statistical models can be built is astonishing and, unless the objective is to dredge the data and explore for associations (exploratory data analysis), the model building should be purposeful and thoughtful and follow from the study hypothesis or hypotheses.
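
As a sketch of what adjustment through a regression model looks like, the following fits a logistic regression with and without a confounder term.  It is a bare-bones Newton-Raphson fit written in plain Python purely for illustration (in practice one would reach for a statistics package), and the grouped data and function names are invented.

```python
import math

# Hypothetical grouped data: (exposed, confounder, cases, total). Invented numbers.
cells = [
    (1, 0, 10, 100),
    (0, 0, 40, 400),
    (1, 1, 160, 400),
    (0, 1, 40, 100),
]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_logistic(rows, iterations=25):
    """Newton-Raphson fit; each row is (covariate tuple, cases, total)."""
    k = len(rows[0][0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # information matrix X'WX
        for x, cases, total in rows:
            p = 1.0 / (1.0 + math.exp(-sum(b * xi for b, xi in zip(beta, x))))
            for i in range(k):
                grad[i] += x[i] * (cases - total * p)
                for j in range(k):
                    info[i][j] += x[i] * x[j] * total * p * (1.0 - p)
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    return beta

# Covariates: intercept and exposure, then intercept, exposure, and confounder.
crude = fit_logistic([((1, e), y, n) for e, c, y, n in cells])
adjusted = fit_logistic([((1, e, c), y, n) for e, c, y, n in cells])

crude_or = math.exp(crude[1])
adjusted_or = math.exp(adjusted[1])
print(f"crude OR:    {crude_or:.2f}")
print(f"adjusted OR: {adjusted_or:.2f}")
```

With these invented counts the exposure odds ratio should drop from roughly 2.7 in the exposure-only model to roughly 1.0 once the confounder is in the model, mirroring what stratification shows on the same numbers.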
