Thursday, March 29, 2012

Non-Clinical Equivalence?

Determining whether two measures are equivalent is a tricky thing in statistics.  With a standard hypothesis test, the null hypothesis (Ho) is usually one of no effect or no association.  The alternative hypothesis (Ha) is the converse:  existence of an effect or presence of an association.  In a two-sample case involving continuous data, for example, the null hypothesis is generally framed as testing whether the difference between the two samples is zero.  The alternative hypothesis -- if it is two-sided -- is that the difference is not zero.  Rejection of the null indicates that the difference is nonzero and large enough not to be attributable to chance, whereas failure to reject suggests that the parameters being compared may be equal (or, at least, aren't detectably different).  What failure to reject doesn't provide, however, is proof positive that the parameters are equal.  What happens, then, if we want to establish equality, rather than difference, between two measures or parameters?  Well, technically you can't.  Friedman, Furberg, and DeMets put it best in their very readable Fundamentals of Clinical Trials (3rd ed., p. 118):
The problem in designing positive control studies is that there can be no statistical method to demonstrate complete equivalence.  That is, it is not possible to show [delta]=0.  Failure to reject the null hypothesis is not sufficient reason to claim two interventions to be equal but merely that the evidence is inadequate to say they are different.
They go on to state that even though you can't demonstrate complete equivalence, one approach is to designate a value for delta such that interventions whose observed differences fall below that value might be considered equivalent.  I've never been involved with a clinical trials equivalence study, so I doubt I'm qualified to write much more in that regard, but in my dissertation research I'm facing a similar problem.  At least I think it's a similar problem.  Or maybe it isn't really a problem and I'm creating one.  Either way, I'm stumped.
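The delta idea above is, in effect, the logic behind equivalence procedures like the two one-sided tests (TOST): declare equivalence only if the difference can be shown to lie inside (-delta, +delta).  This is just a rough sketch of that logic -- not a method from the book -- using a large-sample normal approximation and made-up data:

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tost_equivalence(x, y, delta, alpha=0.05):
    """Two one-sided tests (TOST) for equivalence of two sample means.

    Equivalence is declared only when BOTH one-sided nulls
    (diff <= -delta and diff >= +delta) are rejected at level alpha.
    Uses a large-sample normal approximation; real analyses would use
    t-based procedures.
    """
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((xi - mx) ** 2 for xi in x) / (nx - 1)
    vy = sum((yi - my) ** 2 for yi in y) / (ny - 1)
    se = math.sqrt(vx / nx + vy / ny)
    diff = mx - my
    p_lower = 1.0 - norm_cdf((diff + delta) / se)  # H0: diff <= -delta
    p_upper = norm_cdf((diff - delta) / se)        # H0: diff >= +delta
    p = max(p_lower, p_upper)
    return diff, p, p < alpha

# Made-up samples with nearly equal means:
x = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10]
y = [10, 10, 11, 9, 10, 11, 10, 9, 10, 10]
diff, p, equivalent = tost_equivalence(x, y, delta=1.5)   # generous margin
# With delta=1.5 the samples are declared equivalent; with a very tight
# margin (e.g. delta=0.3) they are not -- the verdict hinges on delta.
```

Note that the whole exercise turns on choosing delta before looking at the data; the statistics can only tell you whether the difference fits inside a margin you supplied.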

The crux of my research relies on a measure of medication adherence that doesn't really measure adherence, per se, but reasons for non-adherence.  In most studies of this type, adherence might be ascertained either via a direct method (e.g. measurement from blood) or an indirect method (e.g. patient questionnaire, pill count), with the direct methods being more reliable and the indirect methods more feasible.  My measure of adherence follows from subject responses to nine reasons for missing medications, with higher reported frequencies corresponding to lower adherence.  One way to denote a subject's adherence level is to assign numbers to each of the frequency levels -- higher numbers denoting lower adherence -- and then sum those numbers across the nine reasons to arrive at a single value.  This approach, although straightforward, yields a number that, in itself, is relatively meaningless.  What does it mean if a subject has an adherence value of 9?  Or 29?  Not much in an absolute sense.  The only real meaning follows in a comparative sense:  a subject with an adherence value of 9 is considerably more adherent than a subject with a value of 29.  (Technically, the subject with the lower value reported a lower frequency of missing their medications among the possible reasons.)  And to further complicate things, the instrument also inquired about the subject's degree of confidence (on a scale of one to ten) in taking their medication as directed by their health care provider -- perhaps this question could proxy for the subject's actual adherence?  Now, we obviously don't know their actual adherence (obtained objectively, that is), but if it can be shown that responses to the confidence question are "equivalent", as it were, to the reasons-for-non-adherence questions (or summary score), then wouldn't it be possible to dispense with the reasons questions/score and just use the confidence question?
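The summing scheme is easy to make concrete.  The frequency labels and coding below are hypothetical (the actual instrument's response options may differ); the point is just that each of the nine reasons contributes its coded frequency to a single total:

```python
# Hypothetical coding of the frequency scale -- 1 (never) through
# 5 (very often); the labels are illustrative, not the real instrument's.
FREQ_CODES = {"never": 1, "rarely": 2, "sometimes": 3, "often": 4, "very often": 5}

def adherence_score(responses):
    """Sum the coded frequency across the nine reasons for missing
    medications.  Higher totals mean missing meds more often,
    i.e. LOWER adherence."""
    return sum(FREQ_CODES[r] for r in responses)

subject_a = ["never"] * 9                      # best possible total: 9
subject_b = ["sometimes"] * 5 + ["often"] * 4  # 5*3 + 4*4 = 31
# adherence_score(subject_a) -> 9, adherence_score(subject_b) -> 31:
# meaningful only relative to each other, not in an absolute sense.
```

This makes the interpretive problem visible: 9 and 31 order the two subjects, but neither number maps to any absolute level of adherence.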
Or rather than presuppose that "equivalence" will be established -- failing to do so could jeopardize the rest of the statistical analysis -- I could analyze adherence using both the reasons questions and confidence question, as well as incorporate an "equivalence" study.  If the two methods yield similar results and "equivalence" is established, then it could be argued that the confidence question can act as an adherence gauge.

But how to establish "equivalence" in a non-clinical setting between an ordinal variable (the confidence question) and either a series of ordinal questions (the nine reasons for missing medications) or the summary score derived from them?  One approach -- perhaps the most frequently used -- is to correlate the two measures via Pearson's or Spearman's correlation coefficient.  The problem with assessing equivalence by way of a correlation coefficient is that what it really reveals is the degree of linear (or monotonic) association ("how well are the measures related?") rather than agreement ("how well do the measures agree?").  A few academics (e.g. Lin, Bland, Altman) have published and implemented methods for assessing agreement/concordance, but I have yet to find anything perfectly suited to my task.  All of the methods I've looked into seem appropriate in one way yet inappropriate in another, including the Bland-Altman plot, Lin's concordance correlation coefficient, Cohen's kappa coefficient, Kendall's coefficient of concordance, Kendall's tau, McNemar's test, and Bowker's test.  I've mulled over each of these and I'm still unsure which, if any, is best suited for establishing "equivalence" between two ordinal variables.  In order to flesh out my thinking and, hopefully, arrive at a decision on which is best for my analysis, I'm going to present and briefly discuss each of the above in a future blog post, since this one is already longer than any random reader should be subjected to.
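The association-versus-agreement distinction is easy to demonstrate with a toy example: two measures that differ by a constant offset correlate perfectly yet never agree.  The data below are invented, and Spearman's coefficient is hand-rolled just to keep the sketch self-contained:

```python
def rank(values):
    """Average ranks, 1-based; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = rank(x), rank(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Invented summary scores and a "confidence proxy" shifted by a constant:
summary = [9, 12, 15, 20, 29]
proxy = [s + 5 for s in summary]

rho = spearman(summary, proxy)                       # perfect: 1.0
exact_matches = sum(a == b for a, b in zip(summary, proxy))  # zero
```

Perfect monotonic association (rho = 1.0) with zero exact agreement -- which is exactly why a correlation coefficient alone can't stand in for an agreement analysis.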
