Friday, March 30, 2012

Concordance, Correlation, Agreement -- Statistics

Stata Command
Lin’s Concordance Correlation
According to Lin (1989), this index “evaluates the agreement between two readings (from the same sample) by measuring variation from the 45 degree line through the origin (the concordance line).”  Neither Lin nor the Stata Technical Bulletin (STB-43) insert suggests that this index can be used for categorical data, although neither explicitly forbids it.
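To make the definition concrete, here is a minimal Python sketch of Lin's index on made-up paired readings (the data and function name are mine, not Lin's; Lin's formulation uses the biased, n-denominator variance and covariance):

```python
import numpy as np

def lins_ccc(x, y):
    """Lin's concordance correlation coefficient: agreement of paired
    readings with the 45-degree line through the origin."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxy = np.mean((x - x.mean()) * (y - y.mean()))   # biased covariance
    return 2 * sxy / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

# hypothetical paired readings from two "readers" of the same samples
reader1 = [10.2, 11.5, 9.8, 12.1, 10.9]
reader2 = [10.0, 11.9, 9.5, 12.4, 11.1]
print(round(lins_ccc(reader1, reader2), 3))
```

For identical readings the index equals 1; any location or scale shift between the readers pulls it below the ordinary Pearson correlation.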
Cohen’s Kappa Coefficient
Jacob Cohen’s (1960) measure of inter-rater agreement.  Values range from -1 to 1, with zero denoting the amount of agreement expected by chance alone, one denoting perfect agreement, and negative values denoting less agreement than chance.  Although the statistic is grounded in assessing agreement between “raters”, could it be adapted to establish agreement between items on a questionnaire/survey? 
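Since kappa is just observed agreement corrected for chance-expected agreement, it is easy to sketch from scratch; the two raters' labels below are hypothetical:

```python
import numpy as np

def cohens_kappa(r1, r2):
    """Cohen's kappa from two raters' categorical labels."""
    cats = sorted(set(r1) | set(r2))
    idx = {c: i for i, c in enumerate(cats)}
    table = np.zeros((len(cats), len(cats)))
    for a, b in zip(r1, r2):
        table[idx[a], idx[b]] += 1
    n = table.sum()
    po = np.trace(table) / n                     # observed agreement
    pe = (table.sum(1) @ table.sum(0)) / n**2    # chance-expected agreement
    return (po - pe) / (1 - pe)

rater1 = ["yes", "yes", "no", "yes", "no", "no", "yes", "no"]
rater2 = ["yes", "no", "no", "yes", "no", "yes", "yes", "no"]
print(round(cohens_kappa(rater1, rater2), 3))
```

Replacing "raters" with two survey items is mechanically trivial (each subject contributes one paired response); whether it is statistically defensible is the open question above.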
Kendall’s Coefficient of Concordance / Kendall’s W / Friedman’s Test
Calculates Friedman’s non-parametric two-way analysis of variance and Kendall’s coefficient of concordance.  A single p-value is provided since the tests are equivalent, although Kendall’s statistic may be easier to interpret since it is bounded on [0,1] and measures the agreement between rankings.  It is unclear whether this test is suitable for ordinal variables. 
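The equivalence of the two tests is the identity chi-square = m(k-1)W for m raters ranking k items.  A small Python sketch with hypothetical judges' rankings (no tie correction):

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

def kendalls_w(ratings):
    """Kendall's W from an (m raters x k items) matrix; no tie correction."""
    ranks = np.array([rankdata(row) for row in ratings])
    m, k = ranks.shape
    # sum of squared deviations of column rank sums from their mean
    s = ((ranks.sum(0) - m * (k + 1) / 2) ** 2).sum()
    return 12 * s / (m ** 2 * (k ** 3 - k))

# three hypothetical judges ranking four items
ratings = [[1, 2, 3, 4],
           [1, 3, 2, 4],
           [2, 1, 3, 4]]
w = kendalls_w(ratings)
# Friedman expects one array per item (column), hence the transpose
stat, p = friedmanchisquare(*np.array(ratings).T)
print(round(w, 3), round(stat, 3))
```

With no ties, the Friedman statistic here equals 3 x 3 x W, confirming the one-p-value point above.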
Kendall’s Rank Correlation / Kendall’s Tau
Kendall’s Tau-a and Tau-b are calculated; the only difference between the two is in their denominators.  Tau-a uses the total number of pairs, whereas Tau-b incorporates the number of tied values (Tau-b will be larger in absolute value if ties exist).  These statistics are closely related to Spearman’s Rho and don’t really assess agreement so much as independence.  According to Conover (1999, p. 323), Spearman’s and Kendall’s will produce nearly identical results in most cases, although Spearman’s will tend to be larger in an absolute sense.  Since I'm more concerned with assessing agreement than independence -- rejection of an independence null is expected -- I question this test's applicability.
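A quick illustration of Conover's point on hypothetical ordinal responses (scipy's kendalltau returns Tau-b by default, which handles the ties in these data):

```python
from scipy.stats import kendalltau, spearmanr

# hypothetical ordinal responses from the same subjects on two items
item_a = [1, 2, 2, 3, 4, 4, 5]
item_b = [1, 1, 2, 3, 3, 4, 5]

tau, tau_p = kendalltau(item_a, item_b)   # Tau-b (adjusts for ties)
rho, rho_p = spearmanr(item_a, item_b)
print(round(tau, 3), round(rho, 3))       # |rho| >= |tau| here, as Conover notes
```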
McNemar’s Test (2x2); Bowker’s Test (KxK)
For a 2x2 table the test reduces to McNemar’s test, whereas for a KxK table Bowker’s test for table symmetry and the Stuart-Maxwell test for marginal homogeneity are calculated.  The test assumes a 1-to-1 matching of cases and controls and is used to analyze matched-pair case-control data with multiple discrete levels of the outcome/exposure variable.  I’m not 100% sure whether this test is suitable for what I need, although if I can frame it such that the two instrument items play the roles of case and control, respectively, and the symmetry and marginal homogeneity tests are non-significant, then it would suggest that a subject’s responses to the two items aren’t different.  Need to investigate this possibility.
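In the 2x2 case, McNemar's statistic depends only on the two discordant cells (subjects whose answers to the two items differ), which makes it easy to sketch; the counts below are hypothetical:

```python
from scipy.stats import chi2

# hypothetical paired responses to two items; only the discordant
# cells matter: b = item1 "yes"/item2 "no", c = item1 "no"/item2 "yes"
b, c = 6, 14

stat = (b - c) ** 2 / (b + c)   # McNemar chi-square, 1 df (no continuity correction)
p = chi2.sf(stat, df=1)
print(round(stat, 2), round(p, 4))
```

A non-significant result here would be consistent with (though, per the equivalence caveat below, not proof of) the two items eliciting the same response distribution.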

The research into a suitable method for assessing agreement between two items on a survey/questionnaire hasn't been as straightforward and unambiguous as I'd hoped.  (Although if it were then perhaps the Ph.D. wouldn't be nearly as masochistic?)  Through a search of the literature, the Stata help files, and the Stata listserv I've identified some test statistics that are good candidates for what I need.  I figure placing them in a table along with brief descriptions will aid in identifying which, if any, is most appropriate (this is, of course, assuming that the agreement/equivalence aspect of my research remains in place).  There are also graphical methods of assessing categorical, ordinal agreement, which I'll present in a forthcoming post.

Thursday, March 29, 2012

Non-Clinical Equivalence?

Determining whether two measures are equivalent is a tricky thing in statistics.  With a standard hypothesis test, the null hypothesis (Ho) is usually one of no effect or no association.  The alternative hypothesis (Ha) is the converse:  existence of an effect or presence of an association.  In a two-sample case involving continuous data, for example, the null hypothesis is generally framed as testing whether the difference between the two samples is zero.  The alternative hypothesis -- if it is two-sided -- is that the difference is not zero.  Rejection of the null indicates that the difference is not zero and is large enough to not be attributable to chance, whereas failure to reject suggests that the parameters being compared may be equal (or aren't different).  What failure to reject doesn't provide, however, is proof-positive that the parameters are equal.  What happens, then, if we want to establish equality, rather than difference, between two measures or parameters?  Well, technically you can't.  Friedman, Furberg, and DeMets put it best in their very readable Fundamentals of Clinical Trials (3rd ed., p. 118):
The problem in designing positive control studies is that there can be no statistical method to demonstrate complete equivalence.  That is, it is not possible to show [delta]=0.  Failure to reject the null hypothesis is not sufficient reason to claim two interventions to be equal but merely that the evidence is inadequate to say they are different.
They go on to state that even though you can't demonstrate complete equivalence, one approach is to designate a value for delta such that intervention(s) with differences less than the specified value might be indicative of equivalence.  I've never been involved with a clinical trials equivalence study so I doubt that I'm qualified to write much more in that regard but in my dissertation research, I'm facing a similar problem.  At least I think it's a similar problem.  Or maybe it really isn't a problem but I'm creating one.  Either way, I'm stumped.
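That designated-delta approach is, as I understand it, usually operationalized as two one-sided tests (TOST): reject both one-sided nulls and you've bounded the difference inside (-delta, +delta).  A sketch on hypothetical paired differences (the margin and data are invented for illustration):

```python
import numpy as np
from scipy import stats

delta = 0.5                                   # equivalence margin, chosen a priori
d = np.array([0.1, -0.3, 0.4, -0.1, 0.0, 0.2, -0.2, 0.1])  # paired differences

n = len(d)
se = d.std(ddof=1) / np.sqrt(n)
t_low = (d.mean() + delta) / se               # H0: mean difference <= -delta
t_high = (d.mean() - delta) / se              # H0: mean difference >= +delta
p_low = stats.t.sf(t_low, n - 1)
p_high = stats.t.cdf(t_high, n - 1)
p_tost = max(p_low, p_high)                   # conclude equivalence if < alpha
print(round(p_tost, 4))
```

The catch, of course, is that everything hinges on defending the choice of delta, which is exactly the difficulty Friedman et al. flag.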

The crux of my research relies on a measure of medication adherence that really doesn't measure adherence, per se, but reasons for non-adherence.  In most studies of this type, adherence might be ascertained either via a direct (e.g. measurement from blood) or indirect method (e.g. patient questionnaire, pill count), with the direct method being more reliable and the indirect methods being more feasible.  My measure of adherence follows from subject responses to nine reasons for missing medications, with higher reported frequencies corresponding to lower adherence.  One way to denote a subject's adherence level is to assign numbers to each of the frequency levels, with higher numbers denoting lower adherence, and then sum across reasons to arrive at a single value.  This approach, although straightforward, yields a number that, in itself, is relatively meaningless.  What does it mean if a subject has an adherence value of 9?  Or 29?  Not much in an absolute sense.  The only real meaning follows in a comparative sense:  a subject with an adherence value of 9 is considerably more adherent than a subject with a value of 29.  (Technically, the subject with the lower value reported a lower frequency of missing their medications among the possible reasons.)  And to further complicate things, the instrument also inquired about the subject's degree of confidence (on a scale of one to ten) in taking their medication as directed by their health care provider -- perhaps this question could proxy for the subject's actual adherence?  Now we obviously don't know their actual adherence (obtained objectively, that is) but if it can be shown that responses to the confidence question are "equivalent", as it were, to the reasons for non-adherence questions (or summary score) then wouldn't it be possible to dispense with the reasons questions/score and just use the confidence question?  
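The summing itself is trivial; a sketch with hypothetical frequency labels and codes (nine reasons coded 1-4, so scores run from 9 to 36):

```python
# hypothetical coding of how often each reason led to a missed dose
codes = {"never": 1, "rarely": 2, "sometimes": 3, "often": 4}

# one subject's hypothetical responses to the nine reasons
responses = ["never", "rarely", "never", "sometimes", "never",
             "never", "rarely", "never", "often"]

score = sum(codes[r] for r in responses)   # higher score = lower adherence
print(score)
```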
Or rather than presuppose that "equivalence" will be established -- failing to do so could jeopardize the rest of the statistical analysis -- I could analyze adherence using both the reasons questions and confidence question, as well as incorporate an "equivalence" study.  If the two methods yield similar results and "equivalence" is established, then it could be argued that the confidence question can act as an adherence gauge.

But how to establish "equivalence" in a non-clinical setting between an ordinal variable (the confidence question) and either a series of ordinal questions (the nine reasons for missing medications) or the summary score derived from the reasons questions?  One approach -- and this is perhaps the most frequently used approach -- is to correlate the two measures via either Pearson's or Spearman's correlation coefficients.  The problem with assessing equivalence by way of a correlation coefficient is that what it really reveals is degree of linear association ("how well are the measures related?") rather than agreement ("how well do the methods/measures agree?").  A few academics (e.g. Lin, Bland, Altman, etc.) have published and implemented methods for assessing agreement/concordance but I have yet to find anything that is perfectly suited for my task.  All of the methods I've looked into seem appropriate in one way yet inappropriate in another, including the Bland-Altman plot, Lin's concordance correlation for agreement, Cohen's kappa coefficient, Kendall's coefficient of concordance, Kendall's tau, McNemar's test, and Bowker's test.  I've mulled over each of these and I'm still unsure which, if any, is best suited for establishing "equivalence" between two ordinal variables.  In order to flesh out my thinking and, hopefully, arrive at a decision for which is best for my analysis, I'm going to present and briefly discuss each of the above in a future blog post since this post is already longer than any random reader should be subjected to.
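The association-versus-agreement distinction is easy to demonstrate: add a constant offset to one measure and Pearson's r stays at 1 even though the two measures never once agree:

```python
from scipy.stats import pearsonr

x = [1, 2, 3, 4, 5]
y = [v + 2 for v in x]     # perfectly related to x, yet never equal to it

r, _ = pearsonr(x, y)
print(round(r, 3))         # r = 1.0 despite zero exact agreement
```

An agreement index such as Lin's, by contrast, would penalize that offset, which is precisely why correlation alone can't carry an equivalence argument.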

Wednesday, March 7, 2012

Mechanics of Reading a Scientific Paper

I have 250+ articles in my EndNote library and although I'd like to claim that I've read every single one, I'd be lying if I did.  Reading papers can be tiresome and tedious and, in some cases, overwhelming if your research topic isn't yet narrowly defined or its objectives keep shifting.  Frustrations aside about defining the boundaries of the lit review, though, how does one read the papers once they're retrieved?  Does one skim the papers or carefully read them?  If the latter, is there a prescribed and efficient way to do so?  When I was a TA for a Research Design course several years ago I vaguely remember a lecture on the 'how' of reading scientific papers and although I don't remember the specifics from that lecture, I do remember thinking, "mentally archive this information -- you may eventually need it."  Fortunately, I had the foresight to recognize that knowing how to systematically read a paper would come in handy; unfortunately, the specifics never stuck.  Since that lecture many years ago, evidence-based medicine has moved mainstream and with it, the need to critically appraise literature and scientific evidence in order to shape clinical practice.  I'm, obviously, not a clinician and won't be practicing evidence-based medicine (EBM) anytime soon but the ability to critically read scientific papers is a useful skill in any scientific discipline.  And that is where Trisha Greenhalgh's concise and readable "How to Read a Paper:  The Basics of Evidence-Based Medicine" is a must-have for virtually all researchers.  The book isn't cheap given its length -- 238 pages -- although I think the clarity and density of the information make it well worth it.  The topics discussed range from how to search the literature to how to assess economic analyses.  Depending on your discipline and research perspective, some chapters are going to be more relevant than others.  
For me, the chapters on how to determine what the paper is about, how to assess its methodological quality, and the summary of statistical techniques commonly used in scientific papers were the most relevant and informative.

When trying to determine what a paper is about, Greenhalgh emphasizes that a paper should be 'trashed' because of its methods, not its results.  Given the emphasis on the methods, then, three preliminary questions should initiate the appraisal:  
  1. What was the research question -- and why was the study needed?  This should be clearly stated somewhere in the first few paragraphs of the paper.
  2. What was the research design?  The type of design has implications for the statistical analyses used (if any), conclusions, and rigor of the paper.  
  3. Was the research design appropriate to the question?  Not all research questions require a randomized controlled trial (RCT).  
In this chapter, Greenhalgh also briefly discusses each of the research designs common to scientific papers then assigns them a place in the "hierarchy of evidence" with those at the top commanding the most weight and influence re: clinical interventions.  Aside from placing systematic reviews/meta-analyses at the top (particularly helpful in EBM), I think most other disciplines would report a similar hierarchy:
  1. Systematic reviews and meta-analyses.
  2. RCTs with definitive (i.e. statistically significant) results.
  3. RCTs with non-definitive (i.e. suggestive but not statistically significant) results.
  4. Cohort studies.
  5. Case-control studies.
  6. Cross-sectional surveys.
  7. Case reports. 
In the methodological quality chapter, assessment relies on five key questions:  
  1. Was the study original?  Does it duplicate previous research or add something new to the literature?
  2. Whom is the study about?  How were subjects recruited and what were the inclusion/exclusion criteria?  
  3. Was the design of the study sensible?  What and how were the outcomes measured?
  4. Was systematic bias avoided or minimized?  Was the study adequately controlled?  Was assessment 'blind'?   
  5. Were preliminary statistical questions addressed -- how many subjects enrolled, duration of follow-up, and completeness of follow-up?  
The chapter on statistics is intended for non-statisticians but I still found it helpful and amusing (especially the 'advice' on how to cheat on statistical tests when writing up results).  I've written about this in a previous post and won't repeat it here since what Ben Goldacre wrote relied largely on what Greenhalgh wrote.  Suffice to say, Greenhalgh breaks down the most common statistical analyses used and their interpretation such that even the most timid statistically-averse researcher can make sense of the results. 

Even after I've long since finished slogging through all (most?) of the articles collected for my lit review, I suspect this book will still sit prominently on my bookshelf.