## Tuesday, January 31, 2012

### PFS and OS Curves Overlaid

The plotting of Kaplan-Meier (K-M) survival curves is about as straightforward and basic as you can get when doing survival analysis.  The curves, however, are simply a plotting of the values obtained when calculating the Kaplan-Meier estimate (product-limit method) of the survivor function.  Briefly explained, the estimates are obtained by the successive multiplication of the proportion of subjects that haven't yet "failed" divided by the number of subjects alive at the beginning of a respective interval.  Each interval is designed such that one "fail" (death) time is contained in the interval.  Subjects whose outcomes are censored (e.g. lost to follow-up, study terminated before reaching outcome, etc.) are included in the interval in which they occur although they are not used to calculate the probability of survival/failure at any subsequent time point.  Symbolically, the K-M survivor function can be expressed thus,

where n is the number of subjects alive in the kth interval and d is the number that fail (death).  As instructive as it is to manually calculate the survivor function and plot it, the ease with which you can generate and output the curves using Stata, SAS, and R relegates any pencil-and-paper calculations to being a learning exercise (I'll be using Stata in this post).

In a current collaboration with a friend of mine we needed K-M curves for overall survival (OS) and progression-free survival (PFS) and we needed them plotted on the same graph.  (A previous post explained the differences between the primary definitions of survival used in the medical literature, OS & PFS obviously included.)  This is a reasonable and, superficially, simple request but upon further thought about how to implement this in Stata it became less obvious.  In Stata you must declare your data to be survival time via -stset- before invoking any survival analysis commands and in doing so you specify when subjects enter the analysis time, when they exit, and what constitutes censorship.  These parameters -- particularly the censorship variable -- are going to differ depending on whether you are examining overall survival, progression-free survival, or event-free survival.  So in order to get around this, I -stset- the data twice -- first for OS then again for PFS -- and saved the survivor function from each.  The syntax is simple:  -stsset- followed by a -sts gen- for OS then again for PFS after you've cleared the first -stset-.  The code follows:

* **OS
stset dox_os, failure(status==1) origin(dxdt)
sts gen s_os = s        // survivor function
gen double t_os = _t    // analysis time

* **PFS
stset, clear
stset dox_pfs, failure(death==1) origin(dxdt)
sts gen s_pfs = s       // survivor function
gen double t_pfs = _t   // analysis time

In order to overlay the two curves on the same graph I initially thought I would need to use the -twoway line yvar1 xvar || line yvar2 xvar, sort - syntax but this returned a graph that properly plotted one curve but not the other.  I think I may have eventually had success with this approach but not without manipulating the data structure and creating two "fake" groups -- one for OS and one for PFS -- then plotting using a -by- option.  I wasn't convinced this was the most efficient and elegant way so I searched the Statalist archives and, as expected, someone else had posed this same problem a few years ago.  Turns out the solution requires nothing more than the addition of -addplot- to the -sts graph- command where the survivor function you are adding is the one saved from a previous -stset-.  The code, as well as the resultant graph, follow:

sts graph, addplot(line s_os t_os, sort c(J) ///
legend(order(1 "PFS" 2 "OS"))) scheme(sj) ytitle("Survival") ///
xtitle("Days Post Diagnosis") ylabel(0(.25)1) ytick(0(.25)1) ///
xlabel(0(5)25) xtick(0(1)25) title("Survival:  PFS vs OS")