I recently ran an ultra marathon, the Trail Peñalara 60k, in Navacerrada, Spain (about 30 miles outside of Madrid) --- you can read my race report here --- and although neither the running nor the writing up of the experience has anything to do with the pursuit of my Ph.D., the graphical presentation of the checkpoint times seemed rigorous and interesting enough to label as a "PhD" blog post. (And since I was working on this instead of the power calculation for my dissertation, I figured I might as well legitimize the time spent and write a blog post about it.)

This race, unlike any other I've run in the United States, required each runner to wear an electronic chip bracelet that was scanned at the start, the finish, and seven control points in between. I can only speculate as to why our race chips were scanned as many times as they were --- to deter cheating? for data collection? --- but no matter: the race organizers posted the data on their website, and I thought it would be interesting to identify the fastest, slowest, middle, and average times at each aid station/checkpoint (including the finish) and then compare those times to mine at the respective points. Based on the finishing times --- I was 39th among 147 finishers --- I knew I was faster than the middle (median) and average (mean) runners, but I was curious how my times stacked up at the intermediate checkpoints. The short version of how I did this: I imported the data into Stata; computed the fastest (minimum), slowest (maximum), middle (median), and average (mean) times; and created a new dataset containing these summary statistics. I then merged my times into that summary-statistic dataset, so the resulting dataset contained nine observations and six variables. The first variable is the aid station/checkpoint, and the remaining five contain the fastest/slowest/average/middle/my times recorded at each checkpoint. The summary dataset is presented below:

+-----------------------------------------------------------------------------+
|                   as   min_as     max_as     mean_as    mdn_as     cjt_as   |
|-----------------------------------------------------------------------------|
|    Start (Rascafria)   00:00:00   00:00:00   00:00:00   00:00:00   00:00:00 |
|     El Reventon Pass   01:02:12   02:23:08   01:44:04   01:43:26   01:31:44 |
|             Penalara   01:58:58   04:44:30   03:18:15   03:14:29   02:54:09 |
|            La Granja   02:49:44   07:02:42   05:00:20   04:56:00   04:25:27 |
|     Casa de la Pesca   04:17:27   10:16:42   07:26:50   07:18:47   06:45:11 |
|             Fuenfria   05:00:54   11:41:58   08:34:15   08:26:47   07:41:40 |
|     Navacerrada Pass   05:40:14   12:53:42   09:38:06   09:27:47   08:41:14 |
|          La Barranca   06:23:54   14:37:36   10:48:46   10:35:31   09:39:31 |
| Finish (Navacerrada)   06:45:43   15:34:39   11:29:04   11:15:00   10:15:21 |
+-----------------------------------------------------------------------------+
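(For non-Stata readers: the same summarize-then-merge idea looks like this in Python/pandas. The three runner times below are invented for illustration; only the column names echo the real data.)

```python
# Sketch of the summary-statistics step with pandas; the times (in seconds)
# are made up, not taken from the race results.
import pandas as pd

times = pd.DataFrame({
    "as1_reventon": [3732, 4500, 8588],
    "as2_penalara": [7138, 11000, 17070],
})

# One row per checkpoint, one column per statistic -- the shape of the
# summary dataset above.
summary = times.agg(["min", "max", "mean", "median"]).T
print(summary)
```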

The code I used from start to finish is presented below.

capture log close
log using tp60k_graph, replace

// program: tp60k_graph.do
// task: graph times from TP60k
// project: drivel
// author: cjt
// born on date: 20120705

// #0
// program setup
version 11.2
clear all
macro drop _all
set more off

// #1
// insheet .csv
insheet using "C:\Documents and Settings\cjt\Desktop\TP60k\TP60k_times.csv", comma

// #2
// convert string time variables to numeric variables
foreach var of varlist as1_reventon as2_penalara as3_lagranja as4_cpesca as5_fuenfria ///
        as6_navac as7_barranca finishtm {
    gen double `var'_temp = clock(`var', "hms")
    drop `var'
    rename `var'_temp `var'
    format `var' %tcHH:MM:SS
}
*end;

* **add start time variable
gen as0_start = 0
format as0_start %tcHH:MM:SS

// #3
// -save- entire dataset
save tp60k, replace

// #4
// extract my times from the dataset
keep if place==39
keep as* finishtm

* **prefix time variables w/ my initials, cjt
rename as0_start cjt_as0
rename as1_reventon cjt_as1
rename as2_penalara cjt_as2
rename as3_lagranja cjt_as3
rename as4_cpesca cjt_as4
rename as5_fuenfria cjt_as5
rename as6_navac cjt_as6
rename as7_barranca cjt_as7
rename finishtm cjt_as8

* **save dataset for later merge
save tp60k_cjt, replace

// #5
// recall earlier, fuller dataset
use tp60k, clear

// #6
// -collapse- data for graph
collapse (min) min_as0=as0_start min_as1=as1_reventon min_as2=as2_penalara min_as3=as3_lagranja ///
        min_as4=as4_cpesca min_as5=as5_fuenfria min_as6=as6_navac min_as7=as7_barranca ///
        min_as8=finishtm ///
    (max) max_as0=as0_start max_as1=as1_reventon max_as2=as2_penalara max_as3=as3_lagranja ///
        max_as4=as4_cpesca max_as5=as5_fuenfria max_as6=as6_navac max_as7=as7_barranca ///
        max_as8=finishtm ///
    (mean) mean_as0=as0_start mean_as1=as1_reventon mean_as2=as2_penalara mean_as3=as3_lagranja ///
        mean_as4=as4_cpesca mean_as5=as5_fuenfria mean_as6=as6_navac mean_as7=as7_barranca ///
        mean_as8=finishtm ///
    (median) mdn_as0=as0_start mdn_as1=as1_reventon mdn_as2=as2_penalara mdn_as3=as3_lagranja ///
        mdn_as4=as4_cpesca mdn_as5=as5_fuenfria mdn_as6=as6_navac mdn_as7=as7_barranca ///
        mdn_as8=finishtm

// #7
// -merge- in the cjt data
merge 1:1 _n using tp60k_cjt
drop _merge

// #8
// -reshape- from wide to long
gen index = 1
reshape long min_as max_as mean_as mdn_as cjt_as, i(index) j(as)
drop index

// #9
// -label- values of "as" variable
label define aid 0 "Start (Rascafria)" 1 "El Reventon Pass" 2 "Penalara" 3 "La Granja" ///
    4 "Casa de la Pesca" 5 "Fuenfria" 6 "Navacerrada Pass" 7 "La Barranca" 8 "Finish (Navacerrada)"
label values as aid

// #10
// generate -graph-
line min_as as || line max_as as || line mean_as as || line mdn_as as || line cjt_as as, ///
    ytick(0(7200000)57600000) ylabel(0 "0h" 7200000 "2h" 14400000 "4h" 21600000 "6h" 28800000 "8h" ///
    36000000 "10h" 43200000 "12h" 50400000 "14h" 57600000 "16h", angle(horizontal)) ///
    ytitle("Cumulative Time") xtitle("Aid Stations/Check Points") xtick(0(1)8) xlabel(0(1)8, ///
    valuelabel angle(45)) legend(col(1) pos(3) subtitle("Times") lab(1 "Minimum") lab(2 "Maximum") ///
    lab(3 "Mean") lab(4 "Median") lab(5 "My Time")) title("Trail Penalara 60k") ///
    subtitle("Control Post Times") ///
    note("Data downloaded from http://www.grantrail.es/index.asp") scheme(s1color)

// #11
// -export- graph
gr export tp60k_graph.png, replace

log close
exit
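As an aside, here is a minimal Python sketch (not the author's code) of what clock(`var', "hms") is doing in step #2 --- turning an HH:MM:SS string into milliseconds --- which also explains why the ylabel() numbers in step #10 are multiples of 7,200,000, i.e., two hours in milliseconds:

```python
# Rough Python analogue of Stata's clock(var, "hms"); just an illustration
# of the millisecond arithmetic behind the %tc clock values.
def hms_to_ms(s: str) -> int:
    """Convert an "HH:MM:SS" string to milliseconds."""
    h, m, sec = (int(part) for part in s.split(":"))
    return (h * 3600 + m * 60 + sec) * 1000

print(hms_to_ms("01:31:44"))  # my El Reventon split -> 5504000
print(hms_to_ms("16:00:00"))  # the top y-axis tick  -> 57600000
```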

The resulting graph:


As is evident from the graph, the median and mean run times were nearly the same over the course of the race, suggesting a distribution of times with little skew. My times were only slightly better than the mean and median times up until Casa de la Pesca, but I then managed to increase my lead, as shown by the widening gap between my time and the mean/median times after that checkpoint.
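To put a number on "nearly the same": using the finish-time row of the summary table, the mean/median gap works out to about fourteen minutes over an eleven-plus-hour race (sketched here in Python):

```python
# Mean vs. median finish time, taken from the summary table above.
def hms_to_sec(s: str) -> int:
    h, m, sec = (int(part) for part in s.split(":"))
    return h * 3600 + m * 60 + sec

mean_finish = hms_to_sec("11:29:04")
mdn_finish = hms_to_sec("11:15:00")
gap_min = (mean_finish - mdn_finish) / 60
print(round(gap_min, 1))  # -> 14.1
```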
