Monday, November 14, 2011

-forvalues- nested in -forvalues-

While not working on the statistical methods section of my proposal -- I'm banging my head against my desk trying to figure out how to assess predictive value in the absence of a gold standard -- I was playing around with Stata -forvalues- loops in a program I wrote that isn't even remotely related to my dissertation work.  Nevertheless, I was pleased when I managed to get a nested -forvalues- loop working properly (even if the time spent working on it was a couple hours more than expected).  

The problem:  I created 27 variables, seq1 through seq27, each containing a random ordering of the integers 1 through 6, and I wanted to compare the first variable with the 26 following, the second variable with the 25 following, the third variable with the 24 following, and so on, to verify that no two variables contained the same ordering.  Essentially, I wanted 26 + 25 + ... + 1 = 351 comparison variables, one per unordered pair of variables, with no pair repeated.  For example, once seq1 and seq2 had been compared by creating the new variable var1_2, there was no need to compare them again in the other direction by creating var2_1.  So what I needed was an outer -forvalues- loop running from 1 to 26 and an inner -forvalues- loop running from one more than the outer index up to 27.  After much conceptualizing, tweaking, and experimenting, I eventually found success with this:

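forvalues i = 1(1)26 {
 forvalues j = `i'(1)26 {
  display "-------------------------------------------------"
  * ++j increments j before it is used, so the pair is seq(i) and seq(j+1)
  display "Variables being compared are seq`i' and seq`++j'"
  * flag the observations where the two orderings agree
  gen var`i'_`j' = 1 if seq`i' == seq`j'
  quietly sum var`i'_`j' if `i' != `j'
  * six matches would mean the two orderings are identical
  assert `r(sum)' < 6
  drop var`i'*
 }
}
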
Solution/Explanation:  In the first iteration of the outer -forvalues- loop, i is set to one and j is initially set to one, but as soon as the inner loop body runs, j is incremented by one in the -display- statement.  The ++j syntax tells Stata to increment the macro before evaluating it, changing the value from one to two.  So when the comparison variable is generated, the macros evaluate to i=1 and j=2, creating var1_2 from the already-existing variables seq1 and seq2.  On the next pass of the inner loop, -forvalues- sets j to two (evidently the earlier ++j does not carry over to the loop's own counter) and ++j bumps it to three, while the outer index is still one, so the next variable, var1_3, compares seq1 to seq3.  After the inner loop completes, 26 comparisons later, the outer index i is incremented from one to two and the inner index j restarts at two, which ++j immediately turns into three.  This results in var2_3, var2_4, and so on up through var2_27.  The tandem looping continues until i reaches 26, when only one comparison variable is created:  var26_27.  Because each comparison variable is dropped as soon as its -assert- passes, none of them are left in the dataset at the end.
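For comparison, here is a minimal sketch of the same pairing that skips ++j and starts the inner loop at i + 1 directly.  It is an alternative I would expect to visit the same pairs, not the code I actually ran; the helper macros start and npairs are made up for illustration, and the counter simply confirms that 26 + 25 + ... + 1 = 351 pairs are visited.

```stata
* Sketch only: start and npairs are hypothetical helper macros
local npairs = 0
forvalues i = 1/26 {
    * the inner loop begins one past the outer index
    local start = `i' + 1
    forvalues j = `start'/27 {
        * each (i, j) pair with i < j is visited exactly once
        local ++npairs
    }
}
display "pairs visited: `npairs'"
```

The -display- line should report 351.  Replacing the -local ++npairs- line with the -gen-, -sum-, -assert-, and -drop- lines from the loop above would give the same checks with j used as is.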

This was a novel programming exercise for me for two reasons:  (1) I haven't had a lot of experience with -forvalues- loops in Stata (none with nesting them!); and (2) incrementing a macro's value at the moment the macro is called (++j) is pretty damn powerful, and it's a technique I plan to add to my Stata toolbox.
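As a small illustration of that second point, here is a sketch of the increment-on-expansion syntax outside of any loop.  It only assumes a local macro named j, as in the loop above, and contrasts incrementing before the value is used with incrementing after.

```stata
* pre-increment: j becomes 2 and then 2 is displayed
local j = 1
display `++j'

* post-increment: the current value, 2, is displayed and then j becomes 3
display `j++'

* confirm the final value, which is 3
display `j'
```

Run in order, the three -display- commands show 2, 2, and 3.  The same operators work with -- for decrementing, and they apply to local macros only.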
