Monday, June 2, 2014

SAS, Excel, and Leading Zeros

Working with Excel can sometimes be (usually is?) a less-than-pleasant experience, especially if you have to interface with a statistical software program (e.g. SAS, Stata).  This interfacing usually begins with moving the data from Excel to SAS and although seemingly simple and straightforward, there are small (and often overlooked) considerations that really annoy.  The loss of leading zeros from Excel to SAS is one.  

A Google search of some combination of "SAS", "Excel", and "leading zeros" returns several hits but as I clicked through them, there wasn't any one that I found 100% satisfactory.  One option was to change the numeric values in the Excel spreadsheet to text but even this didn't seem to retain the leading zeros (even in Excel!).  Another option was to use DDE but this required an INPUT statement and knowing the cell range --- two annoying intermediate steps I'd rather not deal with.  I think PROC IMPORT might also be an option if the data is saved as CSV but this is of marginal use if using an XLSX file.  One method, though, seemed to have some promise --- the Excel LIBNAME engine --- but when I first tried it, the leading zeros were not retained.  Turns out, though, this problem is easily rectified by adding a FORMAT statement to the DATA step.  Note that the variables are numeric but if character variables are sought, a PUT statement can be easily substituted for the FORMAT statement.  The code is below:

libname RawValue EXCEL "<Pathname Here>\<FileName Here>.xlsx";
data dsnout; 
  set RawValue."Sheet1$"n;
  format var1 var2 vark Z5.0;
run;
libname RawValue clear;

No comments:

Post a Comment