download zip file
---
register
---
Try Elvis, the SAS Log Analyzer
Elvis
---
Related: DsShell, a Windows shell extension for SAS7BDAT files

---

---

dsread

Version 1.8, 2011-07-20

dsread is a simple command-line utility for working with datasets in the SAS7BDAT file format, as produced by the SAS System, the World Programming System, StatTransfer and others.

Version 1.8 adds two more platforms to the supported list - WIN and WIN_VSHO.

dsread lets you:

  • list the summary details of datasets
  • list the contents (the variable names and other attributes*) of datasets
  • convert a dataset to CSV format for import into Excel, R, KNIME and other software
  • convert a dataset to SQL source format for import into mySql and other database systems

See Also - DsShell

dsread is very useful when you need a command-line tool; for a more user-friendly experience, please try our latest product - DsShell, a Windows shell extension for SAS7BDAT files.

Download

Download the zip file and save dsread.exe anywhere in your path - C:\WINDOWS will do. To uninstall, just delete the file - dsread doesn't leave any other files or registry entries on your machine.

Registration

This version of dsread is a free trial which you can download to make sure that it does everything that you want. If you decide to carry on using dsread, you should buy a registration code via RegNow by clicking here. Registration costs US$75 for a lifetime license which will work with all future versions of dsread at no extra cost; please enquire for multi-user and academic discounts.

Some functions of dsread are restricted until you get your registration code - the SQL output will only convert the first 100 observations of a dataset, and the lossless output and performance/progress monitoring are disabled completely.

Usage

To use dsread, start a Command Prompt and enter:

dsread [flags] [filename]

For example, to get a summary of all datasets in the current directory whose names start with 'p':

sashelp> dsread p* Name Vars Obs Label ----------------------------------------------------------------- PRDSAL2 11 23040 PRDSAL3 11 11520 PRDSALE 10 1440

To see the content (ie variable structure) of one of the datasets:

sashelp> dsread /c prdsale Contents of dataset PRDSALE Name Type Len Format Label ----------------------------------------------------------------- ACTUAL Num 8 DOLLAR12.2 Actual Sales PREDICT Num 8 DOLLAR12.2 Predicted Sales COUNTRY Char 10 $CHAR10. Country REGION Char 10 $CHAR10. Region DIVISION Char 10 $CHAR10. Division PRODTYPE Char 10 $CHAR10. Product type PRODUCT Char 10 $CHAR10. Product QUARTER Num 8 8. Quarter YEAR Num 8 4. Year MONTH Num 8 MONNAME3. Month

To convert the dataset to CSV:

sashelp> dsread /v prdsale ACTUAL,PREDICT,COUNTRY,REGION,DIVISION,PRODTYPE,PRODUCT,QUARTER,YEAR,MONTH 925,850,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12054 999,297,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12085 608,846,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12113 642,533,CANADA,EAST,EDUCATION,FURNITURE,SOFA,2,1993,12144 656,646,CANADA,EAST,EDUCATION,FURNITURE,SOFA,2,1993,12174 948,486,CANADA,EAST,EDUCATION,FURNITURE,SOFA,2,1993,12205 612,717,CANADA,EAST,EDUCATION,FURNITURE,SOFA,3,1993,12235 114,564,CANADA,EAST,EDUCATION,FURNITURE,SOFA,3,1993,12266 ...etc...

You'll probably want to redirect the output of a 'dsread /v' into another file, like this:

dsread /v prdsale > prdsale.csv

or you can specify the output file using the /o option, like this (note there is no space after the '/o'):

dsread /v /oprdsale.csv prdsale

You can combine the /c and /v flags to get the contents listing in CSV format:

sashelp> dsread /c /v prdsale DATASET,VARNUM,NAME,TYPE,LENGTH,FORMAT,LABEL PRDSALE,1,ACTUAL,Num,8,DOLLAR12.2,Actual Sales PRDSALE,2,PREDICT,Num,8,DOLLAR12.2,Predicted Sales PRDSALE,3,COUNTRY,Char,10,$CHAR10.,Country PRDSALE,4,REGION,Char,10,$CHAR10.,Region PRDSALE,5,DIVISION,Char,10,$CHAR10.,Division PRDSALE,6,PRODTYPE,Char,10,$CHAR10.,Product type PRDSALE,7,PRODUCT,Char,10,$CHAR10.,Product PRDSALE,8,QUARTER,Num,8,8.,Quarter PRDSALE,9,YEAR,Num,8,4.,Year PRDSALE,10,MONTH,Num,8,MONNAME3.,Month

Use /t to get tab-delimited output:

sashelp> dsread /v /t prdsale ACTUAL PREDICT COUNTRY REGION DIVISION PRODTYPE PRODUCT QUARTER YEAR MONTH 925 850 CANADA EAST EDUCATION FURNITURE SOFA 1 1993 12054 999 297 CANADA EAST EDUCATION FURNITURE SOFA 1 1993 12085 608 846 CANADA EAST EDUCATION FURNITURE SOFA 1 1993 12113 642 533 CANADA EAST EDUCATION FURNITURE SOFA 2 1993 12144 656 646 CANADA EAST EDUCATION FURNITURE SOFA 2 1993 12174 948 486 CANADA EAST EDUCATION FURNITURE SOFA 2 1993 12205 612 717 CANADA EAST EDUCATION FURNITURE SOFA 3 1993 12235 114 564 CANADA EAST EDUCATION FURNITURE SOFA 3 1993 12266 ...etc...

Use /q to get SQL source output:

sashelp> dsread /q prdsale CREATE TABLE `prdsale` ( `ACTUAL` DOUBLE, `PREDICT` DOUBLE, `COUNTRY` VARCHAR(10), `REGION` VARCHAR(10), `DIVISION` VARCHAR(10), `PRODTYPE` VARCHAR(10), `PRODUCT` VARCHAR(10), `QUARTER` DOUBLE, `YEAR` DOUBLE, `MONTH` DOUBLE ) COMMENT=''; INSERT INTO `prdsale` VALUES ('925', '850', 'CANADA', 'EAST', 'EDUCATION', 'FURNITURE', 'SOFA', '1', '1993', '12054'); INSERT INTO `prdsale` VALUES ('999', '297', 'CANADA', 'EAST', 'EDUCATION', 'FURNITURE', 'SOFA', '1', '1993', '12085'); INSERT INTO `prdsale` VALUES ('608', '846', 'CANADA', 'EAST', 'EDUCATION', 'FURNITURE', 'SOFA', '1', '1993', '12113'); INSERT INTO `prdsale` VALUES ('642', '533', 'CANADA', 'EAST', 'EDUCATION', 'FURNITURE', 'SOFA', '2', '1993', '12144'); INSERT INTO `prdsale` VALUES ('656', '646', 'CANADA', 'EAST', 'EDUCATION', 'FURNITURE', 'SOFA', '2', '1993', '12174'); INSERT INTO `prdsale` VALUES ('948', '486', 'CANADA', 'EAST', 'EDUCATION', 'FURNITURE', 'SOFA', '2', '1993', '12205'); INSERT INTO `prdsale` VALUES ('612', '717', 'CANADA', 'EAST', 'EDUCATION', 'FURNITURE', 'SOFA', '3', '1993', '12235'); ...etc...

Converting the IEEE floating-point numeric values in the SAS7BDAT file to decimal frequently causes a loss of precision. To get a lossless representation of the data, use the /l flag:

sashelp> dsread /v /l prdsale ACTUAL,PREDICT,COUNTRY,... 0x0000000000e88c40,0x0000000000908a40,CANADA... 0x0000000000388f40,0x0000000000907240,CANADA... 0x0000000000008340,0x0000000000708a40,CANADA... 0x0000000000108440,0x0000000000a88040,CANADA... 0x0000000000808440,0x0000000000308440,CANADA... 0x0000000000a08d40,0x0000000000607e40,CANADA... ...etc...

The numerics are output as eight hexadecimal bytes (16 digits) giving the internal floating-point representation, which can then be used to reconstruct the exact same value in the receiving software. Use /L to get the bytes in big-endian order

Use the '/?' flag for help:

sashelp> dsread /? dsread: A reader for files in SAS7BDAT format. Version 1.8 (2011-07-20) Usage: dsread [[/option]...] ... /?: Show this usage message and exit. /h: Show this usage message and exit. /c: List the variables in the dataset and their attributes. /v: Write a CSV representation of the dataset to standard output. /t: Tab-delimited - data output will be as CSV but tab-delimited. /o: Send output to the named file (eg /omydata.csv). * /q: Write an SQL representation of the dataset to standard output. * /l: Lossless data output - writes numerics as if using HEX16. format. * /L: As above but bytes will be output in big-endian order. * /p: Output performance/progress messages to standard error. Options marked * will only work for registered users. You can use - instead of / when specifying options. With no options specified, list summary details of datasets. http://www.oview.co.uk/dsread, 2010-2011. Registration file not found. Unregistered - options marked * are unavailable.

The CSV output conforms to the RFC 4180 spec. Both Excel and Open Office's Calc read the output correctly and automatically. Note that CSV output from PROC EXPORT and the Universal Viewer doesn't conform to RFC 4180; values containing special characters like new-lines and double-quotes will not be quoted correctly.

Limitations

This software should be considered experimental and is not guaranteed to be accurate. You use it at your own risk.

It will only work on Windows-format SAS7BDAT files for now, either uncompressed or using 'CHAR' compression. It should only be used on 'clean' datasets, by which I mean datasets that are the immediate output of a data step or PROC. Using dsread on other datasets, for example datasets that have been edited using FSEDIT or similar, might lead to strange effects like deleted observations being output.

Another strange effect is that dsread will happily work on datasets that have a READ password. The password protection in SAS7BDAT files does NOT encrypt the data and dsread doesn't even check to see whether a password has been set. You should not rely on a READ password to protect your data from prying eyes - use the ENCRYPT option for that. See SAS's rather cryptic warning about this here.

To avoid program crashes when run against non-Windows datasets, dsread will reject any SAS7BDAT file whose 'platform' identifier is not one of:

  • WIN
  • WIN_PRO
  • WIN_NT
  • WIN_NTSV
  • WIN_SRV
  • WIN_ASRV
  • XP_PRO
  • XP_HOME
  • W32_VSHO
  • W32_VSPRO
  • NET_ASRV
  • NET_DSRV
  • NET_SRV

If you have a Windows SAS7BDAT file with a different platform identifier, please let me know () so that dsread can be tested against it.

Formats: dsread will correctly read the names of formats applied to variables in SAS7BDAT files and list them in the contents list. However, when outputting data, most formats are ignored and only the underlying numeric and character values are output. The exceptions are DATE, DATETIME, YYMMDD and TIME, for which values will be converted appropriately, though note that the length specified for the format will be ignored - that is, any DATEn. format will be output in the same format, whether it's DATE9., DATE12. etc.

Compression: dsread can now read datasets compressed with CHAR compression. This is a simple run-length-encoding compression and is the default type when you create a dataset with either the COMPRESS=YES or COMPRESS=CHAR options. The only other documented compression scheme is BINARY, which is only used when COMPRESS=BINARY is specified. Datasets with BINARY compression will still be rejected by dsread.

Change History

Version 1.8:

  • Added the WIN and WIN_VSHO platform identifiers.

Version 1.7:

  • Added the /o flag to send output directly to a name output file. This avoids the need to redirect standard output, which may be helpful in certain situations.

Version 1.6:

  • Added several new supported Windows platforms

Version 1.5:

  • Added WIN_ASRV as a supported platform
  • Improved error message when an unsupported platform is encountered

Version 1.4:

  • Improved memory usage for large datasets
  • Added DATETIME format support
  • Added SQL output format
  • Added registration code handling
  • Upgraded to Visual Studio 2010 compiler (dsread will no longer work on Windows 2000 or earlier)

Version 1.3:

  • Added compression support
  • Added performance/progress reporting (/p option)
  • Improved performance on large datasets
  • Improved formatting of numeric output


*: It's one of SAS's many quirks that the structure of a dataset is referred to as its 'contents', which can cause understandable confusion if you think of the actual data in the dataset to be its contents.


Comments