dsread

Version 2.7, 2020-05-22

dsread: a simple command-line utility for working with SAS® data sets (SAS7BDAT files), as produced by the SAS System, the World Programming System, StatTransfer and others.

dsread lets you:

  • list the summary details of SAS data sets
  • list the contents (the variable names and other attributes*) of data sets
  • convert a data set to CSV format for import into Excel, Python, R, KNIME and other software
  • convert a data set to SQL source format for import into MySql and other database systems

Version 2.7 adds:

  • support for a variety of international character encodings, including UTF-8 Unicode
  • variable labels are now used as per-column comments in the SQL output
  • improved progress reporting when converting large data sets

dsread works on Windows Vista, 7, 8, 8.1 and 10, and also works under Wine on Linux machines.

The trial version of dsread is free to download; please register to get full functionality (just US$75 - see below for details).

See Also - DsView

Our latest product is DsView, a SAS data set viewer for Windows - try it here.

See Also - DsShell

dsread is very useful when you need a command-line tool; install DsShell, a Windows shell extension for SAS7BDAT files, for working with SAS data sets interactively, including previews in Windows Explorer.

Download

Download the zip file (284Kb) and save dsread.exe anywhere in your PATH - C:\Windows will do. You can store dsread.exe in a new folder (perhaps C:\dsread, or C:\Users\MyUsername\dsread) but you'll then need to add that folder to your PATH - see here for further information on how to do that.

Installation of dsread implies that you have read and accepted the End User License Agreement (EULA). The EULA for dsread is available both here and in the zip file.

Uninstallation Instructions

dsread is never 'installed' in the normal way, as it's just an executable file stored on your computer. To remove dsread from your computer, simply delete the dsread.exe file from your Windows directory (usually C:\Windows), or any other directory where you saved it. If necessary, you can use you computer's Search function to find all copies of dsread.exe and delete them.

Registration

This version of dsread is a free trial which you can download to make sure that it does everything that you want. If you decide to carry on using dsread, you must buy a registration code. Registration costs US$75 for a lifetime license which will work with all future versions of dsread at no extra cost; please enquire for multi-user and academic discounts.

To buy a registration code, click this PayPal link to send your payment, and we will email you your registration code within one business day.

When you receive your registration code, please follow the instructions here to store the code where dsread can find it.

Some functions of dsread are restricted until you get your registration code - the SQL output will only convert the first 50 observations of a data set, and the lossless output and performance/progress monitoring are disabled completely. Conversion to CSV is fully functional in the trial version, but only for data sets created in 32-bit Windows format; conversion of data sets from other platforms is restricted to the first 50 observations only.

Your dsread registration code will also work with DsShell at no extra cost.

Usage

To use dsread, start a Command Prompt and enter:

dsread [flags] [filename]

For example, to get a summary of all data sets in the current directory whose names start with 'p':

sashelp> dsread p* Name Vars Obs Label ----------------------------------------------------------------- PRDSAL2 11 23040 PRDSAL3 11 11520 PRDSALE 10 1440

To see the content (ie variable structure) of one of the data sets:

sashelp> dsread /c prdsale Contents of data set PRDSALE Name Type Len Format Label ----------------------------------------------------------------- ACTUAL Num 8 DOLLAR12.2 Actual Sales PREDICT Num 8 DOLLAR12.2 Predicted Sales COUNTRY Char 10 $CHAR10. Country REGION Char 10 $CHAR10. Region DIVISION Char 10 $CHAR10. Division PRODTYPE Char 10 $CHAR10. Product type PRODUCT Char 10 $CHAR10. Product QUARTER Num 8 8. Quarter YEAR Num 8 4. Year MONTH Num 8 MONNAME3. Month

To convert the data set to CSV:

sashelp> dsread /v prdsale ACTUAL,PREDICT,COUNTRY,REGION,DIVISION,PRODTYPE,PRODUCT,QUARTER,YEAR,MONTH 925,850,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12054 999,297,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12085 608,846,CANADA,EAST,EDUCATION,FURNITURE,SOFA,1,1993,12113 642,533,CANADA,EAST,EDUCATION,FURNITURE,SOFA,2,1993,12144 656,646,CANADA,EAST,EDUCATION,FURNITURE,SOFA,2,1993,12174 948,486,CANADA,EAST,EDUCATION,FURNITURE,SOFA,2,1993,12205 612,717,CANADA,EAST,EDUCATION,FURNITURE,SOFA,3,1993,12235 114,564,CANADA,EAST,EDUCATION,FURNITURE,SOFA,3,1993,12266 ...etc...

You'll probably want to redirect the output of a 'dsread /v' into another file, like this:

dsread /v prdsale > prdsale.csv

or you can specify the output file using the /o option, like this (note there is no space after the '/o'):

dsread /v /oprdsale.csv prdsale

You can combine the /c and /v flags to get the contents listing in CSV format:

sashelp> dsread /c /v prdsale DATASET,VARNUM,NAME,TYPE,LENGTH,FORMAT,LABEL PRDSALE,1,ACTUAL,Num,8,DOLLAR12.2,Actual Sales PRDSALE,2,PREDICT,Num,8,DOLLAR12.2,Predicted Sales PRDSALE,3,COUNTRY,Char,10,$CHAR10.,Country PRDSALE,4,REGION,Char,10,$CHAR10.,Region PRDSALE,5,DIVISION,Char,10,$CHAR10.,Division PRDSALE,6,PRODTYPE,Char,10,$CHAR10.,Product type PRDSALE,7,PRODUCT,Char,10,$CHAR10.,Product PRDSALE,8,QUARTER,Num,8,8.,Quarter PRDSALE,9,YEAR,Num,8,4.,Year PRDSALE,10,MONTH,Num,8,MONNAME3.,Month

The output is transcoded to UTF-8 so that any international characters can be correctly represented:

sashelp> dsread /v utf8 sex,name,temp 男,张三,36.6°C 女,李四,36.5°C sashelp> dsread /v /n5 greek ID,Index,Old,Region,Name,Education 1,1,1.234,Πελοποννήσου,Ιδιωτικοί Υπάλληλοι,Στοιχειώδης Εκπαίδευση 0,1,1.234,Κεντρικής Μακεδονίας,Ελεύθεροι Επαγγελματίες (Ανώτερο Status),Μέση Εκπαίδευση 1,1,1.234,Κεντρικής Μακεδονίας,Τραπεζικοί,Άγνωστο 0,1,1.234,Κρήτης,Λοιποί Ελεύθεροι επαγγελματίες,Άγνωστο 0,1,1.234,Στερεάς Ελλάδας,Ιδιωτικοί Υπάλληλοι,Άγνωστο

Most software will correctly recognise that the output file is UTF-8 encoded, but if necessary you can tell dsread to include a BOM (byte order marker) to the output by specifying the /b option.

You can add the /t flag to /v or /c to get tab-delimited output:

sashelp> dsread /v /t prdsale ACTUAL PREDICT COUNTRY REGION DIVISION PRODTYPE PRODUCT QUARTER YEAR MONTH 925 850 CANADA EAST EDUCATION FURNITURE SOFA 1 1993 12054 999 297 CANADA EAST EDUCATION FURNITURE SOFA 1 1993 12085 608 846 CANADA EAST EDUCATION FURNITURE SOFA 1 1993 12113 642 533 CANADA EAST EDUCATION FURNITURE SOFA 2 1993 12144 656 646 CANADA EAST EDUCATION FURNITURE SOFA 2 1993 12174 948 486 CANADA EAST EDUCATION FURNITURE SOFA 2 1993 12205 612 717 CANADA EAST EDUCATION FURNITURE SOFA 3 1993 12235 114 564 CANADA EAST EDUCATION FURNITURE SOFA 3 1993 12266 ...etc...

The /n flag limits which observations are output, either the first n observations:

sashelp> dsread /v /n5 prdsale ACTUAL PREDICT COUNTRY REGION DIVISION PRODTYPE PRODUCT QUARTER YEAR MONTH 925 850 CANADA EAST EDUCATION FURNITURE SOFA 1 1993 12054 999 297 CANADA EAST EDUCATION FURNITURE SOFA 1 1993 12085 608 846 CANADA EAST EDUCATION FURNITURE SOFA 1 1993 12113 642 533 CANADA EAST EDUCATION FURNITURE SOFA 2 1993 12144 656 646 CANADA EAST EDUCATION FURNITURE SOFA 2 1993 12174

...or n observations starting from observation m:

sashelp> dsread /v /n4-3 prdsale ACTUAL PREDICT COUNTRY REGION DIVISION PRODTYPE PRODUCT QUARTER YEAR MONTH 642 533 CANADA EAST EDUCATION FURNITURE SOFA 2 1993 12144 656 646 CANADA EAST EDUCATION FURNITURE SOFA 2 1993 12174 948 486 CANADA EAST EDUCATION FURNITURE SOFA 2 1993 12205

Use /m to see extended metadata for each data set:

sashelp> dsread /m zip_c Detailed metadata for ZIP_C ---------------------------------------- Num vars: 16 Num obs: 41988 Compression: Char Page size: 12288 Num pages: 381 Obs length: 190 Creator: DATASTEP Platform: WIN_PRO SAS Version: 9.0101M3 Encoding: WLATIN1 (15872) Requirements: 0 Req. Vector: 18-1f-10-11-22-22-00-32-22-01-02-32-04 Endianness: Little endian Bitness: 32 Created: 2010-03-01 16:56:04 (GMT+0:00) Modified: 2010-03-01 16:56:04 (GMT+0:00) File size: 4.47 Mb Raw file size: 4682752 Metadata: 1:2 - 1:23 Data: 1:24 - 381:8 First var desc: 1:8

Use /q to get SQL source output (/qc gives just the CREATE TABLE statement, /qi gives just the INSERT statements):

sashelp> dsread /q prdsale CREATE TABLE `prdsale` ( `ACTUAL` DOUBLE COMMENT 'Actual Sales', `PREDICT` DOUBLE COMMENT 'Predicted Sales', `COUNTRY` VARCHAR(10) COMMENT 'Country', `REGION` VARCHAR(10) COMMENT 'Region', `DIVISION` VARCHAR(10) COMMENT 'Division', `PRODTYPE` VARCHAR(10) COMMENT 'Product type', `PRODUCT` VARCHAR(10) COMMENT 'Product', `QUARTER` DOUBLE COMMENT 'Quarter', `YEAR` DOUBLE COMMENT 'Year', `MONTH` DOUBLE COMMENT 'Month' ) COMMENT=''; INSERT INTO `prdsale` VALUES ('925', '850',...,'1993', '12054'); INSERT INTO `prdsale` VALUES ('999', '297',...,'1993', '12085'); INSERT INTO `prdsale` VALUES ('608', '846',...,'1993', '12113'); INSERT INTO `prdsale` VALUES ('642', '533',...,'1993', '12144'); INSERT INTO `prdsale` VALUES ('656', '646',...,'1993', '12174'); INSERT INTO `prdsale` VALUES ('948', '486',...,'1993', '12205'); INSERT INTO `prdsale` VALUES ('612', '717',...,'1993', '12235'); ...etc...

Converting the IEEE floating-point numeric values in the SAS7BDAT file to their decimal representation may cause a loss of precision. To get a lossless representation of the data, use the /l flag:

sashelp> dsread /v /l prdsale ACTUAL,PREDICT,COUNTRY,... 0x0000000000e88c40,0x0000000000908a40,CANADA... 0x0000000000388f40,0x0000000000907240,CANADA... 0x0000000000008340,0x0000000000708a40,CANADA... 0x0000000000108440,0x0000000000a88040,CANADA... 0x0000000000808440,0x0000000000308440,CANADA... 0x0000000000a08d40,0x0000000000607e40,CANADA... ...etc...

The numerics are output as eight hexadecimal bytes (16 digits) giving the internal floating-point representation, which can then be used to reconstruct the exact same value in the receiving software. Use /L to get the bytes in big-endian order

Use /? for help:

sashelp> dsread /? dsread: A reader for files in SAS7BDAT format. Version 2.7 (2020-05-22) Usage: dsread [[/option]...] <filename> <filename>... /?: Show this usage message and exit. /h: Show this usage message and exit. /c: List the variables in the dataset and their attributes. /m: Write dataset metadata to standard output. /v: Write a CSV representation of the dataset to standard output. /t: Tab-delimited - data output will be as CSV but tab-delimited. /o: Send output to the named file (eg /omydata.csv). /b: Starts the output with a UTF-8 byte order marker (BOM). /n: Limits the observations to be output. /n<numObs>: Output only the first <numObs> observations. /n<firstObs>-<obsCount>: Output <obsCount> observations starting from <firstObs>. /u: Check for newer version (requires internet access). * /q: Write an SQL representation of the dataset to standard output. * /qc: Writes only the SQL CREATE TABLE statement. * /qi: Writes only the SQL INSERT statements. * /l: Lossless data output - writes numerics as if using HEX16. format. * /L: As above but bytes will be output in big-endian order. * /p: Output progress messages to standard error. Options marked * will only work for registered users. You can use - instead of / when specifying options. With no options specified, list summary details of datasets. http://www.oview.co.uk/dsread, 2010-2020. Registration file (C:\Users\Chris\Documents\dsread_registration.txt) not found. Unregistered - options marked * are unavailable.

Use /u to quickly check for updates online:

sashelp> dsread /u Contacting www.oview.co.uk for latest version details... You already have the latest version of dsread; no updates available.

The CSV output conforms to the RFC 4180 spec. Both Excel and Open Office's Calc read the output correctly and automatically. Note that CSV output from PROC EXPORT and the Universal Viewer doesn't conform to RFC 4180; values containing special characters like new-lines and double-quotes will not be quoted correctly.

Limitations

Despite thorough testing, Oceanview Consultancy Ltd cannot guarantee that the output of this software will be accurate in all cases. You use it at your own risk.

It will work on the vast majority of SAS data sets in the SAS7BDAT format, whether uncompressed or using CHAR or BINARY compression. Using dsread on some data sets, for example data sets that have been edited using FSEDIT or similar, might lead to strange effects like deleted observations being output.

Note that dsread will happily work on data sets that have a READ password. The password protection in SAS7BDAT files does NOT encrypt the data and dsread doesn't even check to see whether a password has been set. You should not rely on a READ password to protect your data - use the ENCRYPT option for that. See SAS's rather cryptic warning about this here.

Before attempting to read a data set, dsread checks its 'data requirements' information. If this isn't recognised, dsread will stop with an 'Unknown data requirements' error. If you have a SAS7BDAT file that triggers this error, please let me know () so that dsread can be tested against it.

Formats: dsread will correctly read the names of formats applied to variables in SAS7BDAT files and will show the format names in the contents list. However, when outputting data, many formats are ignored and only the underlying numeric and character values are output. The exceptions are w.d, BEST, Z, DATE, DATETIME, YYMMDDx, MMDDYY, B8601DA, E8601DA and TIME, for which values will be converted appropriately, though note that the length specified for some formats will be ignored - for example, any DATEn. format will be output in the same format, whether it's DATE9., DATE12. etc. The numeric formats will correctly honour the w.d lengths as specified.

Change History

Version 2.7:

  • Added international character set support. All output data is now transcoded to UTF-8.
  • SQL output now uses the data set's variable labels as column comments in the CREATE TABLE statement.
  • Improved progress reporting to give estimated time remaining etc.

Version 2.6:

  • Added handling for data sets with amended metadata.
  • Fixed bugs in SQL output.
  • Added options to select only CREATE TABLE or only INSERT statements, instead of always generating both.
  • Added options to select a subset of observations to be output, based on observation number.
  • Added support for a number of new formats - w.d, BEST, Z, B8601DA, E8601DA.
  • Improved formatting of all numeric formats, honouring the specified width and decimal precision.

Version 2.5:

  • Greatly improved error detection and reporting.
  • Added the /m option to report data set metadata.

Version 2.4:

  • Support for YYMMDDx formats added.

Version 2.3:

  • Support for BINARY compression added.

Version 2.2:

  • Fixed bug that caused a crash when opening a data set using BINARY compression. The data from such data sets cannot be read (though the contents listing can still be produced) but the error will be reported gracefully. (Note added 2013-22-11: Versions 2.3 and later can now read BINARY compressed files).
  • Added support for a new platform - data sets created in little-endian format on Linux/Intel machines (many thanks to the user who reported this new platform).
  • Improved the messages related to the registration file - the expected location of the file will now be reported in the '/?' output to assist in diagnosing registration problems.

Version 2.1:

  • Bug fix over 2.0; wide data sets now handled correctly.

Version 2.0:

  • Datasets from most 32-bit and 64-bit operating systems now supported, including Windows, Linux and Solaris.

Version 1.8:

  • Added the WIN and WIN_VSHO platform identifiers.

Version 1.7:

  • Added the /o flag to send output directly to a named output file. This avoids the need to redirect standard output, which may be helpful in certain situations.

Version 1.6:

  • Added several new supported Windows platforms

Version 1.5:

  • Added WIN_ASRV as a supported platform
  • Improved error message when an unsupported platform is encountered

Version 1.4:

  • Improved memory usage for large data sets
  • Added DATETIME format support
  • Added SQL output format
  • Added registration code handling
  • Upgraded to Visual Studio 2010 compiler (dsread will no longer work on Windows 2000 or earlier)

Version 1.3:

  • Added compression support
  • Added performance/progress reporting (/p option)
  • Improved performance on large data sets
  • Improved formatting of numeric output

*: It's one of SAS's many quirks that the structure of a data set is referred to as its 'contents', which can cause understandable confusion if you think of the actual data in the data set to be its contents.


Comments