Introduction

Comma-Separated Values (CSV) is an excellent format for many types of data: a layperson can open the files in any of dozens of programs (Excel, Google Sheets, Gnumeric, LibreOffice, Numbers, OpenOffice, etc.), and the files are easy to produce, compact, and easy to parse. But there are right and wrong ways to produce CSV.

Guidelines

The following is based on the rules provided at clean-sheet.org.

  1. The first row of the spreadsheet must contain the column headers, with one header for each column that has data.
  2. All data must be in rows and columns, as a lookup table. Each row stores one record, and each cell holds that record's value for its column.
  3. Do not format fields. Omit dollar signs, thousands-separator commas, footnote symbols, etc. For example, represent dollar values as 100002.25, not $100,002.25.
  4. Adhere to ISO standards for fields. Refer to countries by their ISO 3166 codes, write dates in ISO 8601 format (for example, 2015-01-21), etc.
  5. If no value is available for a field, leave it blank. Do not write N/A or provide an explanatory note in the field.
  6. Do not provide additional data in the same file, such as a key, aggregate statistics, a description of methodologies, etc. These belong elsewhere.
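Several of these rules can be checked mechanically. The following is a minimal Python sketch, not a complete validator, using the standard csv module: it flags blank or duplicate headers (rule 1), rows whose field count differs from the header (rule 2), placeholders such as N/A (rule 5), and formatted values such as $100,002.25 or 2015* (rule 3). The function name check_csv, the placeholder set, and the formatting pattern are illustrative assumptions. Rule 4 is not checked at all, and rule 6 only partly: a spreadsheet export often pads a note or title row with empty cells to the full width, and such a row would pass.

```python
import csv
import io
import re

# Strings that rule 5 says should be left blank instead.
# This set is an illustrative assumption, not an exhaustive list.
PLACEHOLDERS = {"n/a", "na", "none", "null", "-"}

# Values carrying presentation formatting (rule 3): a leading dollar sign,
# thousands separators, or a digit followed by a footnote asterisk.
FORMATTED = re.compile(r"^\$|^\d{1,3}(,\d{3})+(\.\d+)?$|\d\*$")


def check_csv(text):
    """Return a list of problems found in CSV text (empty if none)."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]

    problems = []
    header = rows[0]
    # Rule 1: one non-blank, unique header per column.
    if any(not name.strip() for name in header):
        problems.append("row 1: blank column header")
    if len(set(header)) != len(header):
        problems.append("row 1: duplicate column headers")

    for row_no, row in enumerate(rows[1:], start=2):
        # Rule 2: every record has exactly one cell per column.
        if len(row) != len(header):
            problems.append(
                f"row {row_no}: expected {len(header)} fields, got {len(row)}"
            )
            continue
        for name, value in zip(header, row):
            if value.strip().lower() in PLACEHOLDERS:
                problems.append(f"row {row_no}, {name}: placeholder {value!r}")
            elif FORMATTED.search(value):
                problems.append(f"row {row_no}, {name}: formatted value {value!r}")
    return problems
```

For example, check_csv("name,state,district,year_elected\nRobert Aderholt,Alabama,N/A,1997\n") reports the N/A district as a placeholder, while the same row with an empty district passes.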

Examples

A Good Spreadsheet

name             state  district  year_elected
Ralph Abraham    LA     5         2015
Alma Adams       NC     12        2014
Robert Aderholt  AL     4         1997
Pete Aguilar     CA     31        2015
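Saved as CSV, this spreadsheet is one header line followed by one line per record. A minimal sketch of producing it with Python's standard csv module; the names members and to_csv are illustrative. The csv writer also quotes any field that contains a comma, so values never need escaping by hand.

```python
import csv
import io

# The records from the good spreadsheet: plain values, no formatting.
members = [
    {"name": "Ralph Abraham", "state": "LA", "district": 5, "year_elected": 2015},
    {"name": "Alma Adams", "state": "NC", "district": 12, "year_elected": 2014},
    {"name": "Robert Aderholt", "state": "AL", "district": 4, "year_elected": 1997},
    {"name": "Pete Aguilar", "state": "CA", "district": 31, "year_elected": 2015},
]


def to_csv(records, fieldnames):
    """Serialize a list of dicts to CSV text with a single header row."""
    buffer = io.StringIO()
    # The csv module writes "\r\n" line endings by default; "\n" is used
    # here only so the printed output is easier to read.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


print(to_csv(members, ["name", "state", "district", "year_elected"]), end="")
# name,state,district,year_elected
# Ralph Abraham,LA,5,2015
# Alma Adams,NC,12,2014
# Robert Aderholt,AL,4,1997
# Pete Aguilar,CA,31,2015
```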

A Bad Spreadsheet

Members of Congress
name             state        district  year_elected
Ralph Abraham    Louisiana    5th       2015*
Alma Adams       N. Carolina  12th      2014
Adams was appointed by the governor to complete Smith’s term.  NC is redistricting.
Robert Aderholt  Alabama      N/A       1997
Pete Aguilar     California   31st      2015
* Special election
Methodology: All data was gathered from Congress.gov on January 21, 2015, with the ...
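Some of the damage in the bad spreadsheet can be undone row by row. The following Python sketch normalizes one record against rules 3, 4, and 5. The STATE_CODES table is a hypothetical partial lookup covering only the states in this example, and clean_row is an illustrative name. The title row, the note row, the footnote, and the methodology line cannot be fixed this way; under rule 6 they have to be removed by hand or moved to a separate file.

```python
import re

# Hypothetical, partial lookup covering only the states in this example;
# a real cleanup would need a complete table of names and codes.
STATE_CODES = {
    "Alabama": "AL",
    "California": "CA",
    "Louisiana": "LA",
    "N. Carolina": "NC",
    "North Carolina": "NC",
}


def clean_row(name, state, district, year):
    """Normalize one record from the bad spreadsheet to the guidelines."""
    # Rule 4: use a standard code instead of a free-form name
    # (unknown names are passed through unchanged).
    state = STATE_CODES.get(state.strip(), state.strip())
    # Rule 5: a missing value is an empty field, not "N/A".
    if district.strip().upper() == "N/A":
        district = ""
    else:
        # Rule 3: drop ordinal suffixes such as "5th" or "31st".
        district = re.sub(r"(?<=\d)(st|nd|rd|th)$", "", district.strip())
    # Rule 3: drop footnote markers such as the "*" in "2015*".
    year = year.strip().rstrip("*")
    return [name.strip(), state, district, year]


print(clean_row("Ralph Abraham", "Louisiana", "5th", "2015*"))
# ['Ralph Abraham', 'LA', '5', '2015']
```

Note that the N/A district for Robert Aderholt becomes an empty field rather than a guessed value; filling it in correctly means going back to the source.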