Comma-Separated Values (CSV) is an excellent format for many types of data: a layperson can open the files in any of dozens of programs (Excel, Google Sheets, Gnumeric, LibreOffice, Numbers, OpenOffice, etc.), they’re easy to produce, they’re compact, and they’re easy to parse. But there are right and wrong ways to produce CSV.
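Part of what makes CSV easy to produce and parse is that standard libraries already handle its quoting rules. As a minimal sketch using Python's standard `csv` module (the sample rows and the helper names `to_csv` and `from_csv` are hypothetical), a field that contains a comma or a double quote is wrapped in quotes on output and recovered intact on input. Joining fields with bare commas does not survive the round trip:

```python
import csv
import io

# Hypothetical sample rows: the second row has a comma and quotes inside fields.
rows = [
    ["name", "comment"],
    ["Smith, Jane", 'said "hello"'],
]


def to_csv(rows):
    """Serialize rows with the csv module, which quotes fields only when needed."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def from_csv(text):
    """Parse CSV text back into a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(text)))


text = to_csv(rows)
print(text)  # the comma-bearing and quote-bearing fields come out quoted
assert from_csv(text) == rows  # the round trip restores the original fields

# Joining with bare commas breaks the row: "Smith, Jane" is read back as two fields.
naive = "\n".join(",".join(row) for row in rows)
print(from_csv(naive)[1])
```

The `csv` module's default line terminator is `\r\n`, as RFC 4180 specifies; `\n` is used here only to keep the printed output readable.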
The following is based on the rules provided at clean-sheet.org.
- The first row of the spreadsheet must contain column headers, with one header for each column that has data.
- All data must be arranged in rows and columns, like a lookup table. Each row stores a different record, and each cell holds the value for its column.
- Do not format fields. Omit dollar signs, commas, footnote symbols, etc. For example, represent dollar values as plain numbers, with no currency symbol or thousands separator.
- Adhere to ISO standards for fields. Refer to countries by their ISO 3166 codes, write dates in ISO 8601 format (for example, 2015-01-21), and so on.
- If no value is available for a field, leave it blank. Do not write N/A or provide an explanatory note in the field.
- Do not put additional material in the same file, such as a key, aggregate statistics, or a description of methodology. These belong in a separate file.
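The rules above can be enforced at the moment the file is written. The following is a minimal sketch in Python, not a clean-sheet.org tool: the helpers `write_clean_csv` and `to_field` and the column names are hypothetical, and using the ISO 3166-2 subdivision code `US-NC` for a U.S. state is an assumption that extends the country-code rule. Dates are written with `date.isoformat()` (ISO 8601), money is kept as `Decimal` and written without symbols or separators, and `None` becomes a blank field.

```python
import csv
import io
from datetime import date
from decimal import Decimal


def to_field(value):
    """Render one value as an unformatted CSV field (hypothetical helper)."""
    if value is None:
        return ""  # missing value: leave the field blank rather than writing N/A
    if isinstance(value, date):
        return value.isoformat()  # ISO 8601, e.g. 2015-01-21
    if isinstance(value, Decimal):
        # Plain digits: no currency symbol, no thousands separator, no exponent.
        return format(value, "f")
    return str(value)


def write_clean_csv(header, records):
    """Return CSV text: one header row, then one row per record (hypothetical helper)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)  # first row: exactly one header per column
    for record in records:
        if len(record) != len(header):
            raise ValueError(f"record {record!r} does not match the header")
        writer.writerow([to_field(v) for v in record])
    return buf.getvalue()


print(write_clean_csv(
    ["name", "state", "district", "year"],  # hypothetical column names
    [["Alma Adams", "US-NC", 12, 2014]],
))
```

`Decimal` is used instead of `float` so monetary values carry no binary rounding noise, and `format(value, "f")` keeps a value such as `Decimal("1E+3")` from being written in exponent notation.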
A Good Spreadsheet
A Bad Spreadsheet
| Members of Congress | | | |
|---|---|---|---|
| Alma Adams | N. Carolina | 12th | 2014 |
| Adams was appointed by the governor to complete Smith’s term. | NC is redistricting. | | |
| * Special election | | | |
| Methodology: All data was gathered from Congress.gov on January 21, 2015, with the ... | | | |
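Several of these problems can be caught mechanically. The sketch below is a hypothetical helper, `find_problems`, written with Python's standard `csv` and `re` modules; it checks only a few of the rules above (record width against the header row, N/A placeholders, dollar-formatted numbers, and ordinals such as 12th, where treating an ordinal suffix as formatting is an assumption). It is not clean-sheet.org's own tooling.

```python
import csv
import io
import re

# Placeholder spellings that should have been left blank (an assumed, non-exhaustive list).
MISSING_MARKERS = {"N/A", "n/a", "NA"}
# Numbers dressed up with a dollar sign or thousands separators, e.g. $1,200 or 1,200.
FORMATTED_NUMBER = re.compile(r"^\$?-?\d{1,3}(,\d{3})+(\.\d+)?$|^\$-?\d+(\.\d+)?$")
# Ordinals such as 12th; flagging these as formatting is an assumption.
ORDINAL = re.compile(r"^\d+(st|nd|rd|th)$")


def find_problems(text):
    """Return (record_number, message) pairs for a few clean-sheet violations."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return [(1, "file has no header row")]
    header = rows[0]
    problems = []
    if any(name.strip() == "" for name in header):
        problems.append((1, "every column needs a header"))
    for number, row in enumerate(rows[1:], start=2):
        # Titles, notes, footnotes, and methodology lines rarely match the header's width.
        if len(row) != len(header):
            problems.append((number, f"has {len(row)} fields but the header has {len(header)}"))
        for cell in row:
            value = cell.strip()
            if value in MISSING_MARKERS:
                problems.append((number, f"{cell!r}: leave missing values blank"))
            elif FORMATTED_NUMBER.match(value):
                problems.append((number, f"{cell!r}: drop the dollar sign and separators"))
            elif ORDINAL.match(value):
                problems.append((number, f"{cell!r}: store the plain number"))
    return problems


# The first two rows of the bad spreadsheet, exported as CSV.
bad = "Members of Congress\nAlma Adams,N. Carolina,12th,2014\n"
for number, message in find_problems(bad):
    print(number, message)
```

Because the bad spreadsheet's first row is a title rather than column headers, the width check flags the data row; a check like this cannot tell you what the headers should have been.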