Introduction
Comma-Separated Values (CSV) is an excellent format for many types of data. A layperson can open CSV files in any of dozens of programs (Excel, Google Sheets, Gnumeric, LibreOffice, Numbers, OpenOffice, etc.), and the files are easy to produce, compact, and easy to parse. But there are right and wrong ways to produce CSV.
Guidelines
The following is based on the rules provided at clean-sheet.org.
- The first row of the spreadsheet must be column headers, with one header for each column that has data.
- All data must be in rows and columns, as a lookup table. Every row stores a different record, and every cell holds that record's value for its column.
- Do not format fields. Omit dollar signs, commas, footnote symbols, etc. For example, represent dollar values as `100002.25`, not `$100,002.25`.
- Adhere to ISO standards for fields. Refer to countries by their ISO 3166 codes, write dates in ISO 8601 format (e.g., `2015-01-21`), etc.
- If no value is available for a field, leave it blank. Do not write `N/A` or put an explanatory note in the field.
- Do not provide additional data in the same file, such as a key, aggregate statistics, or a description of methodologies. These belong elsewhere.
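When a program generates the file, these rules can be applied in one place. Below is a minimal sketch using Python's standard `csv` module; `format_cell` and `write_clean_csv` are hypothetical helper names, and the sketch assumes records arrive as dictionaries holding native Python values (`None` for a missing value, `datetime.date` for dates, `Decimal` for money).

```python
import csv
import datetime
import io
from decimal import Decimal


def format_cell(value):
    """Render one Python value as a plain, unformatted CSV field."""
    if value is None:
        return ""  # missing value: leave the cell blank, never "N/A"
    if isinstance(value, datetime.date):
        return value.isoformat()  # ISO 8601, e.g. "2015-01-21"
    if isinstance(value, Decimal):
        return format(value, "f")  # no currency symbol, no thousands separators
    return str(value)


def write_clean_csv(records, fieldnames):
    """Return CSV text: one header row, then one row per record dict."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(fieldnames)  # first row holds the column headers only
    for record in records:
        writer.writerow([format_cell(record.get(name)) for name in fieldnames])
    return buf.getvalue()


# Usage with two rows from the good spreadsheet in the examples.
records = [
    {"name": "Ralph Abraham", "state": "LA", "district": 5, "year_elected": 2015},
    {"name": "Alma Adams", "state": "NC", "district": 12, "year_elected": 2014},
]
print(write_clean_csv(records, ["name", "state", "district", "year_elected"]))
```

The `csv` module's default line terminator is `\r\n`, as in RFC 4180; the sketch sets `\n` only to keep printed output tidy. Letting `csv.writer` handle quoting means names containing commas are still written correctly.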
Examples
A Good Spreadsheet
name | state | district | year_elected |
---|---|---|---|
Ralph Abraham | LA | 5 | 2015 |
Alma Adams | NC | 12 | 2014 |
Robert Aderholt | AL | 4 | 1997 |
Pete Aguilar | CA | 31 | 2015 |
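Saved as CSV, this spreadsheet can be read by a program with no cleanup. The sketch below uses Python's standard `csv.DictReader`; `load_members` is a hypothetical helper name. The numeric columns convert with a plain `int()` only because the fields carry no ordinal suffixes, footnote marks, or placeholders.

```python
import csv
import io

# The good spreadsheet above, written out as CSV text.
GOOD_CSV = """name,state,district,year_elected
Ralph Abraham,LA,5,2015
Alma Adams,NC,12,2014
Robert Aderholt,AL,4,1997
Pete Aguilar,CA,31,2015
"""


def load_members(text):
    """Parse clean CSV into a list of dicts, converting the numeric columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):  # header row supplies the keys
        # A blank cell means "no value", so map it to None instead of failing.
        row["district"] = int(row["district"]) if row["district"] else None
        row["year_elected"] = int(row["year_elected"])
        rows.append(row)
    return rows


members = load_members(GOOD_CSV)
print(sum(1 for m in members if m["year_elected"] == 2015))  # 2
```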
A Bad Spreadsheet
Members of Congress | |||
---|---|---|---|
name | state | district | year_elected |
Ralph Abraham | Louisiana | 5th | 2015* |
Alma Adams | N. Carolina | 12th | 2014 |
Adams was appointed by the governor to complete Smith’s term. | NC is redistricting. | ||
Robert Aderholt | Alabama | N/A | 1997 |
Pete Aguilar | California | 31st | 2015 |
\* Special election | |||
Methodology: All data was gathered from Congress.gov on January 21, 2015, with the ... |
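Several of these problems can be caught mechanically. The following is a minimal checker sketch, assuming the four-column schema of the good spreadsheet. The name `find_problems`, its messages, and the rule that a state must be a two-letter uppercase code are illustrative assumptions, not part of the clean-sheet.org rules. It flags a missing header row, rows without one cell per column, `N/A` placeholders, spelled-out state names, and numeric fields with suffixes or footnote marks such as `5th` and `2015*`; it does not catch every problem shown above.

```python
import csv
import io
import re

# Assumed schema, taken from the good spreadsheet.
EXPECTED_HEADER = ["name", "state", "district", "year_elected"]
NUMERIC_COLUMNS = {"district", "year_elected"}


def find_problems(text):
    """Return (row_number, message) pairs for cells that break the guidelines."""
    rows = list(csv.reader(io.StringIO(text)))
    problems = []
    if not rows or rows[0] != EXPECTED_HEADER:
        problems.append((1, "first row is not the column header"))
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != len(EXPECTED_HEADER):
            problems.append((number, "row does not have exactly one cell per column"))
            continue
        for column, cell in zip(EXPECTED_HEADER, row):
            if cell.strip().upper() == "N/A":
                problems.append((number, f"{column}: leave the cell blank instead of N/A"))
            elif column == "state" and cell and not re.fullmatch(r"[A-Z]{2}", cell):
                problems.append((number, f"state: {cell!r} is not a two-letter code"))
            elif column in NUMERIC_COLUMNS and cell and not re.fullmatch(r"[0-9]+", cell):
                problems.append((number, f"{column}: {cell!r} contains formatting"))
    return problems


# Two rows from the bad spreadsheet, placed under a correct header row.
SAMPLE = """name,state,district,year_elected
Ralph Abraham,Louisiana,5th,2015*
Robert Aderholt,Alabama,N/A,1997
"""
for number, message in find_problems(SAMPLE):
    print(number, message)
```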