Wednesday, February 5, 2014

Clever Combining CSV Files to Cure KickStarter Conundrums

I recently ran a successful KickStarter for a Bubblegum Crisis Ultimate Edition (you can still sign up here), and in the aftermath ran into an annoying problem.

KickStarter lets you run a Backer Survey after the project is funded, and that's how backers tell you things like their names, addresses and so on. A typical KickStarter will have many different "support levels" providing different benefits.

Annoyingly, KickStarter forces you to design a survey format for each individual support level, with no way of cloning surveys. But what really got me steamed was the discovery that when you export the survey results, you get a CSV file for each survey, and the format of that CSV file depends on the items in each individual survey.

That makes it a royal pain in the ass to import the data into your own database -- I mean, I had 38 CSV files, each with its own ordering of fields. Plus given that people take their time filling out the surveys (2 months later and they're still trickling in), I was going to be doing the import multiple times.

To solve this, I wrote a quick little python script, combine-csv.py, that reads in an entire directory of CSV files, extracts all the unique field names from them, and then outputs a single file containing all the rows of data, with a column for each unique field. In cases where a row does not have an entry for a particular field, it gets a blank entry. Also, if a file contains multiple columns that have the same name, the data in them is combined into a single column, separated by the | delimiter.

Added: optional count field name; if present, each unique line is emitted only once, with the last field being a count of the number of occurrences. If not present, the source file name is appended.

You can find it here: combine-csv.py.zip

You use it like this:

python combine-csv.py [source directory] [destination csv file] {optional count field name}

Enjoy!

Update: I also whipped up a standalone AllCSV app that does the same thing; available for both Mac and Windows. It does not currently do the occurrence counting.