Java CSV parser library comparisons
June 22, 2011 12:14:15 Last update: June 22, 2011 12:15:12
Parsing a CSV file seems to be a sore spot in software development. It's not simple enough that you can roll your own code in a couple of hours, and yet it's not deemed big enough for a full-fledged project. As a result, there are multiple libraries, each with its own quirks. This is a summary of some simple tests I ran against five CSV parsers.
Conclusion? I'll use the Skife CSV parser if I know there are no "new line" characters in my data. I'll use the Apache CSV parser if I know there are no backslashes in my data.
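Both conditions in that conclusion can be checked mechanically before picking a parser. The following is only a rough sketch: the class name `CsvPrecheck` and its methods are made up for illustration, and it assumes quotes inside a field are escaped by doubling them (`""`). An unmatched quote in the input will make the newline check report a false positive.

```java
public class CsvPrecheck {

    /** Returns true if the raw CSV text contains a backslash anywhere. */
    public static boolean hasBackslash(String csv) {
        return csv.indexOf('\\') >= 0;
    }

    /** Returns true if a line break occurs inside a double-quoted field. */
    public static boolean hasQuotedNewline(String csv) {
        boolean inQuotes = false;
        for (int i = 0; i < csv.length(); i++) {
            char c = csv.charAt(i);
            if (c == '"') {
                // A doubled quote ("") toggles twice, so the state is unchanged.
                inQuotes = !inQuotes;
            } else if (inQuotes && (c == '\n' || c == '\r')) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String sample = "id,note\n1,\"first line\nsecond line\"\n";
        System.out.println("backslash: " + hasBackslash(sample));
        System.out.println("quoted newline: " + hasQuotedNewline(sample));
    }
}
```

If `hasQuotedNewline` reports true, the data has the "new line" problem; if `hasBackslash` reports true, it has the backslash problem.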
- Apache Commons CSV parser
- Backslashes cannot be escaped. A backslash is treated as a literal unless it precedes a double quote, so there is no way to have a backslash as the last character before the closing quote (a rare scenario, admittedly).
- IOException thrown for unmatched quotes
- SuperCSV parser
- Can't handle escapes inside quotes
- Throws Exception for unmatched quotes
- OstermillerUtils CSV parser
- Can't handle escapes inside quotes
- Space before quotation mark messes up parsing
- Does not handle new line inside quotes
- OpenCSV CSV parser
- Mishandles spaces before and between items
- Silently ignores unmatched quotes
- Skife CSV parser
- Does not handle new line inside quotes
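To make the edge cases in the list concrete, here is a minimal hand-rolled sketch that exercises them: a newline inside quotes, a backslash right before the closing quote, a space before a quotation mark, a quote escaped by doubling, and an unmatched quote. It is not the code of any of the five libraries, and the sample strings are not the original test inputs. The class name `CsvEdgeCases`, the choice to keep backslashes literal, and the choice to throw on an unmatched quote are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class CsvEdgeCases {

    /**
     * Parses comma-separated text. Quotes are escaped by doubling (""),
     * backslashes are kept literally, and quoted fields may span lines.
     */
    public static List<List<String>> parse(String text) {
        List<List<String>> rows = new ArrayList<>();
        List<String> row = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        boolean pending = false; // true once the current record has any content

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    field.append(c); // newlines and backslashes are kept as-is
                } else if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                    field.append('"'); // doubled quote becomes one literal quote
                    i++;
                } else {
                    inQuotes = false; // closing quote
                }
            } else if (c == '"') {
                inQuotes = true;
                pending = true;
            } else if (c == ',') {
                row.add(field.toString());
                field.setLength(0);
                pending = true;
            } else if (c == '\n') {
                row.add(field.toString());
                field.setLength(0);
                rows.add(row);
                row = new ArrayList<>();
                pending = false;
            } else if (c != '\r') {
                field.append(c); // unquoted text, including leading spaces
                pending = true;
            }
        }
        if (inQuotes) {
            throw new IllegalArgumentException("unmatched quote");
        }
        if (pending) { // last record had no trailing newline
            row.add(field.toString());
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        String[] samples = {
            "a,\"line1\nline2\",c",           // newline inside quotes
            "a,\"ends with backslash\\\",c",  // backslash before the closing quote
            "a, \"b\",c",                     // space before a quotation mark
            "a,\"say \"\"hi\"\"\",c"          // quote escaped by doubling
        };
        for (String sample : samples) {
            System.out.println(parse(sample));
        }
        try {
            parse("a,\"unmatched,c");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Note that this sketch keeps a space that precedes an opening quote as part of the field. Whether to keep or trim it is a design choice, and it is one of the places where the libraries above disagree.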