Java CSV parsing with Apache Commons CSV parser
June 21, 2011 15:41:51 Last update: June 22, 2011 11:33:36
Demo code for CSV parsing with Apache Commons CSV parser.
- Java code:
    import java.io.*;
    import org.apache.commons.csv.CSVParser;
    import org.apache.commons.csv.CSVStrategy;

    public class CSVParseDemo {
        public static void main(String[] args) throws IOException {
            if (args.length < 1) {
                System.out.println("Usage: java CSVParseDemo <csv_file>");
                return;
            }

            CSVParser parser = new CSVParser(new FileReader(args[0]), CSVStrategy.EXCEL_STRATEGY);
            String[] values = parser.getLine();
            while (values != null) {
                printValues(parser.getLineNumber(), values);
                values = parser.getLine();
            }
        }

        private static void printValues(int lineNumber, String[] as) {
            System.out.println("Line " + lineNumber + " has " + as.length + " values:");
            for (String s: as) {
                System.out.println("\t|" + s + "|");
            }
            System.out.println();
        }
    }
- Test with a simple CSV file:
    psmith01,CLASS2B,Peter Smith 1,YEAR2,1,N,ADVANCED,STAFF,1,Y,Y
    smehta,CLASS3G,Smeeta Mehta,LOCAL,1,Y,STANDARD,PUPIL,2.1,N,Y
Result:
    Line 1 has 11 values:
        |psmith01|
        |CLASS2B|
        |Peter Smith 1|
        |YEAR2|
        |1|
        |N|
        |ADVANCED|
        |STAFF|
        |1|
        |Y|
        |Y|

    Line 2 has 11 values:
        |smehta|
        |CLASS3G|
        |Smeeta Mehta|
        |LOCAL|
        |1|
        |Y|
        |STANDARD|
        |PUPIL|
        |2.1|
        |N|
        |Y|
The parser worked correctly.
- Test with a more complicated CSV file:
    "psmith01 abc", "CLASS2B " , " Peter, Smith 1", "\", YEAR2 \""
    " smehta ' \", \\, " , "CLASS3G \\"
    " smehta ' \", \\, " , "CLASS3G \"
Result:
    Line 1 has 4 values:
        |psmith01 abc|
        |CLASS2B |
        | Peter, Smith 1|
        |", YEAR2 "|

    Line 2 has 2 values:
        | smehta ' ", \\, |
        |CLASS3G \\|

    Exception in thread "main" java.io.IOException: (startline 2)eof reached before encapsulated token finished
        at org.apache.commons.csv.CSVParser.encapsulatedTokenLexer(CSVParser.java:510)
        at org.apache.commons.csv.CSVParser.nextToken(CSVParser.java:365)
        at org.apache.commons.csv.CSVParser.getLine(CSVParser.java:239)
        at CSVParseDemo.main(CSVParseDemo.java:16)
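The third line is invalid input, because its last quote is escaped and the final field never closes, but throwing a java.io.IOException for a malformed record is a bit severe. Also, the parser does not unescape a backslash: the sequence \\ in the input comes through as \\ in the output rather than as a single backslash.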
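One way a parser could handle both points is to treat the backslash as an escape character inside quoted fields and to report an unterminated field as an error for that line only. The following is a minimal hand-written sketch under those assumptions, not the Commons CSV implementation. The class name BackslashEscapeSketch and the method parseLine are hypothetical, and the sketch handles a single line at a time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Apache Commons CSV.
class BackslashEscapeSketch {

    // Splits one CSV line into fields. Inside quotes, a backslash makes the
    // next character literal, so \" yields " and \\ yields a single \.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;   // currently between an opening and closing quote
        boolean wasQuoted = false;  // the current field started with a quote
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '\\' && i + 1 < line.length()) {
                    field.append(line.charAt(++i)); // take the escaped character literally
                } else if (c == '"') {
                    inQuotes = false;               // closing quote
                } else {
                    field.append(c);
                }
            } else if (Character.isWhitespace(c) && field.length() == 0) {
                // skip whitespace before a field (and after a closing quote)
            } else if (c == '"' && field.length() == 0 && !wasQuoted) {
                inQuotes = true;
                wasQuoted = true;
            } else if (c == ',') {
                fields.add(wasQuoted ? field.toString() : field.toString().trim());
                field.setLength(0);
                wasQuoted = false;
            } else if (!wasQuoted) {
                field.append(c);                    // ordinary character of an unquoted field
            }
            // anything else after a closing quote is ignored
        }
        if (inQuotes) {
            // Unchecked exception: the caller can skip the bad line and carry on.
            throw new IllegalArgumentException("unterminated quoted field: " + line);
        }
        fields.add(wasQuoted ? field.toString() : field.toString().trim());
        return fields;
    }

    public static void main(String[] args) {
        // Second line of the complicated test file, written as a Java literal.
        String line = "\" smehta ' \\\", \\\\, \" , \"CLASS3G \\\\\"";
        for (String field : parseLine(line)) {
            System.out.println("\t|" + field + "|");
        }
    }
}
```

With this approach the second test line yields two fields whose doubled backslashes are reduced to single ones, and the third test line raises an error that a caller could catch per line instead of aborting the whole file.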
- Put a newline inside the second item:
    "One", "Two
    ", "Three"
Result:
    Line 2 has 3 values:
        |One|
        |Two
    |
        |Three|
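The result is labelled Line 2 even though it is the first record, which suggests that the reported line number counts physical lines read rather than records. The distinction can be sketched as follows. This is a simplified illustration, not the Commons CSV implementation: the class MultiLineRecordSketch and its methods are hypothetical names, and it assumes quotes are never escaped.

```java
// Hypothetical illustration only; not part of Apache Commons CSV.
class MultiLineRecordSketch {
    private final String[] lines;   // physical lines of the input
    private int lineNumber = 0;     // physical lines consumed so far

    MultiLineRecordSketch(String text) {
        this.lines = text.split("\n");
    }

    // Returns the next logical record, or null when the input is exhausted.
    // A record continues onto the next physical line while a quote is open.
    String nextRecord() {
        if (lineNumber >= lines.length) {
            return null;
        }
        StringBuilder record = new StringBuilder(lines[lineNumber++]);
        while (hasOpenQuote(record)) {
            if (lineNumber >= lines.length) {
                throw new IllegalStateException(
                        "input ended inside a quoted field at line " + lineNumber);
            }
            record.append('\n').append(lines[lineNumber++]);
        }
        return record.toString();
    }

    int getLineNumber() {
        return lineNumber;
    }

    // Assumes quotes are never escaped: an odd number of quotes means one is open.
    private static boolean hasOpenQuote(CharSequence s) {
        boolean open = false;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '"') {
                open = !open;
            }
        }
        return open;
    }

    public static void main(String[] args) {
        MultiLineRecordSketch reader =
                new MultiLineRecordSketch("\"One\", \"Two\n\", \"Three\"\nA,B");
        int recordNumber = 0;
        while (reader.nextRecord() != null) {
            recordNumber++;
            System.out.println("record " + recordNumber
                    + " ended on line " + reader.getLineNumber());
        }
    }
}
```

Here the first record ends on physical line 2 and the second on line 3, so a line counter and a record counter diverge as soon as a quoted field contains a newline.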