Java CSV parsing with OstermillerUtils 

June 22, 2011 07:12:02    Last update: June 22, 2011 11:32:02
Demo code for CSV parsing with OstermillerUtils.
  1. Java code:
    import java.io.*;
    import com.Ostermiller.util.ExcelCSVParser;

    public class CSVParseDemo {
        public static void main(String[] args) throws IOException {
            if (args.length < 1) {
                System.out.println("Usage: java CSVParseDemo <csv_file>");
                return;
            }

            // Close the file even if parsing throws.
            try (Reader in = new FileReader(args[0])) {
                ExcelCSVParser parser = new ExcelCSVParser(in);
                // getLine() returns null once the input is exhausted.
                String[] values = parser.getLine();
                while (values != null) {
                    printValues(parser.lastLineNumber(), values);
                    values = parser.getLine();
                }
            }
        }

        // Print each value between '|' marks so surrounding whitespace is visible.
        private static void printValues(int lineNumber, String[] as) {
            System.out.println("Line " + lineNumber + " has " + as.length + " values:");
            for (String s : as) {
                System.out.println("\t|" + s + "|");
            }
            System.out.println();
        }
    }
    

  2. Test with a simple CSV file:
    psmith01,CLASS2B,Peter Smith 1,YEAR2,1,N,ADVANCED,STAFF,1,Y,Y
    smehta,CLASS3G,Smeeta Mehta,LOCAL,1,Y,STANDARD,PUPIL,2.1,N,Y
    

    The parser worked correctly:
    Line 1 has 11 values:
    	|psmith01|
    	|CLASS2B|
    	|Peter Smith 1|
    	|YEAR2|
    	|1|
    	|N|
    	|ADVANCED|
    	|STAFF|
    	|1|
    	|Y|
    	|Y|
    
    Line 2 has 11 values:
    	|smehta|
    	|CLASS3G|
    	|Smeeta Mehta|
    	|LOCAL|
    	|1|
    	|Y|
    	|STANDARD|
    	|PUPIL|
    	|2.1|
    	|N|
    	|Y|
    


  3. Test with a more complicated CSV file (spaces around the quoted fields, backslash-escaped quotes):
    "psmith01 abc", "CLASS2B            "    , " Peter, Smith 1", "\", YEAR2 \""
    " smehta ' \", \\, "   ,     "CLASS3G \\"
    " smehta ' \", \\, "   ,     "CLASS3G \"
    

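    The parser mis-split all three lines. It honored only a quote at the very start of a field, and it did not treat a backslash as escaping the quote that follows: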
    Line 1 has 6 values:
    	|psmith01 abc|
    	| "CLASS2B            "    |
    	| " Peter|
    	| Smith 1"|
    	| "\"|
    	| YEAR2 \""|
    
    Line 2 has 4 values:
    	| smehta ' \|
    	| \\|
    	| "   |
    	|     "CLASS3G \\"|
    
    Line 3 has 4 values:
    	| smehta ' \|
    	| \\|
    	| "   |
    	|     "CLASS3G \"|
    


  4. Put a space before the first quote:
     "Smith, Jack","210-345-8888"
    

    The parser ignored the first pair of quotes and split on the comma inside them:
    Line 1 has 3 values:
    	| "Smith|
    	| Jack"|
    	|210-345-8888|
    


  5. Add a line break inside the second item:
    "One", "Two
    ", "Three"
    

    Result (the record is split at the line break and the third item is lost):
    Line 1 has 2 values:
    	|One|
    	| "Two|
    
    Line 2 has 1 values:
    	|, |
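
The splits in steps 3 and 4 are consistent with Excel-style quoting rules. The sketch below is a guess at those rules, not the library's implementation: it assumes a field counts as quoted only when the quote is its very first character, that a quote inside a quoted field is escaped by doubling it, and that a backslash has no special meaning. The class `ExcelStyleSplitSketch` and its `split` method are hypothetical names made up for this illustration and are not part of OstermillerUtils. It handles a single line only, so it does not cover the line break case in step 5.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not OstermillerUtils code.
class ExcelStyleSplitSketch {

    // Split one line using the assumed Excel-style rules described above.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        boolean fieldStart = true; // true until the first character of a field is consumed

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"'); // doubled quote stands for one literal quote
                        i++;
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    field.append(c);       // commas and backslashes are plain text here
                }
            } else if (c == '"' && fieldStart) {
                inQuotes = true;           // a quote opens a field only in first position
                fieldStart = false;
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
                fieldStart = true;
            } else {
                field.append(c);           // includes a quote that follows a leading space
                fieldStart = false;
            }
        }
        fields.add(field.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Input from step 4: the leading space makes the first quote ordinary text.
        for (String s : split(" \"Smith, Jack\",\"210-345-8888\"")) {
            System.out.println("|" + s + "|");
        }
        // Same data without the leading space, plus a doubled quote as an escape.
        for (String s : split("\"Smith, \"\"Jack\"\"\",\"210-345-8888\"")) {
            System.out.println("|" + s + "|");
        }
    }
}
```

Under these assumptions, the inputs in steps 3 and 4 would need two changes to split as intended: no whitespace between a comma and the opening quote, and `""` instead of `\"` for a quote inside a quoted field.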
    
