I kept tinkering with ExtendedBufferedReader and I have some interesting results.
First I tried to simplify it by extending java.io.LineNumberReader instead of BufferedReader. The performance decreased by 20%, probably because the class is synchronized internally.
But wait, isn't BufferedReader also synchronized? I copied the code of BufferedReader and removed the synchronized blocks. Now the time to parse the file is down to 2652 ms, 28% faster than previously!
Of course the BufferedReader code can't be copied from the JDK because of the license mismatch, so I took the version from Harmony. In my test it is about 4% faster than its JDK counterpart, and the parsing time is now around 2553 ms.
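For illustration, here is a minimal sketch of what a buffered reader without synchronized blocks can look like. It is neither the Harmony code nor the class used in Commons CSV: the name UnsynchronizedBufferedReader and its buffer-size parameter are assumptions made for this example, and it is only safe when a single thread reads from it.

```java
import java.io.IOException;
import java.io.Reader;

/**
 * Hypothetical sketch of a buffered reader whose read paths take no lock.
 * Not thread-safe. Methods inherited from Reader (e.g. skip) are left as-is.
 */
final class UnsynchronizedBufferedReader extends Reader {
    private final Reader in;
    private final char[] buf;
    private int pos; // index of the next char to hand out
    private int end; // number of valid chars currently in buf

    UnsynchronizedBufferedReader(Reader in, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be > 0");
        }
        this.in = in;
        this.buf = new char[size];
    }

    /** Refills the buffer from the underlying reader; false at end of stream. */
    private boolean fill() throws IOException {
        int n;
        do {
            n = in.read(buf, 0, buf.length);
        } while (n == 0);
        if (n < 0) {
            return false;
        }
        pos = 0;
        end = n;
        return true;
    }

    @Override
    public int read() throws IOException {
        // No synchronized block here, unlike java.io.BufferedReader.
        if (pos >= end && !fill()) {
            return -1;
        }
        return buf[pos++];
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (pos >= end && !fill()) {
            return -1;
        }
        int n = Math.min(len, end - pos);
        System.arraycopy(buf, pos, cbuf, off, n);
        pos += n;
        return n;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```

A parser that owns its reader and never shares it across threads pays nothing for dropping the lock, which is the case being measured here.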
Now Commons CSV can start claiming to be the fastest CSV parser around :)

Emmanuel Bourg

On 12/03/2012 11:31, Emmanuel Bourg wrote:
I have identified the performance killer: it's the ExtendedBufferedReader. It implements complex logic to fetch one character ahead, but this extra character is rarely used. I have implemented a simpler look-ahead using mark/reset, as suggested by Bob Smith in CSV-42, and the performance improved by 30%. Parsing is now down to 3406 ms, and that's almost without touching the parser yet.

Emmanuel Bourg

On 11/03/2012 15:05, Emmanuel Bourg wrote:

Hi,

I compared the performance of Commons CSV with the other CSV parsers available. I took the world cities file from Maxmind as a test file [1]; it's a big file of 130M with 2.8 million records. Here are the results obtained on a Core 2 Duo E8400 after several iterations to let the JIT compiler kick in:

Direct read       750 ms
Java CSV         3328 ms
Super CSV        3562 ms  (+7%)
OpenCSV          3609 ms  (+8.4%)
GenJava CSV      3844 ms  (+15.5%)
Commons CSV      4656 ms  (+39.9%)
Skife CSV        4813 ms  (+44.6%)

I also tried Nuiton CSV and Esperio CSV but I couldn't figure out how to use them.

I haven't analyzed why Commons CSV is slower yet, but it seems there is room for improvement. The memory usage will have to be compared too; I'm looking for a way to measure it.

Emmanuel Bourg

[1] http://www.maxmind.com/download/worldcities/worldcitiespop.txt.gz
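The mark/reset look-ahead mentioned in the 12/03/2012 message could look roughly like the sketch below. The class name LookAheadReader and the method name lookAhead are assumptions for this example and are not taken from the Commons CSV sources; only the standard BufferedReader mark(int), read() and reset() calls are relied upon.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

/** Hypothetical helper showing a one-character look-ahead built on mark/reset. */
final class LookAheadReader extends BufferedReader {

    LookAheadReader(Reader in) {
        super(in);
    }

    /**
     * Returns the next character without consuming it,
     * or -1 if the end of the stream has been reached.
     */
    int lookAhead() throws IOException {
        mark(1);              // remember the current position, allow 1 char of read-ahead
        int c = super.read(); // read the next char (or -1 at end of stream)
        reset();              // rewind so the char is still available to the caller
        return c;
    }
}
```

With this approach the extra work happens only when the parser actually asks for the next character, instead of on every read.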