I have identified the performance killer: it's the ExtendedBufferedReader. It implements complex logic to fetch one character ahead, but this extra character is rarely used. I have implemented a simpler look-ahead using mark/reset, as suggested by Bob Smith in CSV-42, and performance improved by 30%.

Parsing now takes 3406 ms, and I have barely touched the parser itself yet.
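A minimal sketch of the mark/reset look-ahead idea, assuming the reader wraps a plain `java.io.BufferedReader`. The class `LookAheadReader` and its `lookAhead()` method are hypothetical names, not the code committed for CSV-42. Peeking marks the current position, reads one character and rewinds, so no character is held ahead between calls and the cost is only paid when the parser actually peeks.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical sketch, not the Commons CSV implementation: peek at the next
// character with mark/reset instead of always buffering one character ahead.
class LookAheadReader extends BufferedReader {

    LookAheadReader(Reader in) {
        super(in);
    }

    /** Returns the next character without consuming it, or -1 at end of stream. */
    int lookAhead() throws IOException {
        mark(1);         // remember the position; 1 char may be read before reset()
        int c = read();  // read the next character (or -1 at end of stream)
        reset();         // rewind so the following read() returns the same character
        return c;
    }

    public static void main(String[] args) throws IOException {
        try (LookAheadReader r = new LookAheadReader(new StringReader("a,b"))) {
            int c;
            while ((c = r.read()) != -1) {
                // peek only when the parser needs to know what comes next
                System.out.println((char) c + " next=" + r.lookAhead());
            }
        }
    }
}
```

Because `lookAhead()` calls `mark()`, it replaces any mark the caller had set; a parser built on this sketch should not rely on its own marks across a peek.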

Emmanuel Bourg


On 11 March 2012 at 15:05, Emmanuel Bourg wrote:
Hi,

I compared the performance of Commons CSV with the other available CSV
parsers. I used the world cities file from Maxmind as the test file [1];
it's a large file of 130 MB with 2.8 million records.

Here are the results obtained on a Core 2 Duo E8400 after several
iterations to let the JIT compiler kick in:

Direct read      750 ms
Java CSV        3328 ms
Super CSV       3562 ms   (+7.0%)
OpenCSV         3609 ms   (+8.4%)
GenJava CSV     3844 ms   (+15.5%)
Commons CSV     4656 ms   (+39.9%)
Skife CSV       4813 ms   (+44.6%)

(Percentages are relative to Java CSV, the fastest parser.)
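The timings above were taken after warm-up iterations so the JIT compiler could kick in. A minimal harness in that spirit is sketched below. The class `BenchSketch`, the number of runs and the choice of keeping the fastest run are assumptions, since the message does not say how the timings were aggregated. `overheadPercent` reproduces the arithmetic behind the percentages; for example, `overheadPercent(4656, 3328)` gives about 39.9 for Commons CSV.

```java
import java.util.Locale;

// Hypothetical benchmark helpers; names and methodology are illustrative only.
final class BenchSketch {

    private BenchSketch() {}

    /** Extra time of `millis` over `baselineMillis`, in percent. */
    static double overheadPercent(long millis, long baselineMillis) {
        return 100.0 * (millis - baselineMillis) / baselineMillis;
    }

    /**
     * Runs `task` `warmup` times without measuring, so the JIT compiler can
     * compile the hot code, then returns the fastest of `runs` timed runs in ms.
     */
    static long bestTimeMillis(Runnable task, int warmup, int runs) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be at least 1");
        }
        for (int i = 0; i < warmup; i++) {
            task.run();
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            best = Math.min(best, elapsedMillis);
        }
        return best;
    }

    public static void main(String[] args) {
        // Overheads of the slower parsers from the table, against Java CSV (3328 ms)
        long javaCsv = 3328;
        long[] others = {3562, 3609, 3844, 4656, 4813};
        for (long t : others) {
            System.out.printf(Locale.ROOT, "%d ms: +%.1f%%%n", t, overheadPercent(t, javaCsv));
        }
    }
}
```

Keeping the fastest run filters out GC pauses and OS noise; reporting the mean or median would be an equally reasonable choice.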

I also tried Nuiton CSV and Esperio CSV, but I couldn't figure out how to
use them.

I haven't analyzed why Commons CSV is slower yet, but there seems to be
room for improvement. Memory usage will have to be compared too; I'm
looking for a way to measure it.
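One rough way to compare memory usage, sketched under the assumption that a heap-usage delta is good enough for a first comparison: measure the used heap before and after building a result (for example, the list of parsed records) and keep that result reachable until the second measurement. `MemorySketch` is a hypothetical name, `System.gc()` is only a request to the JVM, so the numbers are approximate, and `Reference.reachabilityFence` requires Java 9 or later.

```java
import java.lang.ref.Reference;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper: approximate heap retained by an object graph.
final class MemorySketch {

    private MemorySketch() {}

    /** Used heap in bytes after requesting a garbage collection. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        System.gc(); // a hint only: the JVM is free to ignore it
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Approximate bytes retained by whatever `build` returns. */
    static <T> long retainedBytes(Supplier<T> build) {
        long before = usedHeapBytes();
        T result = build.get();
        long after = usedHeapBytes();
        // keep `result` reachable until after the second measurement
        Reference.reachabilityFence(result);
        return after - before;
    }

    public static void main(String[] args) {
        // Example: heap held by one million short strings in a list
        long bytes = retainedBytes(() -> {
            List<String> rows = new ArrayList<>();
            for (int i = 0; i < 1_000_000; i++) {
                rows.add("row" + i);
            }
            return rows;
        });
        System.out.println("retained ~" + bytes / 1024 + " KB");
    }
}
```

This measures what a parser keeps alive, not how much garbage it creates while parsing; allocation churn would need a profiler to compare properly.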


Emmanuel Bourg

[1] http://www.maxmind.com/download/worldcities/worldcitiespop.txt.gz
