Hi,

I've implemented a tool using lucene-5.2.0 to index my CSV files. The tool reads data from CSV files (residing on disk) and writes the indexes to local disk. It processes 3.5 MB of data per second. Each document has 46 fields in total, of only three data types: 1. Integer, 2. Long, 3. String. All these fields come from one CSV record and are parsed with a custom CSV parser, which is faster than String's split methods.
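The custom parser itself isn't shown in this message. As a rough idea of what a hand-written parser of this kind might look like, the sketch below splits a line with one left-to-right scan into a caller-supplied array that can be reused for every record. The class name `CsvLineSplitter` and the reuse-an-array design are hypothetical, and it assumes plain comma-separated records with no quoting or escaped commas.

```java
/** Hypothetical splitter for plain, unquoted comma-separated records. */
final class CsvLineSplitter {
    private CsvLineSplitter() {}

    /**
     * Splits {@code line} on commas in a single scan, writing the fields into
     * {@code out} (which can be reused across records) and returning the
     * number of fields found. Empty fields, including trailing ones, are kept.
     * Quoting/escaping is NOT handled (assumption: the data contains none).
     * Throws ArrayIndexOutOfBoundsException if the line has more fields than
     * out.length.
     */
    static int split(String line, String[] out) {
        int count = 0;
        int start = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == ',') {
                out[count++] = line.substring(start, i); // field before this comma
                start = i + 1;
            }
        }
        out[count++] = line.substring(start); // last field (possibly empty)
        return count;
    }
}
```

Unlike `String.split(",")`, which drops trailing empty strings, this keeps them, so a record with an empty last column still yields the full field count.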
I've configured the following parameters when creating the IndexWriter:
1. setOpenMode(OpenMode.CREATE)
2. setCommitOnClose(true)
3. setRAMBufferSizeMB(512) // Tried 256 and 312 as well, but performance is almost the same.

I've read on several blogs that Lucene works much faster than these figures. So I suspected a bottleneck in my code and profiled it using jvisualvm. The application spends most of its time in DefaultIndexingChain.processField, i.e. 53% of the total time. Following is the split of CPU usage in this application:
1. Reading data from disk takes 5% of the total duration.
2. Adding documents takes 93% of the total duration:
 - postUpdate -> 12.8%
 - doAfterDocument -> 20.6%
 - updateDocument -> 59.8%
 - finishDocument -> 1.7%
 - finishStoreFields -> 4.8%
 - processFields -> 53.1%

I'm also attaching a screenshot of the call graph generated by jvisualvm.

I've taken care of the following points:
1. Create only one instance of IndexWriter.
2. Create only one instance of Document and reuse it throughout the lifetime of the application.
3. There will be no updates to the documents, hence only addDocument is invoked. Note: After going through the code I found that addDocument internally just calls updateDocument. Is there any way to avoid calling updateDocument and use only the addDocument API?
4. Use the setValue APIs to set the pre-created fields, and reuse these fields to create the indexes.

Any tips to improve the performance will be immensely appreciated.

Regards,
Sandeep
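For reference, the setup described above (one IndexWriter configured with OpenMode.CREATE, setCommitOnClose(true) and a 512 MB RAM buffer, one reused Document, and pre-created fields updated through setIntValue/setLongValue/setStringValue before each addDocument call) might look roughly like this against the Lucene 5.2 API. It is only a sketch: the class name `CsvIndexerSketch` and the three field names are hypothetical stand-ins for the 46 real fields, the records are assumed to be already split into strings, storing every field (Field.Store.YES) is an assumption, and StandardAnalyzer (from lucene-analyzers-common) is an assumed choice. IntField, LongField and StringField are not tokenized, so the analyzer should not matter here.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.IntField;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.Directory;

/** Hypothetical sketch: one writer, one reused Document, pre-created fields. */
final class CsvIndexerSketch {
    // Hypothetical field names; the real record has 46 fields of these three types.
    private final Document doc = new Document();
    private final IntField idField = new IntField("id", 0, Field.Store.YES);
    private final LongField tsField = new LongField("timestamp", 0L, Field.Store.YES);
    private final StringField nameField = new StringField("name", "", Field.Store.YES);

    CsvIndexerSketch() {
        // Fields are added to the Document once and only their values change later.
        doc.add(idField);
        doc.add(tsField);
        doc.add(nameField);
    }

    /** The three IndexWriterConfig settings listed in the message. */
    static IndexWriterConfig config() {
        IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
        cfg.setOpenMode(OpenMode.CREATE);
        cfg.setCommitOnClose(true);
        cfg.setRAMBufferSizeMB(512);
        return cfg;
    }

    /** Indexes pre-split records and returns the writer's document count. */
    int index(Directory dir, String[][] records) throws IOException {
        try (IndexWriter writer = new IndexWriter(dir, config())) {
            for (String[] r : records) {
                idField.setIntValue(Integer.parseInt(r[0]));
                tsField.setLongValue(Long.parseLong(r[1]));
                nameField.setStringValue(r[2]);
                // addDocument delegates to updateDocument internally (no delete term).
                writer.addDocument(doc);
            }
            return writer.numDocs(); // includes documents still in the RAM buffer
        } // close() commits because setCommitOnClose(true)
    }
}
```

To write to disk rather than to memory, the Directory would come from something like `FSDirectory.open(Paths.get("/path/to/index"))` (the path is a placeholder; Lucene 5.x takes a `java.nio.file.Path`).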