One addition: If you are indexing millions of numeric fields, you should also try to reuse NumericField and Document instances (as described in JavaDocs). NumericField creates internally a NumericTokenStream and lots of small objects (attributes), so GC cost may be high. This is just another idea.
Uwe ----- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -----Original Message----- > From: Uwe Schindler [mailto:u...@thetaphi.de] > Sent: Wednesday, April 14, 2010 11:28 PM > To: java-user@lucene.apache.org > Subject: RE: NumericField indexing performance > > Hi Tomislav, > > indexing with NumericField takes longer (at least for the default > precision step of 4, which means out of 32 bit integers make 8 subterms > with each 4 bits of the value). So you produce 8 times more terms > during indexing that must be handled by the indexer. If you have lots > of documents, with distinct values the term index gets larger and > larger, but search performance increases dramatically (for > NumericRangeQueries). So if you index *only* numeric fields and nothing > else, a 8 times slower indexing can be true. > > If you are not using NumericRangeQuery or you want tune indexing > performance, try larger precision Steps like 6 or 8. If you don’t use > NumericRangeQuery and only want to index the numeric terms as *one* > term, use precStep=Integer.MAX_VALUE. Also check your memory > requirements, as the indexer may need more memory and GC costs too > much. Also the index size will increase, so lots of more I/O is done. > Without more details I cannot say anything about your configuration. So > please tell us, how many documents, how many fields and how many > numeric fields in which configuration do you use? > > Uwe > > ----- > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -----Original Message----- > > From: Tomislav Poljak [mailto:tpol...@gmail.com] > > Sent: Wednesday, April 14, 2010 8:13 PM > > To: java-user@lucene.apache.org > > Subject: NumericField indexing performance > > > > Hi, > > is it normal for indexing time to increase up to 10 times after > > introducing NumericField instead of Field (for two fields)? > > > > I've changed two date fields from String representation (Field) to > > NumericField, now it is: > > > > doc.add(new NumericField("time").setIntValue(date.getTime()/24/3600)) > > > > and after this change indexing took 10x more time (before it was few > > minutes and after more than an hour and half). I've tested with a > > simple > > counter like this: > > > > doc.add(new NumericField("endTime").setIntValue(count++)) > > > > but nothing changed, it still takes around 10x longer. If I comment > > adding one numeric field to index time drops significantly and if I > > comment both fields indexing takes only few minutes again. > > > > Tomislav > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org