RE: NumericRangeQuery performance with 1/2 billion documents in the index

Uwe Schindler Sat, 02 Jan 2010 13:53:16 -0800

I forgot:
- How did you measure query time?
- Did you warm your index reader?
- Explicitly omitting term frequencies and norms is not needed for numeric
fields; both are already disabled by default.
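
To make the first two questions concrete, here is a minimal sketch of timing a query only after a few untimed warm-up runs. The class name `WarmedTiming`, the method `meanMillis`, and the stand-in task are hypothetical and not taken from the original code; in a real measurement the task would be the actual search call, run against the same already-open reader each time.

```java
public class WarmedTiming {

    /**
     * Runs the task warmupRuns times without timing it, then returns the
     * mean wall-clock time in milliseconds over timedRuns further runs.
     */
    static double meanMillis(Runnable task, int warmupRuns, int timedRuns) {
        if (timedRuns < 1) {
            throw new IllegalArgumentException("timedRuns must be >= 1");
        }
        // Warm-up: results are discarded, only caches and the JIT benefit.
        for (int i = 0; i < warmupRuns; i++) {
            task.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedRuns; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / 1e6 / timedRuns;
    }

    public static void main(String[] args) {
        // Stand-in for a query (assumption): replace with the real search call.
        long[] sink = new long[1];
        Runnable standIn = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            sink[0] = sum;
        };
        System.out.println("mean ms per run: " + meanMillis(standIn, 5, 20));
    }
}
```

A single cold run mostly measures disk I/O and JIT compilation, which is why the first query against a freshly opened reader is usually not representative.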


-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -----Original Message-----
> From: Uwe Schindler [mailto:u...@thetaphi.de]
> Sent: Saturday, January 02, 2010 10:46 PM
> To: java-user@lucene.apache.org; kuma...@kumanan.com
> Subject: RE: NumericRangeQuery performance with 1/2 billion documents in
> the index
> 
> The information you gave us is a little sparse.
> - Which JVM and which processor do you use?
> - How many documents match the query? NRQ is very fast, but if your range
> hits e.g. one third of all documents, collecting the 166 million hits
> also takes a lot of time. 7 seconds is normal for this case. Even with 50
> million docs in the result range, collection would take on the order of
> seconds on most CPUs.
> - Why do you index and query with precision step 1? I would first try 6 or
> 4 with long fields. With too low a precStep, queries get slower because you
> have a very, very large term index (64 terms per value!) and your query
> has to reposition the term index very often.
> - Why do you index NULL values as an integer (not long!) field with value
> 0? Those fields are useless for your query and will never match any range
> on LONG values. So why not simply remove them? They also produce lots of
> terms with precStep=1 (32 terms).
> 
> Uwe
> 
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
> 
> > -----Original Message-----
> > From: Kumanan [mailto:kuma...@gmail.com]
> > Sent: Saturday, January 02, 2010 8:03 PM
> > To: java-user@lucene.apache.org
> > Subject: NumericRangeQuery performance with 1/2 billion documents in the
> > index
> >
> > Hi,
> >
> > We have an index with 500 million documents. The index size is 104 GB,
> > and the search server has 4 GB of RAM.
> >
> > When we run a NumericRangeQuery on the document_date field, it takes
> > around 7-10 seconds. Is this expected for an index of this size?
> >
> > Here is how I index that field.
> >
> >             documentDateTimeField = new NumericField(DOCUMENT_DATE_TIME, 1,
> >                     Field.Store.NO, true);
> >             documentDateTimeField.setOmitNorms(true);
> >             documentDateTimeField.setOmitTermFreqAndPositions(true);
> >
> >             if(scoreDetails.getDocumentDate() != null) {
> >                 documentDateTimeField.setLongValue(
> >                         scoreDetails.getDocumentDate().getTime());
> >             } else {
> >                 documentDateTimeField.setIntValue(0);
> >             }
> >             doc.add(documentDateTimeField);
> >
> > Here is how I construct the range query.
> >
> >                     Long begin = esq.getBeginDate().getTime();
> >                     Long end = esq.getEndDate().getTime();
> >
> >                     NumericRangeQuery rangeQuery =
> >                         NumericRangeQuery.newLongRange(
> >                             WordSentenceDocumentFields.DOCUMENT_DATE_TIME,
> >                             1, begin, end,
> >                             esq.isBeginDateInclusive(),
> >                             esq.isEndDateInclusive());
> >
> >                     BooleanQuery bq = new BooleanQuery();
> >                     bq.add(query, BooleanClause.Occur.MUST);
> >                     bq.add(rangeQuery, BooleanClause.Occur.MUST);
> >
> >                     query = bq;
> >
> > Am I doing something wrong?
> >
> > Thanks
> > Kumanan
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
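
The term counts mentioned in the quoted reply (64 terms per long value and 32 per int value at precision step 1) can be reproduced with a small calculation. This is a sketch that assumes one indexed term per shift 0, precStep, 2*precStep, ... below the bit width of the value, i.e. ceil(bits / precStep) terms; the class and method names are hypothetical.

```java
public class PrecisionStepTerms {

    /**
     * Number of terms indexed for a single numeric value, assuming one term
     * per shift 0, precisionStep, 2*precisionStep, ... below valueBits.
     */
    static int termsPerValue(int valueBits, int precisionStep) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1");
        }
        // Integer ceiling of valueBits / precisionStep.
        return (valueBits + precisionStep - 1) / precisionStep;
    }

    public static void main(String[] args) {
        for (int step : new int[] {1, 4, 6, 8}) {
            System.out.println("precStep=" + step
                    + ": long=" + termsPerValue(64, step) + " terms"
                    + ", int=" + termsPerValue(32, step) + " terms");
        }
    }
}
```

Under this assumption, moving a long field from precision step 1 to 6 cuts the terms per value from 64 to 11, which on 500 million documents is a large reduction in the number of terms the index has to hold.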



