Hi,

A precision step of 1 is not necessarily the fastest (see the Lucene javadocs; Lucene.NET should behave similarly). Try the default, 4, first. In general, range queries will always be slower than text-only queries, because there is much more work to do (more terms, more documents, ...).
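To make the trade-off concrete, here is a minimal sketch in plain Java (no Lucene dependency) of the term-count arithmetic behind the precision step. The class and method names are hypothetical and exist only for illustration. The worst-case bound is the formula given in the NumericRangeQuery javadocs as I recall it, so treat it as an assumption and check it against the javadocs of your version.

```java
// Hypothetical helper, not part of Lucene: term-count arithmetic for
// trie-encoded numeric fields.
final class PrecisionStepMath {

    // Terms indexed for each value: one per shift 0, step, 2*step, ...
    // as long as the shift is below bitsPerValue (ceiling division).
    static int indexedTermsPerValue(int bitsPerValue, int precisionStep) {
        return (bitsPerValue + precisionStep - 1) / precisionStep;
    }

    // Assumed worst-case number of terms a range query visits:
    // n = (bitsPerValue/precisionStep - 1) * (2^precisionStep - 1) * 2
    //     + (2^precisionStep - 1)
    // Uses integer division, so it is only exact when the step divides
    // bitsPerValue evenly.
    static long maxTermsPerRangeQuery(int bitsPerValue, int precisionStep) {
        long levels = bitsPerValue / precisionStep;
        long perLevel = (1L << precisionStep) - 1;
        return (levels - 1) * perLevel * 2 + perLevel;
    }

    public static void main(String[] args) {
        int bits = 32; // the date field is indexed as a 32-bit int
        for (int step : new int[] {1, 2, 4, 8}) {
            System.out.printf("step=%d indexedTermsPerValue=%d maxQueryTerms=%d%n",
                    step,
                    indexedTermsPerValue(bits, step),
                    maxTermsPerRangeQuery(bits, step));
        }
    }
}
```

For a 32-bit value, step 1 indexes 32 terms per value against 8 for step 4, while the worst-case query bound is 63 terms against 225. A smaller step therefore trades a larger term dictionary for fewer query terms, and these counts alone do not predict search time, which is why measuring with the default is worth doing first.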
This question is more related to Lucene.NET, so I would ask it on their mailing list.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Avi Levy [mailto:l...@wesee.com]
> Sent: Tuesday, April 09, 2013 4:51 PM
> To: java-user@lucene.apache.org
> Subject: How to improve retrieval time when searching for a date range
>
> Hello,
>
> I have a Lucene.NET index created with version 2.9.4.1. The index holds
> about 25 million entries (in the production environment I will have
> 50 million or more). The index size is 5.75GB. The index is used for
> searching by text. I need to add new functionality that allows querying
> for a specific date range in addition to the textual search (the query is
> for text AND date range). The date range the user can select is either the
> last 7 days or the last 30 days.
>
> The implementation I tried was to add a new indexed-only numeric field
> representing a date. The date is indexed as an integer in the format
> yyyyMMdd. I am indexing this field with a precision step of 1 (to make
> retrieval the fastest). During retrieval I create a Boolean query that
> contains the original query, and I add a MUST clause for the date range.
>
> When I compare the results to regular textual queries I see much slower
> results. I compared by running 10 queries for warm-up (I don't count the
> results), then another 90 queries where I count the results.
>
> I would appreciate suggestions and tips on how the performance of
> searching by dates can be improved.
>
> You can see below the statistics for the runs, and the code for creating
> the fields and the query.
>
> Thanks,
> Avi
>
> No changes (using index with no dates)
> 08 18:17:01,213 [1] INFO: {(null)} - Min search time: 2
> 08 18:17:01,213 [1] INFO: {(null)} - Max search time: 88
> 08 18:17:01,213 [1] INFO: {(null)} - Average search time: 23.0674157303371
> 08 18:17:01,213 [1] INFO: {(null)} - Search time Variance : 20.5
> 08 18:17:01,213 [1] INFO: {(null)} - Number of results above 700ms: 0
>
> Index With Date (not using dates in query)
> 08 18:22:49,093 [1] INFO: {(null)} - Min search time: 3
> 08 18:22:49,093 [1] INFO: {(null)} - Max search time: 176
> 08 18:22:49,093 [1] INFO: {(null)} - Average search time: 50.9325842696629
> 08 18:22:49,093 [1] INFO: {(null)} - Search time Variance : 46.85
> 08 18:22:49,093 [1] INFO: {(null)} - Number of results above 700ms: 0
>
> With Dates - Last 7 Days
> 08 19:38:17,988 [1] INFO: {(null)} - Min search time: 33
> 08 19:38:17,988 [1] INFO: {(null)} - Max search time: 1668
> 08 19:38:17,988 [1] INFO: {(null)} - Average search time: 704.741573033708
> 08 19:38:17,988 [1] INFO: {(null)} - Search time Variance : 607.05
> 08 19:38:17,988 [1] INFO: {(null)} - Number of results above 700ms: 44
>
> With Dates - Last 30 Days
> 08 19:48:17,123 [1] INFO: {(null)} - Min search time: 105
> 08 19:48:17,123 [1] INFO: {(null)} - Max search time: 4808
> 08 19:48:17,123 [1] INFO: {(null)} - Average search time: 2846.75280898876
> 08 19:48:17,123 [1] INFO: {(null)} - Search time Variance : 1934.11
> 08 19:48:17,123 [1] INFO: {(null)} - Number of results above 700ms: 72
>
> Here are the field definitions:
>
> var idField = new Field( "ID", String.Empty, Field.Store.YES,
>     Field.Index.NOT_ANALYZED_NO_NORMS );
> document.Add( idField );
>
> var id2Field = new Field( "ID2", String.Empty, Field.Store.YES,
>     Field.Index.NO );
> document.Add( id2Field );
>
> var txtField = new Field( "txtField", String.Empty, Field.Store.NO,
>     Field.Index.
>     ANALYZED );
> document.Add( txtField );
>
> var txt2Field = new Field( "txt2Field", String.Empty, Field.Store.NO,
>     Field.Index.ANALYZED );
> document.Add( txt2Field );
>
> var txt3Field = new Field( "txt3Field", String.Empty, Field.Store.NO,
>     Field.Index.ANALYZED );
> document.Add( txt3Field );
>
> // The new date field
> var dateField = new NumericField( "Date", 1, Field.Store.NO, true );
> document.Add( dateField );
>
> I set the values of the fields. For the new date field I set it like this:
>
> Int32 dateInt = <some date>;
> dateField.SetIntValue( dateInt );
>
> The query:
>
> var fields = new String[3];
> Dictionary<String, Single> boosts = new Dictionary<String, Single>();
> fields[0] = "txtField";
> boosts.Add( fields[0], <Value> );
> fields[1] = "txt2Field";
> boosts.Add( fields[1], <Value> );
> fields[2] = "txt3Field";
> boosts.Add( fields[2], <Value> );
>
> MultiFieldQueryParser parser = new MultiFieldQueryParser(
>     Version.LUCENE_29, fields, analyzer, boosts );
> var boolQuery = new BooleanQuery();
> Query simpleParsedQuery = parser.Parse( queryText );
> boolQuery.Add( simpleParsedQuery, BooleanClause.Occur.MUST );
>
> DateTime beginDate = <Date 7 or 30 days ago>;
> Int32 beginDateInt = beginDate.Day + beginDate.Month * 100 +
>     beginDate.Year * 10000;
>
> DateTime now = DateTime.UtcNow;
> Int32 endDateInt = now.Day + now.Month * 100 + now.Year * 10000;
>
> NumericRangeQuery datesQuery = NumericRangeQuery.NewIntRange(
>     "Date", beginDateInt, endDateInt, true, true );
> boolQuery.Add( datesQuery, BooleanClause.Occur.MUST );