OK, I scanned all the e-mails in this thread, so I may be way off base, but has anyone asked the basic question yet of whether the granularity of the dates is really necessary <G>?

Raf and Roberto: It appears you're indexing your dates down to second resolution, which is why your number of unique terms is so high. Will it serve your use case to index only down to the day, or perhaps the hour? That will reduce your number of terms substantially. There is also the possibility of breaking up your dates into two or more fields if you really require the granularity.

You could probably run a quick test of this approach just to see how it would change your search times before investing too much time in the process...

But I'm entirely ignorant of the MultiReader nuances, so this may be completely irrelevant...

Best
Erick

On Sat, Apr 11, 2009 at 7:36 AM, Uwe Schindler <u...@thetaphi.de> wrote:

> In addition to merging each month into one index instead of all in one
> index, you could also do some additional optimization when using the Range
> filter:
> Just combine only those indexes needed to fulfil the range spec during
> search. So if somebody want to filter Jan 15 to Feb 15, only create a
> MultiReader of the indexes for Jan and Feb, this would speed up the whole
> search (also for terms), as the filter would simply remove all documents
> from the wrong months.
>
> But the best would be to use TrieRange :)
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -----Original Message-----
> > From: Michael McCandless [mailto:luc...@mikemccandless.com]
> > Sent: Saturday, April 11, 2009 4:03 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: RangeFilter performance problem using MultiReader
> >
> > Ahhh, OK, perhaps that explains the sizable perf difference you're
> > seeing w/ optimized vs not. I'm curious to see the results of your
> > "merge each month into 1 index" test...
> >
> > Mike
> >
> > On Sat, Apr 11, 2009 at 9:21 AM, Roberto Franchini
> > <ro.franch...@gmail.com> wrote:
> > > On Sat, Apr 11, 2009 at 1:50 PM, Michael McCandless
> > > <luc...@mikemccandless.com> wrote:
> > >> Hmm then I'm a bit baffled again.
> > >>
> > >> Because, each of your "by month" indexes presumably has a unique
> > >> subset of terms for the "date_doc" field? Meaning, a given "by month"
> > >> index will have all date_doc corresponding to that month, and a
> > >> different "by month" index would presumably have no overlap in the
> > >> terms for the date_doc field.
> > >
> > > Yes and no :) In this situation:
> > >
> > >>> 200901-->index1, index2
> > >>> 200902-->index3
> > >>> 200903-->index4,index5,index6
> > >
> > > each month does not overlap with each other, but index1 and index2
> > > overlap, and so index4 with 5 and 6. So there's overlapping inside a
> > > single month.
> > > So I want to trie, next week, this one:
> > >>> 200901-->index12 (merge of 1 and 2)
> > >>> 200902-->index3
> > >>> 200903-->index456 (merge of 4,5,6)
> > >
> > > This way we avoid overlapping inside a single month. Maybe this can
> > > help: stay tuned :)
> > > R.
> > >
> > > --
> > > Roberto Franchini
> > > http://www.celi.it
> > > http://www.blogmeter.it
> > > http://www.memesphere.it
> > > Tel +39-011-6600814
> > > jabber:ro.franch...@gmail.com
> > > skype:ro.franchini
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > > For additional commands, e-mail: java-user-h...@lucene.apache.org
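
P.S. To put rough numbers on the granularity point at the top of this mail, here is a minimal sketch in plain Java (no Lucene calls at all). Everything in it is my own assumption for illustration: the class name `DateGranularity`, the string formats, and the made-up data set of one document every 7 seconds for 30 days. It just counts how many distinct terms a date field would produce at second, hour and day resolution, and for a two-field split (day plus time of day). In Lucene itself, I believe `DateTools.dateToString` with a `DateTools.Resolution` does this kind of rounding for you, but check the javadoc for your version.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DateGranularity {

    // Hypothetical term encodings; any sortable string form would do.
    public static final DateTimeFormatter SECOND = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");
    public static final DateTimeFormatter HOUR = DateTimeFormatter.ofPattern("yyyyMMddHH");
    public static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");
    // Second half of a two-field split: the day goes in one field, the time of day in another.
    public static final DateTimeFormatter TIME_OF_DAY = DateTimeFormatter.ofPattern("HHmmss");

    // Number of distinct terms the field would hold if dates were indexed with this format.
    public static int uniqueTerms(List<LocalDateTime> dates, DateTimeFormatter resolution) {
        Set<String> terms = new HashSet<>();
        for (LocalDateTime d : dates) {
            terms.add(d.format(resolution));
        }
        return terms.size();
    }

    // Made-up data: one document every 7 seconds for 30 days, starting 2009-01-01.
    public static List<LocalDateTime> sampleDates() {
        List<LocalDateTime> dates = new ArrayList<>();
        LocalDateTime start = LocalDateTime.of(2009, 1, 1, 0, 0, 0);
        LocalDateTime end = start.plusDays(30);
        for (LocalDateTime d = start; d.isBefore(end); d = d.plusSeconds(7)) {
            dates.add(d);
        }
        return dates;
    }

    public static void main(String[] args) {
        List<LocalDateTime> dates = sampleDates();
        System.out.println("documents:          " + dates.size());
        System.out.println("second resolution:  " + uniqueTerms(dates, SECOND));
        System.out.println("hour resolution:    " + uniqueTerms(dates, HOUR));
        System.out.println("day resolution:     " + uniqueTerms(dates, DAY));
        System.out.println("two fields (d + t): "
                + (uniqueTerms(dates, DAY) + uniqueTerms(dates, TIME_OF_DAY)));
    }
}
```

With second resolution every document here gets its own term, while day resolution collapses the field to one term per calendar day. The two-field split lands in between: it keeps the full precision but caps the time-of-day field at 86,400 possible terms no matter how many months you index. How much that helps a RangeFilter over a MultiReader is exactly what the quick test would tell you.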