Siiiggghhh. So that means I'll have to really look at TrieRange before I can appear competent<G>..
Thanks
Erick

On Sat, Apr 11, 2009 at 11:23 AM, Uwe Schindler <u...@thetaphi.de> wrote:

> This is why I invented TrieRange: Full precision dates but fewer terms
> during filtering/searching. With TrieRange on the longs returned by
> Date.getTime() you even have precision of milliseconds without any speed
> decrease (only bigger index size). Or double values with full precision,
> everything is possible :-)
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -----Original Message-----
> > From: Erick Erickson [mailto:erickerick...@gmail.com]
> > Sent: Saturday, April 11, 2009 6:42 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: RangeFilter performance problem using MultiReader
> >
> > OK, I scanned all the e-mails in this thread so I may be way off base, but
> > has anyone yet asked the basic question of whether the granularity of the
> > dates is really necessary <G>?
> >
> > Raf and Roberto:
> >
> > It appears you're indexing your dates down to second resolution, which
> > is why your number of unique terms is so high. Will it serve your use-case
> > to only index down to day? or perhaps hour? That will reduce your number
> > of terms substantially. There is also the possibility of breaking up your
> > dates into two or more fields if you really require the granularity. You
> > could probably run a quick test of this approach just to see how it would
> > change your search times before investing too much time in the process....
> >
> > But I'm entirely ignorant of the multireader nuances, so this may be
> > completely irrelevant....
> >
> > Best
> > Erick
> >
> > On Sat, Apr 11, 2009 at 7:36 AM, Uwe Schindler <u...@thetaphi.de> wrote:
> >
> > > In addition to merging each month into one index instead of all in one
> > > index, you could also do some additional optimization when using the
> > > Range filter:
> > > Just combine only those indexes needed to fulfil the range spec during
> > > search. So if somebody wants to filter Jan 15 to Feb 15, only create a
> > > MultiReader of the indexes for Jan and Feb. This would speed up the
> > > whole search (also for terms), as the filter would simply remove all
> > > documents from the wrong months.
> > >
> > > But the best would be to use TrieRange :)
> > >
> > > -----
> > > Uwe Schindler
> > > H.-H.-Meier-Allee 63, D-28213 Bremen
> > > http://www.thetaphi.de
> > > eMail: u...@thetaphi.de
> > >
> > > > -----Original Message-----
> > > > From: Michael McCandless [mailto:luc...@mikemccandless.com]
> > > > Sent: Saturday, April 11, 2009 4:03 PM
> > > > To: java-user@lucene.apache.org
> > > > Subject: Re: RangeFilter performance problem using MultiReader
> > > >
> > > > Ahhh, OK, perhaps that explains the sizable perf difference you're
> > > > seeing w/ optimized vs not. I'm curious to see the results of your
> > > > "merge each month into 1 index" test...
> > > >
> > > > Mike
> > > >
> > > > On Sat, Apr 11, 2009 at 9:21 AM, Roberto Franchini
> > > > <ro.franch...@gmail.com> wrote:
> > > > > On Sat, Apr 11, 2009 at 1:50 PM, Michael McCandless
> > > > > <luc...@mikemccandless.com> wrote:
> > > > >> Hmm then I'm a bit baffled again.
> > > > >>
> > > > >> Because, each of your "by month" indexes presumably has a unique
> > > > >> subset of terms for the "date_doc" field? Meaning, a given "by month"
> > > > >> index will have all date_doc corresponding to that month, and a
> > > > >> different "by month" index would presumably have no overlap in the
> > > > >> terms for the date_doc field.
> > > > >
> > > > > Yes and no :) In this situation:
> > > > >
> > > > >>> 200901-->index1, index2
> > > > >>> 200902-->index3
> > > > >>> 200903-->index4,index5,index6
> > > > >
> > > > > each month does not overlap with each other, but index1 and index2
> > > > > overlap, and so index4 with 5 and 6. So there's overlapping inside a
> > > > > single month.
> > > > > So I want to try, next week, this one:
> > > > >>> 200901-->index12 (merge of 1 and 2)
> > > > >>> 200902-->index3
> > > > >>> 200903-->index456 (merge of 4,5,6)
> > > > >
> > > > > This way we avoid overlapping inside a single month. Maybe this can
> > > > > help: stay tuned :)
> > > > > R.
> > > > >
> > > > > --
> > > > > Roberto Franchini
> > > > > http://www.celi.it
> > > > > http://www.blogmeter.it
> > > > > http://www.memesphere.it
> > > > > Tel +39-011-6600814
> > > > > jabber:ro.franch...@gmail.com skype:ro.franchini
> > > > >
> > > > > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > > > > For additional commands, e-mail: java-user-h...@lucene.apache.org
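
Two of the suggestions quoted above can be tried out in plain Java before touching any index: opening only the month indexes that a date range spans (Uwe's Jan 15 to Feb 15 example), and checking how much coarser date resolution cuts the number of unique terms (Erick's question). The following is a minimal sketch under stated assumptions, not anyone's actual code from this thread. The class name DateRangeSketch, both method names, and the synthetic timestamps are made up for illustration; the "yyyyMM" keys follow Roberto's 200901/200902 naming; no Lucene API is used.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DateRangeSketch {

    // Assumed naming scheme: one group of indexes per month, keyed "yyyyMM".
    private static final DateTimeFormatter MONTH_KEY = DateTimeFormatter.ofPattern("yyyyMM");

    /** Returns the month keys whose indexes are needed to answer the range [from, to]. */
    public static List<String> monthKeysFor(LocalDate from, LocalDate to) {
        List<String> keys = new ArrayList<>();
        YearMonth last = YearMonth.from(to);
        for (YearMonth m = YearMonth.from(from); !m.isAfter(last); m = m.plusMonths(1)) {
            keys.add(m.format(MONTH_KEY));
        }
        return keys;
    }

    /** Counts the distinct terms a date field would contain if formatted with the given pattern (UTC). */
    public static int distinctTerms(List<Instant> timestamps, String pattern) {
        DateTimeFormatter f = DateTimeFormatter.ofPattern(pattern).withZone(ZoneOffset.UTC);
        Set<String> terms = new HashSet<>();
        for (Instant t : timestamps) {
            terms.add(f.format(t));
        }
        return terms.size();
    }

    public static void main(String[] args) {
        // Only the January and February indexes would go into the MultiReader.
        System.out.println(monthKeysFor(LocalDate.of(2009, 1, 15), LocalDate.of(2009, 2, 15)));

        // Synthetic data: 10,000 timestamps, 61 seconds apart, starting 2009-01-01 UTC.
        List<Instant> stamps = new ArrayList<>();
        Instant start = Instant.parse("2009-01-01T00:00:00Z");
        for (int i = 0; i < 10_000; i++) {
            stamps.add(start.plusSeconds(i * 61L));
        }
        System.out.println(distinctTerms(stamps, "yyyyMMddHHmmss")); // second resolution
        System.out.println(distinctTerms(stamps, "yyyyMMdd"));       // day resolution
    }
}
```

With this synthetic data, second resolution yields one term per document (10,000), while day resolution yields 8. Real data will differ, so treat the numbers only as an illustration of why a range filter has far fewer terms to walk at coarser resolution.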