Re: RangeFilter performance problem using MultiReader

Erick Erickson Sat, 11 Apr 2009 09:42:26 -0700

OK, I only skimmed the e-mails in this thread, so I may be way off base, but
has anyone yet asked the basic question of whether the granularity of the
dates is really necessary <G>?


Raf and Roberto:

It appears you're indexing your dates down to second resolution, which
is why your number of unique terms is so high. Will it serve your use case
to index only down to the day, or perhaps the hour? That will reduce your
number of terms substantially. There is also the possibility of breaking up
your dates into two or more fields if you really require the granularity.
You could probably run a quick test of this approach just to see how it
would change your search times before investing too much time in the
process...
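
A rough sketch of such a quick test, in plain Java with no Lucene
dependency. It only counts how many distinct terms a date field would hold
at different resolutions. The class name, the method names and the synthetic
data (one document every 7 seconds for 30 days) are assumptions made up for
illustration; real numbers depend entirely on your own documents.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: this class and its names are hypothetical, not Lucene API.
class DateGranularitySketch {
    static final DateTimeFormatter SECOND = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");
    static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");
    static final DateTimeFormatter TIME_OF_DAY = DateTimeFormatter.ofPattern("HHmmss");

    /** Counts the distinct terms produced when every timestamp is formatted with the given pattern. */
    static int uniqueTerms(List<LocalDateTime> stamps, DateTimeFormatter format) {
        Set<String> terms = new HashSet<>();
        for (LocalDateTime stamp : stamps) {
            terms.add(format.format(stamp));
        }
        return terms.size();
    }

    /** Synthetic data (an assumption): one timestamp every 7 seconds for 30 days from 2009-03-01. */
    static List<LocalDateTime> sampleStamps() {
        List<LocalDateTime> stamps = new ArrayList<>();
        LocalDateTime start = LocalDateTime.of(2009, 3, 1, 0, 0, 0);
        LocalDateTime end = start.plusDays(30);
        for (LocalDateTime t = start; t.isBefore(end); t = t.plusSeconds(7)) {
            stamps.add(t);
        }
        return stamps;
    }

    public static void main(String[] args) {
        List<LocalDateTime> stamps = sampleStamps();
        int day = uniqueTerms(stamps, DAY);
        int timeOfDay = uniqueTerms(stamps, TIME_OF_DAY);
        System.out.println("documents:                  " + stamps.size());
        System.out.println("terms, second resolution:   " + uniqueTerms(stamps, SECOND));
        System.out.println("terms, day resolution:      " + day);
        // Two-field split: one field for the day, one for the time of day.
        System.out.println("terms, day + time-of-day:   " + (day + timeOfDay));
    }
}
```

With the two-field split, the time-of-day field can never hold more than
86400 distinct terms no matter how many months are indexed, but a range
that starts or ends in the middle of a day would presumably need clauses on
both fields.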

But I'm entirely ignorant of the MultiReader nuances, so this may be
completely irrelevant...

Best
Erick


On Sat, Apr 11, 2009 at 7:36 AM, Uwe Schindler <u...@thetaphi.de> wrote:

> In addition to merging each month into one index instead of all in one
> index, you could also do some additional optimization when using the Range
> filter:
> Just combine only those indexes needed to fulfil the range spec during
> search. So if somebody wants to filter Jan 15 to Feb 15, only create a
> MultiReader of the indexes for Jan and Feb, this would speed up the whole
> search (also for terms), as the filter would simply remove all documents
> from the wrong months.
>
> But the best would be to use TrieRange :)
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -----Original Message-----
> > From: Michael McCandless [mailto:luc...@mikemccandless.com]
> > Sent: Saturday, April 11, 2009 4:03 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: RangeFilter performance problem using MultiReader
> >
> > Ahhh, OK, perhaps that explains the sizable perf difference you're
> > seeing w/ optimized vs not.  I'm curious to see the results of your
> > "merge each month into 1 index" test...
> >
> > Mike
> >
> > On Sat, Apr 11, 2009 at 9:21 AM, Roberto Franchini
> > <ro.franch...@gmail.com> wrote:
> > > On Sat, Apr 11, 2009 at 1:50 PM, Michael McCandless
> > > <luc...@mikemccandless.com> wrote:
> > >> Hmm then I'm a bit baffled again.
> > >>
> > >> Because, each of your "by month" indexes presumably has a unique
> > >> subset of terms for the "date_doc" field?  Meaning, a given "by month"
> > >> index will have all date_doc corresponding to that month, and a
> > >> different "by month" index would presumably have no overlap in the
> > >> terms for the date_doc field.
> > >
> > > Yes and no :) In this situation:
> > >
> > >>> 200901-->index1, index2
> > >>> 200902-->index3
> > >>> 200903-->index4,index5,index6
> > >
> > > each month does not overlap with each other, but index1 and index2
> > > overlap, and so index4 with 5 and 6. So there's overlapping inside a
> > > single month.
> > > So I want to try, next week, this one:
> > >>> 200901-->index12 (merge of 1 and 2)
> > >>> 200902-->index3
> > >>> 200903-->index456 (merge of 4,5,6)
> > >
> > > This way we avoid overlapping inside a single month. Maybe this can
> > > help: stay tuned :)
> > > R.
> > >
> > >
> > > --
> > > Roberto Franchini
> > > http://www.celi.it
> > > http://www.blogmeter.it
> > > http://www.memesphere.it
> > > Tel +39-011-6600814
> > > jabber:ro.franch...@gmail.com
> > > skype:ro.franchini
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > > For additional commands, e-mail: java-user-h...@lucene.apache.org
> > >
> > >

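Uwe's suggestion quoted above (open only the monthly indexes that overlap
the requested range) could be sketched roughly as follows. This is plain
Java that only picks index names. The class, the method names and the
month-to-index map are hypothetical, with the layout borrowed from Roberto's
example. Opening an IndexReader for each selected index and wrapping them in
a MultiReader is left out.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Illustrative only: this class and its names are hypothetical, not Lucene API.
class MonthIndexSelector {

    /** Returns the names of all indexes whose month overlaps the inclusive range [from, to]. */
    static List<String> select(TreeMap<YearMonth, List<String>> byMonth,
                               LocalDate from, LocalDate to) {
        if (to.isBefore(from)) {
            return Collections.emptyList(); // empty range, nothing to open
        }
        List<String> selected = new ArrayList<>();
        // subMap with both bounds inclusive keeps every month touched by the range.
        for (List<String> names : byMonth
                .subMap(YearMonth.from(from), true, YearMonth.from(to), true)
                .values()) {
            selected.addAll(names);
        }
        return selected;
    }

    /** The month layout from Roberto's message, used here as sample data. */
    static TreeMap<YearMonth, List<String>> sampleLayout() {
        TreeMap<YearMonth, List<String>> byMonth = new TreeMap<>();
        byMonth.put(YearMonth.of(2009, 1), Arrays.asList("index1", "index2"));
        byMonth.put(YearMonth.of(2009, 2), Arrays.asList("index3"));
        byMonth.put(YearMonth.of(2009, 3), Arrays.asList("index4", "index5", "index6"));
        return byMonth;
    }

    public static void main(String[] args) {
        // Filter Jan 15 to Feb 15: only the January and February indexes are needed.
        List<String> needed = select(sampleLayout(),
                LocalDate.of(2009, 1, 15), LocalDate.of(2009, 2, 15));
        System.out.println(needed);
    }
}
```

The range filter is still required on top of this selection, since the
first and last month are only partly inside the range.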