Re: Search Performance Problem 16 sec for 250K docs

M A Sun, 20 Aug 2006 06:49:01 -0700

The index is already built in date order, i.e. the older documents appear
first in the index. What I am trying to achieve, however, is the latest
documents appearing first in the search results. Without the sort, I
think they appear by relevance; well, that's what it looked like.


I am looking at the scoring as we speak.



On 8/20/06, Erick Erickson <[EMAIL PROTECTED]> wrote:

About Luke... I don't know about command-line interfaces, but you can copy
your index to a different machine and use Luke there. I do this between
Linux and Windows boxes all the time. Or, if you can mount the remote drive
so you can see it, you can just use Luke to browse to it and open it up. You
may have some latency, though.

See below...

On 8/20/06, M A <[EMAIL PROTECTED]> wrote:
>
> Ok I get your point, this still however means the first search on the new
> searcher will take a huge amount of time .. given that this is happening
> now ..

You can fire one or several canned queries at the searcher whenever you open
a new one. That way the first time a *user* hits the box, the warm-up will
already have happened. Note that the same searcher can be used by multiple
threads...
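
The warm-up idea can be sketched roughly as below. This is only an
illustration, not Lucene code: WarmedSearcher, opener and warmups are
hypothetical names, and the type parameter S stands in for whatever searcher
class you actually use. The canned queries should include the sorted search,
since that is the one that pays the first-use cost.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical helper; S stands in for your searcher type. */
final class WarmedSearcher<S> {
    private final AtomicReference<S> current = new AtomicReference<>();
    private final Supplier<S> opener;         // assumed: opens a searcher on the index
    private final List<Consumer<S>> warmups;  // assumed: canned queries, incl. the sorted one

    WarmedSearcher(Supplier<S> opener, List<Consumer<S>> warmups) {
        this.opener = opener;
        this.warmups = warmups;
        reopen();
    }

    /** Call after each index update (e.g. every 5 minutes). */
    void reopen() {
        S fresh = opener.get();
        for (Consumer<S> query : warmups) {
            query.accept(fresh);  // pay the first-search cost here, not on a user request
        }
        current.set(fresh);       // publish only once warm; users keep the old one until now
    }

    /** Safe to call from many threads; they all share the same instance. */
    S get() {
        return current.get();
    }
}
```

Closing the previous searcher once no thread is still using it is left out
of the sketch.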

> i.e. new search -> new query -> get hits -> 20+ secs ..  this happens every
> 5 mins or so ..
>
> although subsequent searches may be quicker ..
>
> Am i to assume for a first search the amount of time is ok -> .. seems like
> a long time to me ..?
>
> The other thing is the sorting is fixed .. it never changes .. it is always
> sorted by the same field ..

Assuming that you still have performance issues, you could think about
building your index in pre-sorted order and just avoiding the sorting
altogether. The internal Lucene document IDs are then your sort order (a newly
added doc has an ID that is always greater than any existing doc ID). I
don't know the details of your problem space, but this might be relatively
easy.... You won't want to return things in relevance order in that case. In
fact, you probably don't want relevance in place at all since your sorting
doesn't change.... I think a ConstantScoreQuery might work for you here.
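
If the index really is in date order, "latest first" is just the matching
document IDs read from highest to lowest. A minimal sketch, assuming the
matching IDs have already been gathered into a java.util.BitSet (in Lucene
that could be done from a HitCollector, but that wiring is not shown here);
NewestFirst and topNewest are hypothetical names:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class NewestFirst {
    /**
     * Returns up to `limit` matching doc IDs, highest (newest) first.
     * `matches` has one bit set per matching document ID.
     */
    static List<Integer> topNewest(BitSet matches, int limit) {
        List<Integer> result = new ArrayList<>();
        // length() - 1 is the highest set bit, or -1 for an empty set;
        // previousSetBit(-1) returns -1, which ends the loop.
        for (int doc = matches.length() - 1;
             doc >= 0 && result.size() < limit;
             doc = matches.previousSetBit(doc - 1)) {
            result.add(doc);
        }
        return result;
    }
}
```

No per-field sort array is needed for this, which is the point of relying on
index order instead of a Sort.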

But I wouldn't go there unless you have evidence that your sort is slowing
you down, which is easy enough to verify by just taking it out. Don't bother
with any of this until you re-use your reader though....

> i just built the entire index and it still takes ages ..

The search took ages? Or building the index? If the former, then rebuilding
the index is irrelevant; it's the first time you use a searcher that counts.

On 8/20/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> >
> >
> > : This is because the index is updated every 5 mins or so, due to the
> > : incoming feed of stories ..
> > :
> > : When you say iteration, i take it you mean, search request, well for each
> > : search that is conducted I create a new one .. search reader that is ..
> >
> > yeah ... i meant iteration of your test.  don't do that.
> >
> > if the index is updated every 5 minutes, then open a new searcher every 5
> > minutes -- and reuse it for the entire 5 minutes.  if it's updated
> > "sporadically throughout the day" then open a searcher, and keep using it
> > until the index is updated, then open a new one.
> >
> > reusing an IndexSearcher as long as possible is one of the biggest factors
> > in the performance of Lucene applications.
> >
> > :
> > :
> > :
> > : On 8/19/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> > : >
> > : >
> > : > :     hits = searcher.search(query, new Sort("sid", true));
> > : >
> > : > you don't show where searcher is initialized, and you don't clarify how
> > : > you are timing your multiple iterations -- i'm going to guess that you are
> > : > opening a new searcher every iteration right?
> > : >
> > : > sorting on a field requires pre-computing an array of information for that
> > : > field -- this is both time and space expensive, and is cached per
> > : > IndexReader/IndexSearcher -- so if you reuse the same searcher and time
> > : > multiple iterations you'll find that the first iteration might be somewhat
> > : > slow, but the rest should be very fast.
> > : >
> > : >
> > : >
> > : > -Hoss
> > : >
> > : >
> > : >
> > : > ---------------------------------------------------------------------
> > : > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > : > For additional commands, e-mail: [EMAIL PROTECTED]
> > : >
> > : >
> > :
> >
> >
> >
> > -Hoss
> >
> >
> >
> >
>
>

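Hoss's advice quoted above (keep one searcher and only open a new one when
the index has changed, because the sort array is cached per searcher) could
look roughly like this. It is a sketch under assumptions, not Lucene API:
SearcherCache, indexVersion and opener are hypothetical names, and how you
detect that the index has changed is up to you.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical holder: reopens only when the index version changes. */
final class SearcherCache<S> {
    private final LongSupplier indexVersion;  // assumed: cheap "has the index changed?" check
    private final Supplier<S> opener;         // assumed: opens a new searcher
    private S searcher;
    private long openedAtVersion;

    SearcherCache(LongSupplier indexVersion, Supplier<S> opener) {
        this.indexVersion = indexVersion;
        this.opener = opener;
    }

    /** Returns the shared searcher, opening a new one only after an index update. */
    synchronized S get() {
        long version = indexVersion.getAsLong();
        if (searcher == null || version != openedAtVersion) {
            searcher = opener.get();  // per-searcher sort caches start cold again here
            openedAtVersion = version;
        }
        return searcher;              // same instance, so its caches stay warm
    }
}
```

With an index that changes every 5 minutes, only the first sorted search
after each change pays the cost of building the sort array.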