Re: TopDocCollector & Paging

Grant Ingersoll Wed, 17 Sep 2008 17:06:40 -0700


On Sep 17, 2008, at 6:53 PM, Dino Korah wrote:

Thanks Grant. Please see my comments/responses below.

2008/9/17 Grant Ingersoll <[EMAIL PROTECTED]>
On Sep 17, 2008, at 4:39 PM, Dino Korah wrote:
I know that in applications where we search for words or phrases and
expect the results sorted by relevance, TopDocCollector would work like
a dream. But what about scenarios where the results need to be sorted
chronologically or by some kind of metadata?
A very common application would be email. If someone is to search the
Inbox, the results will be expected to appear sorted by date.
Wouldn't you expect by relevance and then by date? One way to achieve
kind of what you want is a Function Query that uses the date as a factor
in the relevance score.
In my case relevance is really secondary. I may be wrong in saying this;
in the case of emails, how can we sort by relevance and then by date?!
Also, about Function Query: I'd appreciate it if you could point me to
skeleton code for a Function Query. I did see the documentation, but
this facility seems to be experimental.


It's been around for a while now.
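
As a rough illustration of "the date as a factor in the relevance score", the sketch below multiplies a text relevance score by a recency factor. It is plain Java, not Lucene's function query API; the class name, the `Mail` type, and the half-life decay formula are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class RecencyScoreSketch {
    /** Hypothetical message: a text relevance score plus a date in days since some epoch. */
    static final class Mail {
        final String id;
        final double relevance;
        final long dayStamp;

        Mail(String id, double relevance, long dayStamp) {
            this.id = id;
            this.relevance = relevance;
            this.dayStamp = dayStamp;
        }
    }

    // Assumed decay: 1.0 for a message dated today, halving every halfLifeDays.
    static double recency(long dayStamp, long today, double halfLifeDays) {
        double ageDays = Math.max(0, today - dayStamp);
        return Math.pow(0.5, ageDays / halfLifeDays);
    }

    // The date acts as a multiplicative factor on the relevance score.
    static double combined(Mail m, long today, double halfLifeDays) {
        return m.relevance * recency(m.dayStamp, today, halfLifeDays);
    }

    // Returns message ids ordered by combined score, best first.
    static List<String> rank(List<Mail> mails, long today, double halfLifeDays) {
        List<Mail> sorted = new ArrayList<>(mails);
        sorted.sort(Comparator.comparingDouble(
                (Mail m) -> combined(m, today, halfLifeDays)).reversed());
        List<String> ids = new ArrayList<>();
        for (Mail m : sorted) {
            ids.add(m.id);
        }
        return ids;
    }
}
```

A short half-life makes the ordering close to chronological, while a long one leaves it close to pure relevance, so the single parameter covers the range between the two behaviours discussed here.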

If there are too many results, the user will most probably be willing to
look through a fair part of the result list, which means paging through
the generated hits/results is quite a handy feature for a generic library.
Well, the way this is typically done is you ask for increasingly more
results and re-execute the query.  Another way is to cache.  In my
experience, it usually is very fast to requery, especially once things
are in the OS cache, etc. I just don't see how you can say "give me
results 100-110" if you don't know what results 1-99 are.
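
The requery approach described above can be sketched in plain Java: to serve a page starting at rank `start`, collect the top `start + pageSize` results and discard the first `start`. This is only an illustration under assumptions; the `scores` array stands in for an index (array position as doc id), and the class and method names are hypothetical, not Lucene calls.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class RequeryPagingSketch {
    // Collects the n best doc ids with a bounded min-heap, returned best first.
    static List<Integer> topN(double[] scores, int n) {
        Comparator<Integer> worstFirst =
                Comparator.comparingDouble((Integer d) -> scores[d]);
        PriorityQueue<Integer> heap = new PriorityQueue<>(worstFirst);
        for (int doc = 0; doc < scores.length; doc++) {
            heap.offer(doc);
            if (heap.size() > n) {
                heap.poll(); // drop the lowest-scoring doc seen so far
            }
        }
        List<Integer> best = new ArrayList<>(heap);
        best.sort(worstFirst.reversed());
        return best;
    }

    // Re-executes the "query" for start + pageSize results and keeps the tail.
    static List<Integer> page(double[] scores, int start, int pageSize) {
        List<Integer> top = topN(scores, start + pageSize);
        if (start >= top.size()) {
            return new ArrayList<>();
        }
        int end = Math.min(top.size(), start + pageSize);
        return new ArrayList<>(top.subList(start, end));
    }
}
```

The heap never holds more than `start + pageSize` entries, which is why deep pages cost more than shallow ones: results 1-99 still have to be ranked before result 100 is known.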
Here I am trying to get my head around implementing this with TopDocs,
as Hits is getting deprecated. Currently I search and sort on the
searcher and save the returned Hits object throughout the session or
till the user runs a new search. So when a user is requesting results
from 100 to 110, I can do
hits.doc(100) ... hits.doc(110)

Well, Hits is silently doing the (repeated) fetching behind the scenes
as you ask for more and more results. It is just a wrapper around the
TopDocs stuff, but adds some caching.
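
A wrapper of the kind described here (cache the top results, re-run the search with a larger limit when asked for a rank beyond the cache) might look roughly like the sketch below. It is not the Hits source: the class name, the `topDocs` callback, the starting size of 100, and the doubling policy are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

class CachedHitsSketch {
    // Hypothetical search callback: returns the best n doc ids, best first.
    private final IntFunction<List<Integer>> topDocs;
    private List<Integer> cache = new ArrayList<>();
    private int requested = 0; // how many results the last search asked for
    int searches = 0;          // how many times the query has been executed

    CachedHitsSketch(IntFunction<List<Integer>> topDocs) {
        this.topDocs = topDocs;
    }

    // Returns the doc id at the given rank, re-running the search if needed.
    int doc(int rank) {
        while (rank >= cache.size()) {
            if (requested > 0 && cache.size() < requested) {
                // The last search returned fewer than requested: no more results.
                throw new IndexOutOfBoundsException("only " + cache.size() + " results");
            }
            // Assumed growth policy: start at 100, then at least double each time.
            requested = Math.max(100, Math.max(requested * 2, rank + 1));
            cache = topDocs.apply(requested);
            searches++;
        }
        return cache.get(rank);
    }
}
```

Keeping one such object per user session gives the same hits.doc(100) ... hits.doc(110) access pattern, with the repeated searching made explicit instead of hidden.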

You said scoring was expensive, which maybe is true. Have you actually
seen an issue w/ performance? Are you doing really complex queries? Or
are you searching on really common terms? In your original email you
said that you have a 100M+ index.  Is this all on one machine?
All on one machine, and the query is complex. I have done all I could to
fine-tune both the hardware, the JVM and my software.

I'd say you need distributed search. You are definitely at the limits
of one machine.

