Re: TopDocCollector & Paging

Grant Ingersoll Wed, 17 Sep 2008 13:54:35 -0700


On Sep 17, 2008, at 4:39 PM, Dino Korah wrote:

I know in applications where we search for words or phrases and expect the
result sorted by relevance, TopDocCollector would work like a dream.
But what about scenarios where the result needs to be sorted chronologically
or by some kind of metadata?
A very common application would be email applications. If someone is to search the Inbox, the results will be expected to appear sorted by date.

Wouldn't you expect by relevance and then by date? One way to achieve kind of what you want is a Function Query that uses the date as a factor in the relevance score.
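
To illustrate the idea of using the date as a factor in the relevance score, here is a minimal sketch in plain Java. It does not use Lucene's function query classes; the `RecencyBoostSketch` class, the `Hit` type, the half-life decay, and the `relevance * (1 + recency)` combination are all assumptions chosen for illustration, not what Lucene computes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: blend a text relevance score with document recency.
final class RecencyBoostSketch {

    // Hypothetical hit: doc id, raw relevance score, and document date.
    static final class Hit {
        final int docId;
        final double relevance;
        final long timestampMillis;

        Hit(int docId, double relevance, long timestampMillis) {
            this.docId = docId;
            this.relevance = relevance;
            this.timestampMillis = timestampMillis;
        }
    }

    // Assumed decay: 1.0 for a brand-new document, halving every halfLifeDays.
    static double recencyFactor(long nowMillis, long docMillis, double halfLifeDays) {
        double ageDays = Math.max(0, nowMillis - docMillis) / 86_400_000.0;
        return Math.pow(0.5, ageDays / halfLifeDays);
    }

    // Assumed combination: relevance stays dominant, recency adds up to 2x.
    static double combinedScore(Hit h, long nowMillis, double halfLifeDays) {
        return h.relevance * (1.0 + recencyFactor(nowMillis, h.timestampMillis, halfLifeDays));
    }

    // Returns a new list ordered by combined score, best first.
    static List<Hit> rank(List<Hit> hits, long nowMillis, double halfLifeDays) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator
                .comparingDouble((Hit h) -> combinedScore(h, nowMillis, halfLifeDays))
                .reversed());
        return sorted;
    }
}
```

With a 30-day half-life, a fresh message with relevance 1.0 scores 2.0 and outranks a 30-day-old message with relevance 1.2, which scores 1.8. A shorter half-life pushes the ordering closer to purely chronological.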

If there are too many results, the user will most probably be willing to look through a fair part of the result list, which means paging through the
generated hits/results is quite a handy feature for a generic library.

Well, the way this is typically done is you ask for increasingly more results and re-execute the query. Another way is to cache. In my experience, it usually is very fast to requery, especially once things are in the OS cache, etc. I just don't see how you can say give me results 100-100 if you don't know what results 1-99 are.
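
The requery approach described above can be sketched in plain Java as follows. The `searchTopN` function is a hypothetical stand-in for whatever runs the query and returns the best `n` hits in rank order; it is not a Lucene API. The sketch asks for enough results to cover the requested page and then discards everything before it.

```java
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical sketch: serve page k by re-running the query for the
// top (k + 1) * pageSize hits and slicing off the last page.
final class RequeryPagingSketch {

    static <T> List<T> page(IntFunction<List<T>> searchTopN, int pageIndex, int pageSize) {
        int end = (pageIndex + 1) * pageSize;       // hits needed to cover this page
        List<T> top = searchTopN.apply(end);        // re-execute the query
        int start = Math.min(pageIndex * pageSize, top.size());
        return top.subList(start, Math.min(end, top.size()));
    }
}
```

Page indexes are zero-based here, so `page(search, 1, 10)` returns hits 10 through 19 of the ranking. The cost of each call grows with the page index, which matches the point that results 1-99 have to be known before later ones can be returned.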

You said scoring was expensive, which maybe is true. Have you actually seen an issue w/ performance? Are you doing really complex queries? Or are you searching on really common terms? In your original email you said that you have a 100M+ index. Is this all on one machine?

2008/9/17 Grant Ingersoll <[EMAIL PROTECTED]>
On Sep 17, 2008, at 11:51 AM, Cam Bazz wrote:

And how about queries that need starting position, like hits between
100 and 200?
Could we pass something to the collector that will count between 0 to
100 and then get the next 100 records?
The collector uses a Priority Queue to store doc ids and scores as they are collected. All the collector knows is the document id and the score and, presumably, what it has seen so far, to some extent. Ordering is not defined
until all the candidate docs have been scored.
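
The priority-queue behaviour described above can be sketched in plain Java. This is not Lucene's TopDocCollector source; `TopNCollectorSketch`, `ScoredDoc`, and the method names are assumptions made for illustration, built on `java.util.PriorityQueue`. The queue is a min-heap holding at most N entries, so the weakest kept hit sits at the head and is evicted when a better one arrives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of a top-N collector backed by a bounded min-heap.
final class TopNCollectorSketch {

    static final class ScoredDoc {
        final int docId;
        final float score;

        ScoredDoc(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    private final int capacity;
    private final PriorityQueue<ScoredDoc> queue;

    // capacity must be at least 1 (PriorityQueue rejects smaller values).
    TopNCollectorSketch(int capacity) {
        this.capacity = capacity;
        this.queue = new PriorityQueue<>(capacity, (a, b) -> Float.compare(a.score, b.score));
    }

    // Called once per matching document; only the id and score are known.
    void collect(int docId, float score) {
        if (queue.size() < capacity) {
            queue.add(new ScoredDoc(docId, score));
        } else if (score > queue.peek().score) {
            queue.poll();                            // evict the current weakest hit
            queue.add(new ScoredDoc(docId, score));
        }
    }

    // The final order exists only after every candidate has been collected.
    List<ScoredDoc> topDocs() {
        List<ScoredDoc> out = new ArrayList<>(queue);
        out.sort((a, b) -> Float.compare(b.score, a.score));
        return out;
    }
}
```

Because a document can be evicted by any later, better-scoring one, the collector cannot tell which hits belong to positions 100 to 200 until collection has finished.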
If you expect to do a lot of paging on a given set of results, I could imagine using an approach whereby you don't bother to insert entries if you've already seen them and could maybe save on some queue operations, but
not sure how well it would work.
The other thing to do is just ask for slightly more than you think you will need in the first query, but it depends on your users. Most users, in my experience, don't go beyond page 2 or 3 at most, so you could consider paying the cost to get the top 30 or 50 and caching that for your paging. If you have other application specific knowledge, you can then adjust as
appropriate.
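
A minimal sketch of the "fetch the top 30 or 50 and cache it" suggestion, in plain Java. The `CachedPagingSketch` class, the `searchTopN` function, and the `queryCount` counter are hypothetical names for illustration only; the sketch assumes one instance per user query and does not handle index updates or cache expiry.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical sketch: prefetch the first few pages in one query and only
// re-run the query when a user pages past what is cached.
final class CachedPagingSketch<T> {

    private final IntFunction<List<T>> searchTopN;  // runs the query for the top n hits
    private final int prefetch;                     // e.g. 50 hits on the first query
    private List<T> cached = Collections.emptyList();
    private int fetchedFor = 0;                     // how many hits the last query asked for
    int queryCount = 0;                             // exposed only to show the saved work

    CachedPagingSketch(IntFunction<List<T>> searchTopN, int prefetch) {
        this.searchTopN = searchTopN;
        this.prefetch = prefetch;
    }

    List<T> page(int pageIndex, int pageSize) {
        int end = (pageIndex + 1) * pageSize;
        if (end > fetchedFor) {
            // Cache miss: ask for at least the prefetch size in one go.
            fetchedFor = Math.max(end, prefetch);
            cached = searchTopN.apply(fetchedFor);
            queryCount++;
        }
        int start = Math.min(pageIndex * pageSize, cached.size());
        return cached.subList(start, Math.min(end, cached.size()));
    }
}
```

With a prefetch of 50 and a page size of 10, the first five pages cost a single query; only a request for the sixth page triggers a second one.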

-Grant


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--
d i n o k o r a h
Tel: +44 7956 66 52 83
---------------------------
51°21'50.5902"N 0°6'11.8116"W


--------------------------
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ







