Re: TopDocCollector & Paging

Dino Korah Wed, 17 Sep 2008 15:53:43 -0700

Thanks, Grant. Please see my comments and responses below.

2008/9/17 Grant Ingersoll <[EMAIL PROTECTED]>


>
> On Sep 17, 2008, at 4:39 PM, Dino Korah wrote:
>
>> I know that in applications where we search for words or phrases and
>> expect the results sorted by relevance, TopDocCollector would work like a
>> dream. But what about scenarios where the results need to be sorted
>> chronologically or by some kind of metadata?
>>
>> A very common example would be an email application: if someone searches
>> the Inbox, the results are expected to appear sorted by date.
>>
>
> Wouldn't you expect results sorted by relevance and then by date? One way
> to achieve roughly what you want is a Function Query that uses the date as
> a factor in the relevance score.
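Grant's suggestion of using the date as a factor in the relevance score can be illustrated without any Lucene classes. The sketch below is plain Java under stated assumptions: `RecencyBoost`, `Hit`, and the reciprocal decay a / (m * age + b) with tuning constants `m`, `a` and `b` are illustrative choices, not Lucene's function-query API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: blend relevance with document age, in the spirit of a function query. */
class RecencyBoost {

    /** Hypothetical hit: document id, relevance score, age of the message in days. */
    static final class Hit {
        final int doc;
        final float score;
        final double ageDays;

        Hit(int doc, float score, double ageDays) {
            this.doc = doc;
            this.score = score;
            this.ageDays = ageDays;
        }
    }

    /**
     * Assumed reciprocal decay a / (m * age + b): equals a / b for a brand-new
     * document and falls toward 0 as the document gets older.
     */
    static double recency(double ageDays, double m, double a, double b) {
        return a / (m * ageDays + b);
    }

    /** Relevance score multiplied by the recency factor. */
    static double blended(Hit h, double m, double a, double b) {
        return h.score * recency(h.ageDays, m, a, b);
    }

    /** Doc ids ordered by blended score, highest first. */
    static List<Integer> rank(List<Hit> hits, double m, double a, double b) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> blended(h, m, a, b)).reversed());
        List<Integer> ids = new ArrayList<>();
        for (Hit h : sorted) ids.add(h.doc);
        return ids;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit(1, 2.0f, 365));  // very relevant, but a year old
        hits.add(new Hit(2, 1.0f, 1));    // less relevant, but from yesterday
        // m = 0.1, a = 1, b = 1: doc 1 -> 2.0 / 37.5 ~ 0.053, doc 2 -> 1.0 / 1.1 ~ 0.909
        System.out.println(rank(hits, 0.1, 1.0, 1.0));  // [2, 1]
    }
}
```

With `m = 0` the factor is constant and the order falls back to pure relevance; a larger `m` lets recency dominate, which moves the ranking closer to the date-first ordering expected for email.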
>

In my case relevance is really secondary. I may be wrong in saying this, but
in the case of emails, how can we sort by relevance and then by date?
Also, regarding Function Query: I'd appreciate it if you could point me to
some skeleton code for it. I did see the documentation, but this facility
seems to be experimental.
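The question of sorting "by relevance and then by date" comes down to which key is primary. A comparator-based sketch in plain Java shows both orderings side by side; the `MailSort` and `Hit` names and the `dateMillis` field are hypothetical. Lucene's own `Sort` can likewise be built from several `SortField`s, but the code below does not use it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: two multi-key orderings for email-style search hits. */
class MailSort {

    /** Hypothetical hit: document id, relevance score, message date in epoch millis. */
    static final class Hit {
        final int doc;
        final float score;
        final long dateMillis;

        Hit(int doc, float score, long dateMillis) {
            this.doc = doc;
            this.score = score;
            this.dateMillis = dateMillis;
        }
    }

    /** Newest first; relevance only breaks ties between identical dates. */
    static final Comparator<Hit> DATE_THEN_SCORE =
            Comparator.comparingLong((Hit h) -> h.dateMillis).reversed()
                    .thenComparing(Comparator.comparingDouble((Hit h) -> h.score).reversed());

    /** Most relevant first; the date only breaks ties between identical scores. */
    static final Comparator<Hit> SCORE_THEN_DATE =
            Comparator.comparingDouble((Hit h) -> h.score).reversed()
                    .thenComparing(Comparator.comparingLong((Hit h) -> h.dateMillis).reversed());

    /** Doc ids of the hits in the order given by cmp. */
    static List<Integer> order(List<Hit> hits, Comparator<Hit> cmp) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(cmp);
        List<Integer> ids = new ArrayList<>();
        for (Hit h : sorted) ids.add(h.doc);
        return ids;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit(1, 0.9f, 100L),
                new Hit(2, 0.5f, 300L),
                new Hit(3, 0.5f, 200L));
        System.out.println(order(hits, DATE_THEN_SCORE));  // [2, 3, 1]
        System.out.println(order(hits, SCORE_THEN_DATE));  // [1, 2, 3]
    }
}
```

In `SCORE_THEN_DATE` the date only matters when two scores are exactly equal, so with real, finely graded scores the secondary date key may change little; date-first ordering is the one that matches an Inbox.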


>> If there are too many results, the user will most probably be willing to
>> look through a fair part of the result list, which means paging through
>> the generated hits/results is quite a handy feature for a generic library.
>
> Well, the way this is typically done is that you ask for increasingly more
> results and re-execute the query. Another way is to cache. In my
> experience, it is usually very fast to re-query, especially once things
> are in the OS cache, etc. I just don't see how you can say "give me
> results 100-110" if you don't know what results 1-99 are.
>

Here I am trying to get my head around implementing TopDocs, as Hits is
being deprecated. Currently I search and sort with the searcher and keep the
returned Hits object for the whole session, or until the user runs a new
search. So when a user requests results 100 to 110, I can simply read
hits.doc(100) ... hits.doc(110).
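The re-query approach Grant describes maps onto the hits.doc(100) ... hits.doc(110) loop: to show a page starting at `start`, ask for the top `start + count` hits and skip the first `start`. In the sketch below, `RequeryPager` and the `TopN` interface are hypothetical stand-ins for a real search call that takes the number of hits to return; this is not Lucene's API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: serve a page by re-running the search for more hits. */
class RequeryPager {

    /** Hypothetical stand-in for a search returning the best n doc ids, best first. */
    interface TopN {
        List<Integer> search(int n);
    }

    /** Doc ids at ranks [start, start + count); fewer (or none) past the last hit. */
    static List<Integer> page(TopN searcher, int start, int count) {
        if (start < 0 || count <= 0) throw new IllegalArgumentException("bad page bounds");
        List<Integer> top = searcher.search(start + count);  // re-executes the query
        if (start >= top.size()) return Collections.emptyList();  // page is past the end
        int end = Math.min(start + count, top.size());
        return new ArrayList<>(top.subList(start, end));
    }

    public static void main(String[] args) {
        // Fake "index" with 105 hits whose ranking is simply doc ids 0..104.
        TopN fake = n -> {
            List<Integer> ids = new ArrayList<>();
            for (int i = 0; i < Math.min(n, 105); i++) ids.add(i);
            return ids;
        };
        System.out.println(page(fake, 100, 10));  // [100, 101, 102, 103, 104]
    }
}
```

Nothing is kept between requests, so each page costs a fresh search for `start + count` hits; that is the trade-off Grant describes as usually cheap once the index is in the OS cache.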


>
> You said scoring was expensive, which may be true. Have you actually
> seen an issue w/ performance? Are you doing really complex queries? Or are
> you searching on really common terms? In your original email you said
> that you have a 100M+ index. Is this all on one machine?
>

It's all on one machine, and the query is complex. I have done all I could
to fine-tune the hardware, the JVM and my software.


>
>>
>> 2008/9/17 Grant Ingersoll <[EMAIL PROTECTED]>
>>
>>
>>> On Sep 17, 2008, at 11:51 AM, Cam Bazz wrote:
>>>
>>>> And how about queries that need a starting position, like hits
>>>> between 100 and 200?
>>>>
>>>> Could we pass something to the collector that will count from 0 to
>>>> 100 and then get the next 100 records?
>>>>
>>> The collector uses a Priority Queue to store doc ids and scores as they
>>> are collected. All the collector knows is the document id and the score
>>> and, presumably, to some extent what it has seen so far. Ordering is not
>>> defined until all the candidate docs have been scored.
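The priority-queue behaviour Grant describes can be sketched in plain Java. `TopNCollector` and its `ScoreDoc` are hypothetical classes, not Lucene's, but they show why a page such as hits 100-200 still needs n = 200: a hit's final rank is only known once every candidate has been collected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical top-N collector: a bounded min-heap of (doc, score) pairs. */
class TopNCollector {

    /** Hypothetical (doc id, score) pair. */
    static final class ScoreDoc {
        final int doc;
        final float score;

        ScoreDoc(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private final int n;
    // The head of the heap is the weakest hit currently kept.
    private final PriorityQueue<ScoreDoc> heap;
    private int totalHits = 0;

    TopNCollector(int n) {
        if (n <= 0) throw new IllegalArgumentException("n must be positive");
        this.n = n;
        this.heap = new PriorityQueue<>(n, (x, y) -> Float.compare(x.score, y.score));
    }

    /** Called once per matching document. */
    void collect(int doc, float score) {
        totalHits++;
        if (heap.size() < n) {
            heap.add(new ScoreDoc(doc, score));
        } else if (score > heap.peek().score) {
            heap.poll();                          // evict the current weakest hit
            heap.add(new ScoreDoc(doc, score));
        }
        // otherwise this hit cannot make the top n and is dropped
    }

    int totalHits() {
        return totalHits;
    }

    /** Best-first doc ids; only meaningful after every candidate has been collected. */
    List<Integer> topDocs() {
        List<ScoreDoc> kept = new ArrayList<>(heap);
        kept.sort((x, y) -> Float.compare(y.score, x.score));
        List<Integer> ids = new ArrayList<>();
        for (ScoreDoc sd : kept) ids.add(sd.doc);
        return ids;
    }

    public static void main(String[] args) {
        float[] scores = {0.2f, 0.9f, 0.4f, 0.7f, 0.1f};  // score of doc i
        TopNCollector c = new TopNCollector(3);
        for (int doc = 0; doc < scores.length; doc++) c.collect(doc, scores[doc]);
        System.out.println(c.topDocs() + " of " + c.totalHits());  // [1, 3, 2] of 5
    }
}
```

Doc 0 is in the queue until doc 3 arrives and evicts it, which is the point Grant makes: the collector cannot tell "hit number 100" apart from the rest until the last document has been scored.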
>>>
>>> If you expect to do a lot of paging on a given set of results, I could
>>> imagine an approach whereby you don't bother to insert entries you've
>>> already seen, which might save some queue operations, but I'm not sure
>>> how well it would work.
>>>
>>> The other thing to do is just ask for slightly more than you think you
>>> will need in the first query, but it depends on your users. Most users,
>>> in my experience, don't go beyond page 2 or 3 at most, so you could
>>> consider paying the cost to get the top 30 or 50 and caching that for
>>> your paging. If you have other application-specific knowledge, you can
>>> then adjust as appropriate.
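Grant's caching suggestion (fetch the top 30 or 50 once and page from that) can be combined with a re-query fallback. In this sketch `CachedPager` is hypothetical, the search is modelled as an `IntFunction` from n to the best n doc ids, and doubling the requested size on a cache miss is an assumed policy, not something stated in the thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.IntFunction;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Hypothetical per-session pager: cache the top hits once, re-query only on a miss. */
class CachedPager {
    // Hypothetical search call: given n, returns the best n doc ids, best first.
    private final IntFunction<List<Integer>> search;
    private List<Integer> cached;
    private int requested;  // the n passed to the most recent search
    private int searches;   // how many times the query has been executed

    CachedPager(IntFunction<List<Integer>> search, int cacheSize) {
        this.search = search;
        this.requested = cacheSize;
        this.cached = search.apply(cacheSize);
        this.searches = 1;
    }

    /** Doc ids at ranks [start, start + count); fewer (or none) past the last hit. */
    List<Integer> page(int start, int count) {
        int end = start + count;
        // If the last search returned fewer hits than requested, the cache already
        // holds every hit, so re-querying could not return anything new.
        boolean mayHaveMore = cached.size() == requested;
        if (end > requested && mayHaveMore) {
            requested = Math.max(end, requested * 2);  // assumed policy: grow geometrically
            cached = search.apply(requested);
            searches++;
        }
        if (start >= cached.size()) return Collections.emptyList();
        return new ArrayList<>(cached.subList(start, Math.min(end, cached.size())));
    }

    int searchCount() {
        return searches;
    }

    public static void main(String[] args) {
        // Fake "index" with 200 hits ranked 0..199.
        IntFunction<List<Integer>> fake = n ->
                IntStream.range(0, Math.min(n, 200)).boxed().collect(Collectors.toList());
        CachedPager pager = new CachedPager(fake, 50);
        // Served from the initial top-50 cache, no extra search.
        System.out.println(pager.page(40, 10) + " searches=" + pager.searchCount());
        // Runs past the cache, so the query is re-executed once with n = 100.
        System.out.println(pager.page(50, 10) + " searches=" + pager.searchCount());
    }
}
```

Since most users stop at page 2 or 3, the initial cache usually answers every request and the re-query path is the exception.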
>>>
>>> -Grant
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>> For additional commands, e-mail: [EMAIL PROTECTED]
>>>
>>>
>>>
>>
>> --
>> d i n o k o r a h
>> Tel: +44 7956 66 52 83
>> ---------------------------
>> 51°21'50.5902"N 0°6'11.8116"W
>>
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>


