On Sep 17, 2008, at 6:53 PM, Dino Korah wrote:

Thanks, Grant. Please see my comments/responses below.

2008/9/17 Grant Ingersoll <[EMAIL PROTECTED]>


On Sep 17, 2008, at 4:39 PM, Dino Korah wrote:

I know that in applications where we search for words or phrases and expect
the results sorted by relevance, TopDocCollector works like a dream.
But what about scenarios where the results need to be sorted chronologically
or by some kind of metadata?

A very common application would be email applications. If someone is to search on the Inbox, the result will be expected to appear sorted by date.


Wouldn't you expect by relevance and then by date? One way to achieve kind of what you want is a Function Query that uses the date as a factor in the
relevance score.
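
To make the "date as a factor in the relevance score" idea concrete, here is a minimal sketch in plain Java. It is not the Lucene function query API; the class name, the method, and the half-life decay formula are all assumptions chosen for illustration only.

```java
// Hypothetical illustration only: not part of Lucene.
final class RecencyBoost {
    private static final double MILLIS_PER_DAY = 86_400_000.0;

    // Scales a relevance score by an exponential decay on message age.
    // A message that is halfLifeDays old keeps half of its score.
    static double boostedScore(double relevance, long ageMillis, double halfLifeDays) {
        double ageDays = ageMillis / MILLIS_PER_DAY;
        double decay = Math.pow(0.5, ageDays / halfLifeDays);
        return relevance * decay;
    }
}
```

With a short half-life, date dominates the ordering; with a long one, relevance does.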


In my case relevance is really secondary. I may be wrong in saying this, but in the
case of emails, how can we sort by relevance and then by date?!
Also, about Function Query: I'd appreciate it if you could point me to skeleton code for a Function Query. I did see the documentation, but this facility seems to
be experimental.

It's been around for a while now.





If there are too many results, the user will most probably be willing to look through a fair part of the result list, which means paging through the
generated hits/results is quite a handy feature for a generic library.


Well, the way this is typically done is you ask for increasingly more
results and re-execute the query.  Another way is to cache.  In my
experience, it usually is very fast to requery, especially once things are in the OS cache, etc. I just don't see how you can say give me results
100-110 if you don't know what results 1-99 are.
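
The requery approach can be sketched roughly as follows, in plain Java. The `topN` function is a stand-in for "run the sorted search and return the best n results"; it and the class name are assumptions for illustration, not Lucene calls.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical illustration only: paging by re-executing a top-N search.
final class RequeryPaging {
    // Returns results [start, start + pageSize) by asking for the top
    // (start + pageSize) results and discarding the first `start` of them.
    static <T> List<T> page(IntFunction<List<T>> topN, int start, int pageSize) {
        List<T> top = topN.apply(start + pageSize);
        if (start >= top.size()) {
            return Collections.emptyList(); // requested page is past the last result
        }
        int end = Math.min(top.size(), start + pageSize);
        return new ArrayList<>(top.subList(start, end));
    }
}
```

A request for results 100 to 110 would then be `page(topN, 100, 10)`, which re-runs the search for the top 110 and keeps only the last ten.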


Here I am trying to get my head around implementing TopDocs, as Hits is being deprecated. Currently I search and sort on the searcher and save the returned Hits object throughout the session or until the user runs a new
search. So when a user requests results from 100 to 110, I can do
hits.doc(100) ... hits.doc(110)



Well, Hits is silently doing the (repeated) fetching behind the scenes as you ask for more and more results. It is just a wrapper around the TopDocs stuff, but adds some caching.
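
As a rough picture of that behaviour, here is a minimal plain-Java sketch of a wrapper that caches the current top-N list and re-executes the search with a larger N when asked for a result beyond what it holds. This is not the Hits source: the class name, the initial size of 100, the doubling policy, and the `topN` stand-in for the search are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical illustration only: a caching wrapper that silently requeries.
final class LazyHits<T> {
    private final IntFunction<List<T>> topN; // stand-in for "search for the top n"
    private List<T> cached = new ArrayList<>();
    private int requested = 0;
    private boolean exhausted = false;
    int searches = 0; // how many times the search was re-executed

    LazyHits(IntFunction<List<T>> topN) {
        this.topN = topN;
    }

    T get(int i) {
        // Grow the cached window until it covers index i or results run out.
        while (i >= cached.size() && !exhausted) {
            requested = Math.max(requested * 2, Math.max(100, i + 1));
            cached = topN.apply(requested);
            searches++;
            exhausted = cached.size() < requested;
        }
        if (i >= cached.size()) {
            throw new IndexOutOfBoundsException("no result at index " + i);
        }
        return cached.get(i);
    }
}
```

Keeping such an object in the session gives the same `get(100) ... get(110)` convenience, at the cost of occasional repeated searches as the user pages deeper.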



You said scoring was expensive, which may be true. Have you actually seen an issue w/ performance? Are you doing really complex queries? Or are you searching on really common terms? In your original email you said that you have
a 100M+ index.  Is this all on one machine?


All on one machine, and the query is complex. I have done all I could to fine-tune
the hardware, the JVM, and my software.


I'd say you need distributed search. You are definitely at the limits of one machine.
