Thanks Grant. Please see my comments/responses below.

2008/9/17 Grant Ingersoll <[EMAIL PROTECTED]>
>
> On Sep 17, 2008, at 4:39 PM, Dino Korah wrote:
>
>> I know in applications where we search for a words or phrases and expect the
>> result sorted by relevance, TopDocCollector would work like a dream.
>> But what about scenario where the result needs to be sorted chronologically
>> or by some kind of metadata.
>>
>> A very common application would be email applications. If someone is to
>> search on the Inbox, the result will be expected to appear sorted by date.
>
> Wouldn't you expect by relevance and then by date? One way to achieve kind
> of what you want is a Function Query that uses the date as a factor in the
> relevance score.

In my case relevance is really secondary. I may be wrong in saying this, but in the case of emails, how can we sort by relevance and then by date?
As for Function Query, I'd appreciate it if you could point me to some skeleton code. I did see the documentation, but this facility seems to be experimental.

>> If there are too many results, the user will most probably be willing to
>> look through a fair part of the result list, which means paging through the
>> generated hits/result is quite handy feature for a generic library.
>
> Well, the way this is typically done is you ask for increasingly more
> results and re-execute the query. Another way is to cache. In my
> experience, it usually is very fast to requery, especially once things are
> in the OS cache, etc. I just don't see how you can say give me results
> 100-100 if you don't know what results 1-99 are.

Here I am trying to get my head around implementing TopDocs, as Hits is getting deprecated.
Currently I search and sort on the searcher and keep the returned Hits object for the whole session, or until the user runs a new search. So when a user requests results 100 to 110, I can do hits.doc(100) ... hits.doc(110).

> You said scoring was expensive, which maybe is true. Have you actually
> seen an issue w/ performance? Are you doing really complex queries? Or are
> you searching on really common terms? In your original email that you have
> a 100M+ index. Is this all on one machine?

All on one machine, and the query is complex. I have done all I could to fine-tune the hardware, the JVM and my software.

>
>> 2008/9/17 Grant Ingersoll <[EMAIL PROTECTED]>
>>
>>> On Sep 17, 2008, at 11:51 AM, Cam Bazz wrote:
>>>
>>>> And how about queries that need starting position, like hits between
>>>> 100 and 200?
>>>>
>>>> could we pass something to the collector that will count between 0 to
>>>> 100 and then get the next 100 records?
>>>
>>> The collector uses a Priority Queue to store doc ids and scores as they are
>>> collected. All the collector knows is the document id and the score and,
>>> presumably what it has seen so far, to some extent. Ordering is not defined
>>> until all the candidate docs have been scored.
>>>
>>> If you expect to do a lot of paging on a given set of results, I could
>>> imagine using an approach whereby you don't bother to insert entries if
>>> you've already seen them and could maybe save on some queue operations, but
>>> not sure how well it would work.
>>>
>>> The other thing to do is just ask for slightly more than you think you will
>>> need in the first query, but it depends on your users. Most users, in my
>>> experience, don't go beyond page 2 or 3 at most, so you could consider
>>> paying the cost to get the top 30 or 50 and caching that for your paging.
>>> If you have other application specific knowledge, you can then adjust as
>>> appropriate.
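
As a rough illustration of the priority-queue paging idea described above, here is a minimal sketch in plain Java. It is not Lucene code and does not use any Lucene API: `TopNPager`, `ScoredDoc` and `page` are hypothetical names made up for this example, and the tie-break on doc id is an assumption. It keeps only the best offset + pageSize candidates in a bounded heap, orders them once everything has been seen, and then returns the requested slice, which is why results 100-110 cannot be produced without also ranking results 0-99.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; none of these names come from Lucene.
final class TopNPager {

    /** Stand-in for the (document id, score) pair a collector sees. */
    static final class ScoredDoc {
        final int docId;
        final float score;

        ScoredDoc(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    // Best-first order: higher score first, ties broken by lower doc id (assumed).
    private static final Comparator<ScoredDoc> BEST_FIRST = (a, b) -> {
        int byScore = Float.compare(b.score, a.score);
        return byScore != 0 ? byScore : Integer.compare(a.docId, b.docId);
    };

    /** Returns the hits ranked offset .. offset + pageSize - 1 (0-based). */
    static List<ScoredDoc> page(Iterable<ScoredDoc> candidates, int offset, int pageSize) {
        if (offset < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("offset must be >= 0 and pageSize > 0");
        }
        int needed = offset + pageSize;

        // Bounded heap whose head is the worst entry currently kept.
        PriorityQueue<ScoredDoc> heap = new PriorityQueue<>(needed, BEST_FIRST.reversed());
        for (ScoredDoc doc : candidates) {
            if (heap.size() < needed) {
                heap.add(doc);
            } else if (BEST_FIRST.compare(doc, heap.peek()) < 0) {
                // The new candidate beats the current worst, so it replaces it.
                heap.poll();
                heap.add(doc);
            }
        }

        // The final order is only known after every candidate has been offered.
        List<ScoredDoc> ranked = new ArrayList<>(heap);
        ranked.sort(BEST_FIRST);
        if (offset >= ranked.size()) {
            return Collections.emptyList();
        }
        return new ArrayList<>(ranked.subList(offset, Math.min(needed, ranked.size())));
    }
}
```

Calling `TopNPager.page(candidates, 100, 10)` would correspond to asking for hits 100 to 109. Caching the ranked list for the first few pages, as suggested above, would avoid repeating the collection step for every page request.
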
>>>
>>> -Grant
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>> For additional commands, e-mail: [EMAIL PROTECTED]
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ

--
d i n o k o r a h
Tel: +44 7956 66 52 83
---------------------------
51°21'50.5902"N 0°6'11.8116"W