On Sep 17, 2008, at 6:53 PM, Dino Korah wrote:
Thanks Grant.. Please see my comments/response below.
2008/9/17 Grant Ingersoll <[EMAIL PROTECTED]>
On Sep 17, 2008, at 4:39 PM, Dino Korah wrote:
I know in applications where we search for a words or phrases and
expect
the
result sorted by relevance, TopDocCollector would work like a dream.
But what about scenario where the result needs to be sorted
chronologically
or by some kind of metadata.
A very common application would be email applications. If someone
is to
search on the Inbox, the result will be expected to appear sorted
by date.
Wouldn't you expect by relevance and then by date? One way to
achieve kind
of what you want is a Function Query that uses the date as a factor
in the
relevance score.
In my case relevance is really secondary. I may be wrong in saying
this; In
case of emails, how can we sort by relevance and then by date?!
Also Function Query; I'd appreciate if you could point me to a
skeleton coe
for Function Query. I did see the documentation, but this facility
seem to
be experimental.
It's been around for a while now.
If there are too many results, the user will most probably be
willing to
look through a fair part of the result list, which means paging
through the
generated hits/result is quite handy feature for a generic library.
Well, the way this is typically done is you ask for increasingly more
results and re-execute the query. Another way is to cache. In my
experience, it usually is very fast to requery, especially once
things are
in the OS cache, etc. I just don't see how you can say give me
results
100-100 if you don't know what results 1-99 are.
Here I am trying to get my head around in implementing TopDocs as
Hits are
getting deprecated. Currently I search and sort on the searcher and
save the
returned Hits object throughout the session or till the user runs a
new
search. So when a user is requesting results from 100 to 110, i can do
hits.doc(100) ... hits.doc(110)
Well, Hits is silently doing the (repeated) fetching behind the scenes
as you ask for more and more results. It is just a wrapper around the
TopDocs stuff, but adds some caching
You said scoring was expensive, which maybe is true. Have you
actually
seen an issue w/ performance? Are you doing really complex
queries? Or are
you searching on really common terms? In your original email that
you have
a 100M+ index. Is this all on one machine?
All on one machine, query is complex. I have done all I could to
finetune
bot hardware , jvm and my software.
I'd say you need distributed search. You are definitely at the limits
of one machine.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]