Hi Chris,
I've really only had a chance to skim this thread so far, but if I
understand correctly, the goal is to get documents back in a "blended"
order based on:
1) textual relevancy to the search input
2) recentness
3) a mapping of field values to arbitrary numeric weights which need to
be specified at query time (ie: score collection:A better than
collection:C better than collection:Q etc...)
You have perfectly understood my question, thanks for trying to help!
In that case I think a "function query" is the way to go ... I haven't
really had a chance to catch up on the way the Solr FunctionQuery class
morphed when it was adopted into the Lucene core, but I believe all the
relevant pieces are in the org.apache.lucene.search.function package, and
it seems to have some good package level javadocs...
That's what I discovered. The question is: is the ValueSourceQuery
robust and fast enough to be used confidently in a production
environment? I looked at the source code and it seems pretty
straightforward, so I would say yes, as long as I use the caches
correctly. Can you confirm?
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/function/package-summary.html
You seemed to be on the right track asking about ValueSourceQuery ... but
that's only part of the puzzle: for the "recentness" aspect a
ValueSourceQuery composed on a ReverseOrdFieldSource should take care of
things ... but the arbitrary weighting by "collection" will really require
you to provide your own ValueSource implementation -- most likely you'll
want to leverage the FieldCache, but map your "collectionIds" (whatever
they are) to the numeric values you want to use.
Then you'll have all the pieces; the only thing left to do will be to
decide if you want to combine them with a regular BooleanQuery or use a
CustomScoreQuery.
Yes, I will have to implement my own ValueSource, but it seems it's
really not complicated, looking at the existing ValueSource
implementations.
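To make the collection weighting concrete, here is a minimal sketch in
plain Java with no Lucene dependency. It only illustrates the lookup a
custom ValueSource would perform; the class name CollectionWeightSketch,
the floatVal method and the collectionByDoc array are hypothetical, and in
a real implementation the per-document collection ids would presumably be
read from the FieldCache.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a custom ValueSource: one float per document id. */
final class CollectionWeightSketch {
    private final float[] weightByDoc;

    /**
     * @param collectionByDoc collection id of each document, indexed by doc id
     *                        (assumption: in Lucene this would come from the FieldCache)
     * @param weights         query-time mapping of collection id to numeric weight
     * @param defaultWeight   weight used for collections missing from the mapping
     */
    CollectionWeightSketch(String[] collectionByDoc, Map<String, Float> weights, float defaultWeight) {
        weightByDoc = new float[collectionByDoc.length];
        for (int doc = 0; doc < collectionByDoc.length; doc++) {
            Float w = weights.get(collectionByDoc[doc]);
            weightByDoc[doc] = (w != null) ? w : defaultWeight;
        }
    }

    /** Returns the weight of the given document; a plain array lookup at scoring time. */
    float floatVal(int doc) {
        return weightByDoc[doc];
    }

    public static void main(String[] args) {
        Map<String, Float> weights = new HashMap<>();
        weights.put("A", 3.0f);
        weights.put("C", 2.0f);
        String[] collectionByDoc = {"A", "Q", "C"};
        CollectionWeightSketch source = new CollectionWeightSketch(collectionByDoc, weights, 1.0f);
        for (int doc = 0; doc < collectionByDoc.length; doc++) {
            System.out.println("doc " + doc + " -> " + source.floatVal(doc));
        }
    }
}
```

The mapping is resolved once per query into a flat array, so the cost per
scored document stays a single array read.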
As for your comments about "random scoring" ... this is really, Really,
REALLY hard to get "right" for a variety of reasons that I don't really
want to go into right now ... my advice: don't attempt to commit to
"random" ordering. Instead commit to promoting N randomly selected
documents to the front of the results ... this is easy to do by writing a
custom query (again ValueSourceQuery can probably help you) where you
pick N random numbers between 0 and maxDoc and score them really high ...
then let the rest of the docs score as they normally would.
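The promotion approach described above could be sketched roughly as
follows, again in plain Java without Lucene. The names
RandomPromotionSketch, pickPromoted and score, as well as the additive
boost, are assumptions made for illustration; the suggestion itself only
says to pick N random doc ids and score them really high.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

/** Hypothetical sketch: promote N randomly chosen doc ids, leave the rest alone. */
final class RandomPromotionSketch {

    /** Picks n distinct doc ids in the range [0, maxDoc). */
    static Set<Integer> pickPromoted(int n, int maxDoc, long seed) {
        if (n < 0 || n > maxDoc) {
            throw new IllegalArgumentException("n must be between 0 and maxDoc");
        }
        Random rnd = new Random(seed);
        Set<Integer> picked = new HashSet<>();
        while (picked.size() < n) {
            picked.add(rnd.nextInt(maxDoc)); // duplicates are ignored by the set
        }
        return picked;
    }

    /** Promoted docs get a large additive boost; all others keep their normal score. */
    static float score(int doc, float normalScore, Set<Integer> promoted, float boost) {
        return promoted.contains(doc) ? normalScore + boost : normalScore;
    }

    public static void main(String[] args) {
        Set<Integer> promoted = pickPromoted(3, 100, 42L);
        System.out.println("promoted doc ids: " + promoted);
        System.out.println("score of doc 0: " + score(0, 0.5f, promoted, 1000f));
    }
}
```

Repeatedly drawing until the set is full is fine when N is small relative
to maxDoc; for N close to maxDoc a shuffle would be the better choice.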
What's wrong with this idea:
Each day I generate and shuffle a vector of maxDoc integers from 0 to
maxDoc.
Then I use a ValueSourceQuery with a ValueSource that uses this
vector to randomly score the documents.
Of course I have to somehow normalize those random scores so that
their "contribution factor" remains constant when maxDoc increases.
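A rough sketch of what I have in mind, in plain Java without Lucene. The
names DailyShuffleSketch and dailyScores are only illustrative, and
dividing by maxDoc is just one possible normalization; it keeps every
score in [0, 1) however large the index gets. The sketch uses the ids
0 to maxDoc - 1.

```java
import java.util.Random;

/** Hypothetical sketch: one pseudo-random score per doc id, stable for a given day. */
final class DailyShuffleSketch {

    /**
     * Shuffles the integers 0..maxDoc-1 with a seed derived from the day,
     * then normalizes each value by maxDoc so that scores stay in [0, 1).
     */
    static float[] dailyScores(int maxDoc, long daySeed) {
        int[] perm = new int[maxDoc];
        for (int i = 0; i < maxDoc; i++) {
            perm[i] = i;
        }
        // Fisher-Yates shuffle driven by the day seed
        Random rnd = new Random(daySeed);
        for (int i = maxDoc - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            int tmp = perm[i];
            perm[i] = perm[j];
            perm[j] = tmp;
        }
        float[] scores = new float[maxDoc];
        for (int doc = 0; doc < maxDoc; doc++) {
            scores[doc] = perm[doc] / (float) maxDoc;
        }
        return scores;
    }

    public static void main(String[] args) {
        // Assumption: the day number is used as the seed, e.g. days since the epoch
        long daySeed = java.time.LocalDate.now().toEpochDay();
        float[] scores = dailyScores(10, daySeed);
        for (int doc = 0; doc < scores.length; doc++) {
            System.out.println("doc " + doc + " -> " + scores[doc]);
        }
    }
}
```

The same seed gives the same ordering all day, and the vector has to be
rebuilt whenever maxDoc changes.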
Thanks for your advice!
Antoine