Re: Query time document group boosting

Karl Wettin Wed, 26 Nov 2008 22:31:43 -0800

The most scary part is that you will have to score each and every document that has a source, probably all of the documents in your corpus. So if you have a very large number of documents, it might be a bit expensive. Also, appending these boost-only clauses means that you will get hits on documents that have nothing to do with the user query.
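The second point can be seen in a toy model of the expanded query `foo OR groupboost_A:dummy^10 OR groupboost_C:dummy^0.1` from the quoted message below. This is a simplified sketch, not Lucene's scoring: it assumes OR semantics where a document is a hit whenever any clause contributes a positive score, and it ignores Lucene's coord and normalization factors. The class `AdditiveBoostModel`, the `Doc` type and the sample scores are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class AdditiveBoostModel {
    // A toy document: its score for the user query and the source it came from.
    static final class Doc {
        final String id;
        final double queryScore; // 0.0 means the document does not match the user query
        final String source;

        Doc(String id, double queryScore, String source) {
            this.id = id;
            this.queryScore = queryScore;
            this.source = source;
        }
    }

    // Simplified OR semantics: the score is the user-query score plus the boost
    // of the document's source clause, and any positive score counts as a hit.
    // Every document is visited, mirroring the cost Karl describes.
    static List<String> hits(List<Doc> docs, Map<String, Double> boosts) {
        List<String> ids = new ArrayList<>();
        for (Doc d : docs) {
            double score = d.queryScore + boosts.getOrDefault(d.source, 0.0);
            if (score > 0.0) {
                ids.add(d.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("d1", 1.2, "B"),  // matches "foo"
                new Doc("d2", 0.0, "A"),  // does not match "foo"
                new Doc("d3", 0.0, "B")); // does not match "foo"
        // Prints [d1, d2]: d2 is a hit only because of its source clause.
        System.out.println(hits(docs, Map.of("A", 10.0, "C", 0.1)));
    }
}
```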


I think you are looking for CustomScoreQuery.
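As we understand it, CustomScoreQuery wraps the user's query, so only documents matching that query are scored, and lets you compute each of those documents' final score yourself. The sketch below does not use its API; it is a plain-Java model of the idea, assuming a per-source multiplicative factor (1.0 for sources without a preference) applied only to documents that already match. `MultiplicativeBoostModel`, `Doc` and the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class MultiplicativeBoostModel {
    static final class Doc {
        final String id;
        final double queryScore; // 0.0 means the document does not match the user query
        final String source;

        Doc(String id, double queryScore, String source) {
            this.id = id;
            this.queryScore = queryScore;
            this.source = source;
        }
    }

    // Keeps only documents matching the user query, then orders them by
    // queryScore * factor(source), highest first.
    static List<String> rank(List<Doc> docs, Map<String, Double> factors) {
        List<Doc> matching = new ArrayList<>();
        for (Doc d : docs) {
            if (d.queryScore > 0.0) {
                matching.add(d);
            }
        }
        matching.sort(Comparator.comparingDouble(
                (Doc d) -> d.queryScore * factors.getOrDefault(d.source, 1.0)).reversed());
        List<String> ids = new ArrayList<>();
        for (Doc d : matching) {
            ids.add(d.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("d1", 1.0, "C"),
                new Doc("d2", 0.5, "A"),
                new Doc("d3", 0.0, "A"), // no match, so never returned
                new Doc("d4", 0.8, "B"));
        // Adjusted scores: d2 = 5.0, d4 = 0.8, d1 = 0.1, so this prints [d2, d4, d1].
        System.out.println(rank(docs, Map.of("A", 10.0, "C", 0.1)));
    }
}
```

Unlike the additive OR clauses, a document that does not match the user query never enters the result list here, whatever its source.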



    karl

On 26 Nov 2008, at 16:54, Toke Eskildsen wrote:

We use Lucene at our library for indexing from different sources into
the same logical index. The sources are very diverse and are prioritized
differently at index-time with document boosts. However, different
groups of users (or individual users for that matter) have different
preferences for the relevancy of the sources, which clashes with
index-time boosting. Query-time tweaking would be preferable.

My coworker Mikkel Kamstrup Erlandsen had this bright and slightly scary
idea...

Suppose we have sources A-Z. For each document from the sources, we add
the term groupboost_<source>:dummy. All documents from source A have
groupboost_A:dummy, all documents from source B have groupboost_B:dummy,
and so on.
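A minimal sketch of this indexing convention, assuming a document is modelled as a plain map from field name to value rather than Lucene's own Document class (in Lucene each group term would presumably be added as an untokenized field). The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class GroupTermIndexing {
    // Value shared by every group term; only the field name varies per source.
    static final String DUMMY = "dummy";

    // Field name for a source, e.g. "A" -> "groupboost_A".
    static String groupField(String source) {
        return "groupboost_" + source;
    }

    // Returns a copy of the document's fields with its single group term added,
    // so every document carries exactly one groupboost_<source>:dummy term.
    static Map<String, String> withGroupTerm(Map<String, String> fields, String source) {
        Map<String, String> out = new LinkedHashMap<>(fields);
        out.put(groupField(source), DUMMY);
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "foo bar");
        // Prints {title=foo bar, groupboost_A=dummy}
        System.out.println(withGroupTerm(doc, "A"));
    }
}
```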

Now, whenever the user enters a query, we parse it the normal way and
wrap it in a BooleanQuery where we add our groupboost_<source>:dummy terms as
TermQueries with boosts specified by the user (or, more realistically,
under the hood by the front-end on the user's behalf).

Example: Let's say we have a user who loves all things from source A and hates the ones from source C. The front-end knows this. The user enters
the query "foo", which expands to
"foo OR groupboost_A:dummy^10 OR groupboost_C:dummy^0.1"
The result should be that there's a high probability that the first hits
will come from source A, unless there are significantly better matches
from other groups. Likewise, hits from group C will probably be near the
end of the list of hits.
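The expansion in the example can be sketched as string building, assuming the front-end keeps the user's source preferences in an ordered map and emits the query-parser syntax shown above. `GroupBoostQuery`, `expand` and `formatBoost` are hypothetical names; the message itself proposes building the BooleanQuery directly instead of a string.

```java
import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.Map;

public class GroupBoostQuery {
    // Formats a boost without a trailing ".0": 10.0 -> "10", 0.1 -> "0.1".
    static String formatBoost(double boost) {
        return BigDecimal.valueOf(boost).stripTrailingZeros().toPlainString();
    }

    // Appends one optional groupboost_<source>:dummy^boost clause per preferred
    // source, in the map's iteration order.
    static String expand(String userQuery, Map<String, Double> sourceBoosts) {
        StringBuilder sb = new StringBuilder(userQuery);
        for (Map.Entry<String, Double> e : sourceBoosts.entrySet()) {
            sb.append(" OR groupboost_").append(e.getKey())
              .append(":dummy^").append(formatBoost(e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Double> prefs = new LinkedHashMap<>();
        prefs.put("A", 10.0); // loves source A
        prefs.put("C", 0.1);  // hates source C
        // Prints: foo OR groupboost_A:dummy^10 OR groupboost_C:dummy^0.1
        System.out.println(expand("foo", prefs));
    }
}
```

For user queries that contain their own AND/OR operators, wrapping the user part in parentheses (or building the BooleanQuery programmatically, as described above) avoids surprises from operator precedence.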

Presto! The user gets what he wants, with practically no search-time penalty. Simple. One obvious limitation is that we don't want too many groups (aka sources) for this, but in reality we're talking 10-30 groups, so I don't
see that as a problem.

So what's the scary part? I don't know, I just have a feeling that Here
Be Dragons. It seems that it should work without messing with ranking,
besides the specific boost of course, as every document matches exactly one "groupboost_<source>:dummy" query, but I would like to hear the opinion
of more seasoned Lucene users. Is this a sensible way to approach the
problem?


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


