The most scary part is that that you will have to score each and every
document that has a source, probably all of the documents in your
corpus. So if you have a very large number of documents it might be a
bit expensive. Also, appending this query for boost only means that
you will get hits on documents that has nothing to do with the user
query.
I think you are looking for CustomScoreQuery.
karl
26 nov 2008 kl. 16.54 skrev Toke Eskildsen:
We use Lucene at our library for indexing from different sources into
the same logical index. The sources are very diverse and are
prioritized
differently at index-time with document boosts. However, different
groups of users (or individual users for that matter) have different
preferences for the relevancy of the sources, which clashes with
index-time boosting. Query-time tweaking would be preferable.
My coworker Mikkel Kamstrup Erlandsen had this bright and slightly
scary
idea...
Suppose we have sources A-Z. For each document from the sources, we
add
the term groupboost_<source>:dummy. All documents from source A has
groupboost_A:dummy, all documents from source B has groupboost_B:dummy
and so on.
Now, whenever the user enters a query, we parse it the normal way and
wrap it in a BooleanQuery where we add our groupterm_<source>:dummy as
TermQueries with boosts specified by the user (or more realistically
under the hood by the front-end for the user).
Example: Let's say we have a user that love all things from source A
and
hates the ones from source C. The front-end knows this. The user
enters
the query "foo" which expands to
"foo OR groupboost_A:dummy^10 OR groupboost_C:dummy^0.1"
The result should be that there's a high probability that the first
hits
will come from source A, unless there are significantly better matches
from other groups. Likewise hits from group C will probably be near
the
end of the list of hits.
Presto! The user gets what he wants, practically no search-time
penalty,
simple. One obvious limitation is that we don't want too many groups
aka
sources for this, but in reality we're talking 10-30 groups, so I
don't
see that as a problem.
So what's the scary part? I don't know, I just have a feeling that
Here
Be Dragons. It seems that it should work without messing with ranking,
besides the specific boost of course, as all documents match exactly
one
"groupboost_<source>:dummy"-query, but I would like to hear the
opinion
of more seasoned Lucene users. Is it a sensible way to approach the
problem?
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]