We use Lucene at our library for indexing from different sources into the same logical index. The sources are very diverse and are prioritized differently at index-time with document boosts. However, different groups of users (or individual users for that matter) have different preferences for the relevancy of the sources, which clashes with index-time boosting. Query-time tweaking would be preferable.
My coworker Mikkel Kamstrup Erlandsen had this bright and slightly scary idea...

Suppose we have sources A-Z. For each document, we add the term groupboost_<source>:dummy: all documents from source A have groupboost_A:dummy, all documents from source B have groupboost_B:dummy, and so on. Whenever the user enters a query, we parse it the normal way and wrap it in a BooleanQuery, to which we add our groupboost_<source>:dummy terms as TermQueries with boosts specified by the user (or, more realistically, set under the hood by the front-end on the user's behalf).

Example: Let's say we have a user who loves all things from source A and hates the ones from source C. The front-end knows this. The user enters the query "foo", which expands to "foo OR groupboost_A:dummy^10 OR groupboost_C:dummy^0.1".

The result should be that there's a high probability that the first hits will come from source A, unless there are significantly better matches from other groups. Likewise, hits from group C will probably be near the end of the list of hits. Presto! The user gets what he wants, with practically no search-time penalty, and it's simple.

One obvious limitation is that we don't want too many groups (aka sources) for this, but in reality we're talking 10-30 groups, so I don't see that as a problem.

So what's the scary part? I don't know, I just have a feeling that Here Be Dragons. It seems that it should work without messing with ranking, apart from the specific boost of course, as every document matches exactly one "groupboost_<source>:dummy" query, but I would like to hear the opinion of more seasoned Lucene users. Is it a sensible way to approach the problem?
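
To make the expansion step concrete, here is a minimal sketch of how the front-end might build the wrapped query as a classic Lucene query-syntax string. The class and method names (GroupBoostQuery, build) are hypothetical and not part of Lucene; only the groupboost_<source>:dummy convention comes from the description above. One deliberate deviation from the "foo OR ..." example: the sketch marks the user's query as a required clause (+) and leaves the group terms optional, on the assumption that documents matching only a group term should not be returned as hits.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not a Lucene class): builds a query string with per-source boosts. */
final class GroupBoostQuery {

    // Naming convention from the post: each document carries groupboost_<source>:dummy.
    static final String FIELD_PREFIX = "groupboost_";
    static final String DUMMY_TERM = "dummy";

    /**
     * Wraps the user's query with one optional boosted clause per source.
     * The map is iterated in its own order, so pass a LinkedHashMap for stable output.
     */
    static String build(String userQuery, Map<String, Double> sourceBoosts) {
        StringBuilder sb = new StringBuilder();
        // Required clause: a document must match the user's query to be a hit at all.
        sb.append("+(").append(userQuery).append(")");
        for (Map.Entry<String, Double> e : sourceBoosts.entrySet()) {
            // Optional clause: only contributes to the score of documents from this source.
            sb.append(' ')
              .append(FIELD_PREFIX).append(e.getKey())
              .append(':').append(DUMMY_TERM)
              .append('^').append(e.getValue());
        }
        return sb.toString();
    }
}
```

For the user in the example (boost 10 for source A, 0.1 for source C), build("foo", boosts) returns "+(foo) groupboost_A:dummy^10.0 groupboost_C:dummy^0.1", which could then be handed to the query parser. The same structure could presumably be assembled directly as a BooleanQuery with a required clause for the parsed user query and optional boosted TermQuery clauses; the string form is used here only to keep the sketch free of library dependencies.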