I think the point got lost in the discussion. Raw scores are simply _not_ comparable across different collections. They aren't even comparable across different queries in the _same_ collection. They are _only_ meaningful for ranking documents within a single query against a single collection.
And even then, raw scores don't tell you much. A score of 2 isn't "twice as good" as a score of 1; it's just "somewhat better". So the bottom line is that you end up resorting to some kind of clever presentation of the different groups to the user: tabs for each collection, round-robin inclusion, or meta-analysis where you query the _same_ docs that exist in different indexes and try to come up with a satisfactory heuristic, as atawfik suggested.

Best,
Erick

On Mon, Sep 8, 2014 at 8:59 AM, Baldwin, David <david_bald...@bmc.com> wrote:

> Would it be possible to use the raw score from each separate collection to
> order results, and then come up with an overall relevance after a merge?
> Does anyone have experience doing this?
>
> -----Original Message-----
> From: atawfik [mailto:contact.txl...@gmail.com]
> Sent: Sunday, September 07, 2014 9:50 AM
> To: java-user@lucene.apache.org
> Subject: Re: How to properly correlate relevance in a search across multiple collections
>
> Hi,
>
> If you have documents that might exist in multiple collections, then you can
> use techniques from meta search, that is, combining multiple search results
> from different collections. In this case, you can retrieve the top 100 or
> 1000 documents from each collection and merge them. You then rank the
> documents using some aggregation method. Using the sum of each document's
> relevance scores is known to produce good results.
>
> If there are no shared documents between collections, you can still use the
> same approach, but with a different aggregation method. One method is round
> robin: you start by selecting the first-ranked document from each collection,
> then the second-ranked document from each, and so on.
>
> If that does not fit your needs, you should probably search for "federated
> search" or "aggregated search" techniques. These techniques are used by large
> search engines to combine results from the different parts of their engine
> (images, video and web). You can find a lot of academic resources on these
> topics.
>
> Regards
> Ameer
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-properly-correlate-relevance-in-a-search-across-multiple-collections-tp4157240p4157321.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
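Ameer's two aggregation methods (round robin, and summing scores for documents shared between collections) can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Lucene API: `FederatedMerge` and `Hit` are hypothetical names, each collection's top-N results are assumed to be already fetched into a list in rank order, and a document's `id` is assumed to be the same key in every collection so shared documents can be recognized. `roundRobin` ignores scores entirely; `sumOfScores` adds raw scores as-is, which is exactly what Erick warns are not comparable across collections, so treat its output as a heuristic ordering rather than a calibrated relevance.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch of two ways to merge per-collection result lists (hypothetical names, not Lucene classes). */
final class FederatedMerge {

    /** Hypothetical stand-in for one hit from one collection's top-N results. */
    static final class Hit {
        final String id;    // assumed to identify the same document in every collection
        final float score;  // raw score, only meaningful within its own result list
        Hit(String id, float score) { this.id = id; this.score = score; }
    }

    /** Round robin: take the 1st hit from each list, then the 2nd from each, and so on. Scores are ignored. */
    static List<String> roundRobin(List<List<Hit>> perCollection) {
        Set<String> merged = new LinkedHashSet<>(); // preserves insertion order and skips ids already taken
        int maxLen = 0;
        for (List<Hit> list : perCollection) maxLen = Math.max(maxLen, list.size());
        for (int rank = 0; rank < maxLen; rank++) {
            for (List<Hit> list : perCollection) {
                if (rank < list.size()) merged.add(list.get(rank).id);
            }
        }
        return new ArrayList<>(merged);
    }

    /** Sum aggregation: add up each document's raw scores across lists, then sort by total, highest first. */
    static List<String> sumOfScores(List<List<Hit>> perCollection) {
        Map<String, Float> totals = new HashMap<>();
        for (List<Hit> list : perCollection) {
            for (Hit h : list) totals.merge(h.id, h.score, Float::sum);
        }
        List<String> ids = new ArrayList<>(totals.keySet());
        ids.sort((a, b) -> {
            int byScore = Float.compare(totals.get(b), totals.get(a)); // higher total first
            return byScore != 0 ? byScore : a.compareTo(b);            // tie-break on id for a stable order
        });
        return ids;
    }

    public static void main(String[] args) {
        // Hypothetical hits from two collections; d1 appears in both.
        List<Hit> collectionA = List.of(new Hit("d1", 3.2f), new Hit("d2", 1.1f));
        List<Hit> collectionB = List.of(new Hit("d3", 7.5f), new Hit("d1", 0.4f), new Hit("d4", 0.2f));
        List<List<Hit>> results = List.of(collectionA, collectionB);
        System.out.println(roundRobin(results));  // [d1, d3, d2, d4]
        System.out.println(sumOfScores(results)); // [d3, d1, d2, d4]
    }
}
```

With these made-up hits, `d3` lands on top of the summed ranking purely on the strength of one large raw score from collection B, which is the comparability problem Erick describes; round robin sidesteps it by using only each collection's own ordering.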