Re: How to properly correlate relevance in a search across multiple collections

Erick Erickson Mon, 08 Sep 2014 09:32:02 -0700

I think the point got lost in the discussion. Raw scores are simply
_not_ comparable across different collections. They aren't even
comparable across different queries in the _same_ collection. They are
_only_ meaningful for ranking within a single collection for a single
query.


And even then, raw scores don't tell you much. A score of 2 isn't
"twice as good" as a score of 1; it's just "somewhat better".

So the bottom line is that you end up resorting to some kind of clever
presentation of the different groups to the user: tabs for each
collection, round-robin inclusion, or meta-analysis where you query the
_same_ docs that exist in different indexes and try to create some
satisfactory heuristic, as atawfik suggested.
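
To make the round-robin option concrete, here is a minimal sketch in
plain Python (not Lucene API code). It assumes each collection has
already been searched and has handed back its doc IDs in rank order; the
function name, the optional limit parameter and the sample IDs are all
made up for illustration. Scores are deliberately ignored, since they
are not comparable across collections.

```python
from itertools import zip_longest

# Sentinel marking "this collection has no more hits at this rank".
_EXHAUSTED = object()


def round_robin_merge(ranked_lists, limit=None):
    """Interleave per-collection rankings: all rank-1 hits, then rank-2, ...

    ranked_lists: one list of doc IDs per collection, best hit first.
    limit: optional maximum number of merged results (hypothetical knob).
    A doc ID that shows up in several collections is kept only once,
    at the first position where it appears.
    """
    if limit is not None and limit <= 0:
        return []

    merged = []
    seen = set()
    # zip_longest yields one tuple per rank position across all collections.
    for same_rank_hits in zip_longest(*ranked_lists, fillvalue=_EXHAUSTED):
        for doc_id in same_rank_hits:
            if doc_id is _EXHAUSTED or doc_id in seen:
                continue
            seen.add(doc_id)
            merged.append(doc_id)
            if limit is not None and len(merged) >= limit:
                return merged
    return merged


# Illustrative rankings from three hypothetical collections.
books = ["b7", "b2", "b9"]
articles = ["a1", "b2"]
wiki = ["w4"]

print(round_robin_merge([books, articles, wiki]))
# ['b7', 'a1', 'w4', 'b2', 'b9']
```

The order in which the collections are passed in decides ties at each
rank, so that ordering is itself a presentation choice rather than a
relevance judgment.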

Best,
Erick

On Mon, Sep 8, 2014 at 8:59 AM, Baldwin, David <david_bald...@bmc.com> wrote:
> Would it be possible to use the raw score from each separate collection to
> order the results and then, after a merge, come up with an overall
> relevancy? Does anyone have any experience with this?
>
> -----Original Message-----
> From: atawfik [mailto:contact.txl...@gmail.com]
> Sent: Sunday, September 07, 2014 9:50 AM
> To: java-user@lucene.apache.org
> Subject: Re: How to properly correlate relevance in a search across multiple 
> collections
>
> Hi,
>
> If you have documents that might exist in multiple collections, then you can
> use techniques from meta search, that is, combining multiple search results
> from different collections. In this case, you can retrieve the top 100 or
> 1000 documents from each collection and merge them. You then rank the
> documents using some aggregation method. Using the sum of relevance scores
> is known to produce good results.
>
> If there are no shared documents between collections, you can still use the
> same approach, but with a different aggregation method. One such method is
> round robin: you start by selecting the first-ranked document from each
> collection, then take the second-ranked document from each, and so on.
>
> If that does not fit your needs, you should probably search for "federated
> search" or "aggregated search" techniques. These techniques are used by giant
> search engines to combine results from their search engine parts (images,
> video, and web). You can find a lot of academic resources on these topics.
>
> Regards
> Ameer
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-to-properly-correlate-relevance-in-a-search-across-multiple-collections-tp4157240p4157321.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
