RE: How to properly correlate relevance in a search across multiple collections

2014-09-09 Thread Baldwin, David
g. Anyone? > Hi David, It seems that MultiSearcher is deprecated i

RE: How to properly correlate relevance in a search across multiple collections

2014-09-09 Thread Vincent Sevel
Hi, Does someone know if the source of the jira issues search example is available: http://jirasearch.mikemccandless.com/ thanks, vince

RE: How to properly correlate relevance in a search across multiple collections

2014-09-08 Thread atawfik
Hi David, It seems that MultiSearcher is deprecated in favor of MultiReader. Have a look here. Regarding the meta search approach, you can normalize raw scores of documents. There are many ways to do that. Just search for "normalization scor
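
The post is cut off before it names a normalization method, so the following is only a sketch of one common choice, min-max scaling, applied per collection before merging. The function names, document ids, and scores are hypothetical and do not come from Lucene or from the thread.

```python
from typing import Dict, List, Tuple


def min_max_normalize(hits: Dict[str, float]) -> Dict[str, float]:
    """Rescale one collection's raw scores into [0, 1]."""
    if not hits:
        return {}
    lo, hi = min(hits.values()), max(hits.values())
    if hi == lo:
        # All scores equal: there is no spread to rescale, so give every hit 1.0.
        return {doc_id: 1.0 for doc_id in hits}
    return {doc_id: (score - lo) / (hi - lo) for doc_id, score in hits.items()}


def merge_normalized(result_lists: List[Dict[str, float]]) -> List[Tuple[str, float]]:
    """Normalize each list on its own, then sort all hits by normalized score."""
    merged: List[Tuple[str, float]] = []
    for hits in result_lists:
        merged.extend(min_max_normalize(hits).items())
    return sorted(merged, key=lambda pair: pair[1], reverse=True)


# Hypothetical raw scores from two collections on very different scales.
collection_a = {"a1": 12.0, "a2": 6.0, "a3": 3.0}
collection_b = {"b1": 0.9, "b2": 0.5, "b3": 0.1}
ranked = merge_normalized([collection_a, collection_b])
```

Min-max scaling only puts the numbers on a common range. It does not by itself make a top hit in one collection as relevant as a top hit in another, so treat the merged order as a heuristic.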

RE: How to properly correlate relevance in a search across multiple collections

2014-09-08 Thread Baldwin, David
> I think the point got lost in the discussion. Raw scores are simply _not_ comparable from different collections. They aren't even comparable for different queri

RE: How to properly correlate relevance in a search across multiple collections

2014-09-08 Thread Baldwin, David
> An observation: df (document frequency) and IDF (inverse document frequency) are key drivers of the whole relevancy framework on which stock Lucene is based. There is no question about their significant value. But... that mea

Re: How to properly correlate relevance in a search across multiple collections

2014-09-08 Thread Erick Erickson
> n using the raw score from each separate collection to order and then after a merge come up with relevancy?

RE: How to properly correlate relevance in a search across multiple collections

2014-09-08 Thread Baldwin, David
> Hi, if you have documents that might exist in multiple collections, then you can use techniques from meta search. That is, combining multiple search results from different collections

Re: How to properly correlate relevance in a search across multiple collections

2014-09-07 Thread atawfik
Hi, if you have documents that might exist in multiple collections, then you can use techniques from meta search. That is, combining multiple search results from different collections. In this case, you can retrieve the top 100 or 1000 documents from each collection and merge them. You then rank do
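
The message is truncated before it says how the merged documents are ranked. As an illustration only, here is a sketch of one well-known meta-search fusion rule, CombSUM: take the top k hits from each collection and sum the scores of documents that appear in more than one. It assumes the scores were already brought onto a common range; the function name, parameters, and data are hypothetical.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple


def comb_sum(
    result_lists: List[Dict[str, float]],
    k_per_list: int = 100,
    top_n: int = 10,
) -> List[Tuple[str, float]]:
    """Fuse per-collection hits by summing scores of shared document ids."""
    fused: Dict[str, float] = defaultdict(float)
    for hits in result_lists:
        # Keep only the k best hits of this collection, as the post suggests.
        top_hits = heapq.nlargest(k_per_list, hits.items(), key=lambda pair: pair[1])
        for doc_id, score in top_hits:
            fused[doc_id] += score
    return heapq.nlargest(top_n, fused.items(), key=lambda pair: pair[1])


# Hypothetical, already-normalized scores; "y" exists in both collections.
collection_a = {"x": 0.9, "y": 0.4}
collection_b = {"y": 0.8, "z": 0.5}
fused_ranking = comb_sum([collection_a, collection_b])
```

Summing rewards documents that several collections agree on, which is why "y" can outrank "x" here even though neither collection ranked it first with the highest score overall.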

Re: How to properly correlate relevance in a search across multiple collections

2014-09-06 Thread Jack Krupansky
An observation: df (document frequency) and IDF (inverse document frequency) are key drivers of the whole relevancy framework on which stock Lucene is based. There is no question about their significant value. But... that means that we can't blindly "correlate" relevancy between "collections", in large part because the docum
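
To make the df/IDF point concrete, here is a small sketch showing how the same term with the same document frequency gets a very different IDF in collections of different sizes. It assumes the classic formula idf = 1 + ln(numDocs / (docFreq + 1)), which is how I recall Lucene's default TF-IDF similarity of that era defining it; the collection sizes are made up.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """Assumed classic TF-IDF form: 1 + ln(num_docs / (doc_freq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical: the term occurs in 50 documents in each collection,
# but one collection holds 1,000 documents and the other 1,000,000.
idf_small = classic_idf(doc_freq=50, num_docs=1_000)
idf_large = classic_idf(doc_freq=50, num_docs=1_000_000)

# The gap depends only on the ratio of collection sizes: ln(1_000_000 / 1_000).
idf_gap = idf_large - idf_small
```

Because IDF feeds into every term's contribution to the score, a match on this term counts for much more in the larger collection, independent of how relevant the document actually is.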