Different score for the same documents

kenji tsuruoka Mon, 02 Nov 2009 04:46:43 -0800

Dear. Lucene users.

Hi.
I have tried to index and search MEDLINE abstracts by LUCENE.


And there were some problems in the search results.

That is Lucene has assigned different ranks for the exactly samedocuments.

I didn't know the input documents for the index contain duplicatedocuments at the first time.I have solve the problem by making all input documents UNIQUE for theindex.


But I want to know how and why the situation was happened.

The duplicate document is as follows:

_pubmed_id=13029105:1952Nov15
_ArticleTitle_
<s n="1">Experimental diabetes and clinical diabetes.</s>
_pubmed_id_end_

There are TWO exactly same documents in "index".
And their rankings by Lucene are 3 and 18.

I have known texts in XML/HTML data should be extracted before indexing.
Anyway, I haven't done this work now.

Please let me know the reason why the same documents were showndifferent ranks.


Best,
K

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Different score for the same documents

Reply via email to