Re: TermQuery - ExactMatching, Lucene 3.1.0 vs. 3.3.0, special character behavior

Thomas Rewig Tue, 19 Jul 2011 00:38:01 -0700

 Hi Uwe,

the docIds are these from Lucene. (and all Indexes are built with itsLucene versions)

So if I order e.g. with 'Sort.INDEXORDER' and I understand the sortingprinciple correctly, the following is correct because the aim溝脇しほみTerm is the first in index:


0    Score=12,2324     Doc.Id=8060    id=709579    name=aim溝脇しほみ
1    Score=12,2324     Doc.Id=227606    id=716893    name=aim

To avoid these problems right from the start, I need to use a differentanalyser for indexing? (So that the docs 'aim溝脇しほみ' and 'aim' havedifferent scores)


Thanks
Thomas








Am 18.07.2011 12:13, schrieb Uwe Schindler:

Hi Thomas,

Just one question: Are these docIds from Lucene or your own ones? And second, 
are the underlying indexes also built with the corresponding Lucene versions?

The reason behind: Nothing in Lucene guarantees the order of docIds for same 
scores, they can be arbitrary. One change in Lucene 3.3 is for example the use 
of TieredMergePolicy, that reorders documents during indexing for more 
efficient merging. So when you indexed also with Lucene 3.3 and the displayed 
document IDs are your own application specific ones (not the internal Lucene 
ones), the different order of search results can simply be caused by the fact, 
that the indexer in 3.3 can suddenly reorder the documents during merging 
(TieredMergePolicy). There is nothing wrong with that.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-----Original Message-----
From: Thomas Rewig [mailto:tre...@mufin.com]
Sent: Monday, July 18, 2011 12:06 PM
To: java-user@lucene.apache.org
Subject: Re: TermQuery - ExactMatching, Lucene 3.1.0 vs. 3.3.0, special
character behavior

   Hi Ian,

yes the score is identical but the inner ordering of same scores seems to be
different in the versions.

In Lucene 3.3.0 it seems that terms with special characters will be preferred
before the exact hit.

My code is:

      PhraseQuery query = new PhraseQuery();
          query.add(new Term("name", strQueryName));

          //topDocs = this.indexSeacher.search(query, 10);
          //topDocs = this.indexSeacher.search(query, 10, Sort.RELEVANCE);
              topDocs = this.indexSeacher.search(query, 10, Sort.INDEXORDER);

In all variants there are similar ordering problems even if they do not always
occur at the same query.
e.g. if I order by Sort.RELEVANCE the "queen" Doc problem doesn't occur but
there is a wrong ordering in the token aim (query name:aim)

0    Score=12,2324     Doc.Id=8060       id=709579     name=aim溝脇しほみ
1    Score=12,2324    Doc.Id=227606   id=716893      name=aim

Is there a way to guarantee the inner sorting of same scores? Or how can I
avoid that documente with special characters have the same score as
documente of exact matches?

Thanks in advance!
Thomas

Am 18.07.2011 10:08, schrieb Ian Lea:

I'm not sure what you are getting at.  A search using 3.1.0 and 3.3.0
returns the same docs with identical scores, except that one gives
them in order A,B and the other in order B,A?  What search method are
you using?  Does it guarantee anything about the order of returning
docs with identical scores?


--
Ian.


On Fri, Jul 15, 2011 at 3:01 PM, Thomas Rewig<tre...@mufin.com>   wrote:

   Hello,

there is a index with a lot of docs, 2 of them are:

doc1:

     1.Field=id            ITSVopfOLB=ITS---f0-- Value= 192
     2.Field=name     ITSVopfOLB=ITS----0-- Value= queen

doc2:

     1.Field=id            ITSVopfOLB=ITS---f0-- Value= 701492
     2.Field=name     ITSVopfOLB=ITS----0-- Value= queen板野友美 (Here

are chinese

characters - hopefully you can see them)

if I search in the index - with a TermQuery there is a different
behavior between Lucene 3.1.0 and 3.3.0 :

Query:

     Term:field='name' text='queen'

Result Lucene 3.1.0:

     0    Score=13,2132    Doc.Id=176002    id=192 name=queen
     1    Score=13,2132    Doc.Id=523407    id=701492 name=queen板野友美

Result Lucene 3.3.0:

     0    Score=13,2132    Doc.Id=523407    id=701492 name=queen板野友美
     1    Score=13,2132    Doc.Id=176002    id=192 name=queen

The result from Lucene 3.1.0 is that, what I would expect if I do a
'exact matching' Term Query.
Each index was indexed with its associated LuceneVersion.
I tested it with luke and with my own Code - the result was always the

same.

Is it a new feature in Lucene 3.3.0 or a bug?

Thanks in advance!
Thomas


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: TermQuery - ExactMatching, Lucene 3.1.0 vs. 3.3.0, special character behavior

Reply via email to