PhraseQuery issues - differences with SpanNearQuery

Yannis Pavlidis Thu, 04 Sep 2008 09:23:41 -0700

Hi,

I am having an issue when using the PhraseQuery which is best illustrated with 
this example:


I have created 2 documents to emulate URLs: one with URL
"http://www.airballoon.com" and title "air balloon", and a second with URL
"http://www.balloonair.com" and title "balloon air".

Test1 (PhraseQuery)
======
Now when I use the phrase query with - title: "air balloon" ~2
I get back:

url: "http://www.airballoon.com" - score: 1.0
url: "http://www.balloonair.com" - score: 0.57

Test2 (PhraseQuery)
======
Now when I use the phrase query with - title: "balloon air" ~2
I get back:
url: "http://www.balloonair.com" - score: 1.0
url: "http://www.airballoon.com" - score: 0.57

Test3 (PhraseQuery)
======
Now when I use the phrase query with -
title: "air balloon" ~2 title: "balloon air" ~2
I get back:
url: "http://www.airballoon.com" - score: 1.0
url: "http://www.balloonair.com" - score: 1.0

Test4 (SpanNearQuery)
=======
spanNear([title:air, title:balloon], 2, false)
I get back:
url: "http://www.airballoon.com" - score: 1.0
url: "http://www.balloonair.com" - score: 1.0

I would have expected Test1 and Test2 to return both URLs with a score of
1.0, since I am setting the slop to 2. It seems, though, that Lucene really
favors an absolute exact match.
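
One possible explanation for the 0.57, sketched here in plain Java rather than with Lucene itself: as far as I can tell, the default similarity turns the distance of a sloppy match into a frequency of 1 / (distance + 1) and then takes the square root as the tf factor. The class and method names below are hypothetical, and the distance calculation is a simplified model of sloppy phrase matching, not Lucene's actual code. It also assumes both documents have identical norms and idf, which seems plausible for two two-word titles made of the same terms.

```java
import java.util.Arrays;
import java.util.List;

public class SloppyPhraseSketch {

    // Simplified model of a sloppy phrase match: each query term's position in
    // the document is shifted by its offset in the query, and the match
    // distance is the spread (max - min) of the shifted positions.
    static int matchDistance(List<String> queryTerms, List<String> docTerms) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int offset = 0; offset < queryTerms.size(); offset++) {
            int position = docTerms.indexOf(queryTerms.get(offset));
            if (position < 0) {
                return -1; // a query term is missing, so there is no match
            }
            int shifted = position - offset;
            min = Math.min(min, shifted);
            max = Math.max(max, shifted);
        }
        return max - min;
    }

    // Assumed default behaviour: sloppyFreq = 1 / (distance + 1), tf = sqrt(freq).
    static double termFrequencyFactor(int distance) {
        double sloppyFreq = 1.0 / (distance + 1);
        return Math.sqrt(sloppyFreq);
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("air", "balloon");
        int exact = matchDistance(query, Arrays.asList("air", "balloon"));
        int reversed = matchDistance(query, Arrays.asList("balloon", "air"));
        System.out.println("exact:    distance=" + exact
                + " tf=" + termFrequencyFactor(exact));
        System.out.println("reversed: distance=" + reversed
                + " tf=" + termFrequencyFactor(reversed));
    }
}
```

Under these assumptions the exact title matches at distance 0 (tf factor 1.0) and the reversed title at distance 2 (tf factor of roughly 0.577). That would fit the 1.0 versus 0.57 seen in Test1 and Test2: a slop of 2 lets the reversed title match, but the match is still scored lower.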

Is it safe to assume that for what I am looking for (basically, scoring the
docs the same regardless of whether someone searches for "air balloon" or
"balloon air") it would be better to use the SpanNearQuery rather than the
PhraseQuery?
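
To illustrate why the unordered span query in Test4 may treat both titles alike, here is a minimal plain-Java sketch. It assumes that an unordered near-span measures only the range covered by the terms, minus the number of terms, so term order does not enter into it. The class and method names are hypothetical, and this is a simplified model rather than Lucene's implementation.

```java
public class UnorderedSpanSketch {

    // Simplified model of an unordered near-span over single-term sub-spans:
    // each term occupies [position, position + 1), and the slop consumed is
    // the covered range minus the number of terms.
    static int unorderedSlop(int[] positions) {
        int minStart = Integer.MAX_VALUE;
        int maxEnd = Integer.MIN_VALUE;
        for (int p : positions) {
            minStart = Math.min(minStart, p);
            maxEnd = Math.max(maxEnd, p + 1);
        }
        return (maxEnd - minStart) - positions.length;
    }

    public static void main(String[] args) {
        // Positions of "air" and "balloon" (in that order) within each title.
        int[] airBalloon = {0, 1};
        int[] balloonAir = {1, 0};
        System.out.println("air balloon: slop used = " + unorderedSlop(airBalloon));
        System.out.println("balloon air: slop used = " + unorderedSlop(balloonAir));
    }
}
```

In this model both titles consume a slop of 0 and cover the same two-position range, so neither has a proximity advantage over the other.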

Any input would be appreciated. 

Thanks in advance,

Yannis.
