hello all -
i have a problem with a SpanNearQuery returning incorrect (false
positive) results.
I am creating the context of a field using tokens which have position
increment set to either 1 or 0. The position increment is set to 0 for
special tokens, in this case part-of-speech markers.
Thus, brackets set of position increments:
[The __pos_dt] [cow __pos_noun] [jumped __pos_verb] [over __pos_prep]
[the __pos_dt] [moon __pos_noun] [. __pos_.]
My Span Query looks like:
SpanQuery sq = new SpanNearQuery(new SpanQuery[]
{
new SpanTermQuery(new Term("content", "jumped")),
new SpanTermQuery(new Term("content", "__pos_verb"))
}, 0, false);
This correctly finds the span: [jumped __pos_verb]
However, if I query:
SpanQuery sq = new SpanNearQuery(new SpanQuery[]
{
new SpanTermQuery(new Term("content", "jumped")),
new SpanTermQuery(new Term("content", "__pos_noun"))
}, 0, false);
This incorrectly finds the span: [cow __pos_noun] [jumped __pos_verb]
This is wrong because there is a distance of 1 between the tokens, not 0.
I am using a recent version of Lucene from SVN.
I am thinking that the problem is related to the position increment
being set to 0 for the first token of the incorrect "match" -- thus
perhaps this is a bug in the SpanNearQuery?
Best,
Marc
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]