[ https://issues.apache.org/jira/browse/LUCENE-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085610#comment-16085610 ]
Jim Ferenczi commented on LUCENE-7848:
--------------------------------------
Dawid, sorry for the delay.
{code}
spanNear([field:SPECIAL,
          field:PROJECTS,
          field:-,
          spanOr([spanNear([SpanGap(:1), field:xxx,SPECIAL], 0, true),
                  spanNear([SpanGap(:1), field:xxx, field:SPECIAL], 0, true)]),
          field:PROJECTS,
          field:-,
          SpanGap(:1),
          field:yyy], 0, true)
{code}
The problem is the gaps inside {{spanOr}}: the position increments get mangled
somehow. I recreated the above query manually, and this version works just
fine:
{code}
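import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanOrQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

String field = "field"; // field name as printed in the query dump above

// Ordered, slop-0 near query. Compared with the dump above, the one-position
// gap sits before the spanOr instead of inside each branch, and the
// "xxx,SPECIAL" branch gets an extra trailing gap.
Query q = SpanNearQuery.newOrderedNearQuery(field)
    .addClause(new SpanTermQuery(new Term(field, "SPECIAL")))
    .addClause(new SpanTermQuery(new Term(field, "PROJECTS")))
    .addClause(new SpanTermQuery(new Term(field, "-")))
    .addGap(1)
    .addClause(new SpanOrQuery(
        // Branch 1: the unsplit token followed by a one-position gap
        SpanNearQuery.newOrderedNearQuery(field)
            .addClause(new SpanTermQuery(new Term(field, "xxx,SPECIAL")))
            .addGap(1)
            .build(),
        // Branch 2: the split tokens "xxx" and "SPECIAL"
        SpanNearQuery.newOrderedNearQuery(field)
            .addClause(new SpanTermQuery(new Term(field, "xxx")))
            .addClause(new SpanTermQuery(new Term(field, "SPECIAL")))
            .build()
    ))
    .addClause(new SpanTermQuery(new Term(field, "PROJECTS")))
    .addClause(new SpanTermQuery(new Term(field, "-")))
    .addGap(1)
    .addClause(new SpanTermQuery(new Term(field, "yyy")))
    .build();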
{code}
These two queries are valid and should both return results. The first one
represents exactly the graph produced by the WordDelimiterGraphFilter, and the
second one has an extra gap after "xxx,SPECIAL". This extra gap is not
irrelevant: it's the only way for the path containing the term "xxx,SPECIAL" to
match the indexed form of the document. If you look at the indexed positions,
"xxx,SPECIAL" is at position 4 and position 5 holds the term "SPECIAL". This is
the flattened version of the graph, but the query side builds the correct
(unflattened) graph and ignores the fact that the indexer has collapsed the
positions. Adding a manual gap allows "xxx,SPECIAL" to also skip the next
position (5, SPECIAL) and jump directly to (6, PROJECTS).
However, the other path, containing the split terms "xxx" and "SPECIAL", should
match with both queries. I think this is the real problem, and the second query
matches only because of the additional gap you added.
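To make the position arithmetic concrete, here is a toy model in plain Java (not Lucene's span implementation) of an ordered, slop-0 match of terms and gaps over a flattened index. Only positions 4 to 6 from the comment are modeled; placing "xxx" at position 4 next to "xxx,SPECIAL" is an assumption about how the graph was flattened, and the class and method names ({{FlattenedMatch}}, {{matchesAt}}) are hypothetical.

{code:java}
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model, not Lucene code: an ordered, slop-0 match of terms and gaps
// against a flattened index where one position may hold several terms.
class FlattenedMatch {

  // A clause is either a term that must occupy the current position,
  // or a gap that skips `width` positions without inspecting them.
  static final class Clause {
    final String term; // null for a gap
    final int width;   // number of positions the clause consumes

    private Clause(String term, int width) {
      this.term = term;
      this.width = width;
    }

    static Clause term(String t) { return new Clause(t, 1); }
    static Clause gap(int width) { return new Clause(null, width); }
  }

  // True if the clauses match at consecutive positions starting at `start`.
  static boolean matchesAt(Map<Integer, Set<String>> index, int start, List<Clause> clauses) {
    int pos = start;
    for (Clause c : clauses) {
      if (c.term != null) {
        Set<String> termsHere = index.get(pos);
        if (termsHere == null || !termsHere.contains(c.term)) {
          return false;
        }
      }
      pos += c.width;
    }
    return true;
  }

  // Positions 4 to 6 as described in the comment. Placing "xxx" at position 4
  // alongside "xxx,SPECIAL" is an assumption about the flattened graph.
  static Map<Integer, Set<String>> flattenedIndex() {
    Map<Integer, Set<String>> index = new HashMap<>();
    index.put(4, new HashSet<>(Arrays.asList("xxx,SPECIAL", "xxx")));
    index.put(5, new HashSet<>(Arrays.asList("SPECIAL")));
    index.put(6, new HashSet<>(Arrays.asList("PROJECTS")));
    return index;
  }

  public static void main(String[] args) {
    Map<Integer, Set<String>> index = flattenedIndex();
    // Unsplit path, no gap: needs PROJECTS at position 5, which holds SPECIAL.
    System.out.println(matchesAt(index, 4,
        Arrays.asList(Clause.term("xxx,SPECIAL"), Clause.term("PROJECTS")))); // false
    // Unsplit path plus one-position gap: skips (5, SPECIAL), lands on (6, PROJECTS).
    System.out.println(matchesAt(index, 4,
        Arrays.asList(Clause.term("xxx,SPECIAL"), Clause.gap(1), Clause.term("PROJECTS")))); // true
    // Split path: "xxx" at 4, "SPECIAL" at 5, "PROJECTS" at 6.
    System.out.println(matchesAt(index, 4,
        Arrays.asList(Clause.term("xxx"), Clause.term("SPECIAL"), Clause.term("PROJECTS")))); // true
  }
}
{code}

In this model the unsplit path lines up only once the extra gap is added, while the split path matches without any gap, which is consistent with the point that the split-term path should match with both queries.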
I don't have time at the moment to look at why the SpanQuery does not match
with the first query. It deserves a separate issue anyway, so I think we should
focus on whether the query produced by the QueryBuilder is valid or not. If it
is, then the patch can be merged and we can look at the other problem separately.
[~mgibney] can you open a new issue or add your comment and patch to
https://issues.apache.org/jira/browse/LUCENE-7398 ? We should focus on the
query building in this issue first.
> QueryBuilder.analyzeGraphPhrase does not handle gaps correctly
> --------------------------------------------------------------
>
> Key: LUCENE-7848
> URL: https://issues.apache.org/jira/browse/LUCENE-7848
> Project: Lucene - Core
> Issue Type: Bug
> Affects Versions: 6.5, 6.6
> Reporter: Jim Ferenczi
> Attachments: capture-3.png, LUCENE-7848-branching-spanOr.patch,
> LUCENE-7848.patch, LUCENE-7848.patch
>
>
> Position increments greater than 1 are ignored when the query builder creates
> a graph phrase query.
> Instead it should use SpanNearQuery.addGap for pos incr > 1.
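The fix described in the issue (a gap for every position increment greater than 1) can be sketched without Lucene as a mapping from a linear token stream to an ordered clause list. This is a hypothetical model, not the QueryBuilder patch: the names {{GapFromIncrements}} and {{toClauses}} are illustrative, gaps are rendered as the string "GAP(n)" where real code would presumably call {{SpanNearQuery.Builder#addGap(n)}}, and dropping a leading increment is an assumption of this sketch. The tokens mirror the tail of the query dump above ({{field:-, SpanGap(:1), field:yyy}}).

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not the QueryBuilder patch: turn a linear token
// stream into an ordered clause list, adding a gap wherever posInc > 1.
class GapFromIncrements {

  static final class Token {
    final String term;
    final int posInc; // increment relative to the previous token

    Token(String term, int posInc) {
      this.term = term;
      this.posInc = posInc;
    }
  }

  // A gap of (posInc - 1) positions precedes each token that skips positions.
  // Gaps are rendered as "GAP(n)" strings to keep the model library-free.
  static List<String> toClauses(List<Token> tokens) {
    List<String> clauses = new ArrayList<>();
    for (Token t : tokens) {
      // Assumption of this sketch: a leading increment (before the first
      // clause) is dropped instead of becoming a leading gap.
      if (t.posInc > 1 && !clauses.isEmpty()) {
        clauses.add("GAP(" + (t.posInc - 1) + ")");
      }
      clauses.add(t.term);
    }
    return clauses;
  }

  public static void main(String[] args) {
    List<Token> tokens = new ArrayList<>();
    tokens.add(new Token("PROJECTS", 1));
    tokens.add(new Token("-", 1));
    tokens.add(new Token("yyy", 2)); // one empty position before "yyy"
    System.out.println(toClauses(tokens)); // [PROJECTS, -, GAP(1), yyy]
  }
}
{code}

Ignoring the increment, as the issue says the current code does, would place "yyy" directly after "-" and fail to match the indexed positions.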
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)