[ https://issues.apache.org/jira/browse/LUCENE-3371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15469582#comment-15469582 ]
Trejkaz edited comment on LUCENE-3371 at 9/7/16 5:17 AM:
---------------------------------------------------------
To summarise some investigation I did into using SpanNotQuery with the new
pre and post parameters: the approach doesn't work, and I can't immediately
see why.
My rewrite:
{code}
@Override
public SpanQuery rewrite(IndexReader reader) throws IOException
{
    int nearQueriesCount = nearQueries.size();
    SpanQuery[] notNearClauses = new SpanQuery[nearQueriesCount];
    int pre = inOrder ? slop : 0;
    int post = slop;
    for (int i = 0; i < nearQueriesCount; i++)
    {
        // "Not near" clause i: mainQuery spans with no nearQueries[i] span
        // within pre tokens before or post tokens after them.
        notNearClauses[i] = new SpanNotQuery(mainQuery, nearQueries.get(i),
                                             pre, post);
    }
    // "Near all": mainQuery spans that overlap none of the "not near" spans.
    return new SpanNotQuery(mainQuery, new SpanOrQuery(notNearClauses));
}
{code}
i.e., for each near query, create a "not near" clause (the main query spans
that have no match from that near query within the window), then subtract the
union of the "not near" clauses from the main query to get the "near all" result.
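The subtraction step can be checked on its own, away from the span machinery. Below is a minimal plain-Java sketch (no Lucene; the class and method names are hypothetical), assuming each clause has already been reduced to a set of mainQuery start positions, and that for single-term spans "overlap" in the outer SpanNotQuery simply means "same position". Here near.get(i) stands for the main positions that have near query i inside the window. Subtracting the union of the "not near" sets leaves exactly the positions that are near every clause, i.e. the intersection, which suggests (if the model holds) that any problem lies in which positions end up in each per-clause set.

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of the rewrite's set logic on mainQuery start positions. */
class NearAllSetModel {

    /** main minus the union over i of (main minus near_i), mirroring the two SpanNotQuery layers. */
    static Set<Integer> nearAll(Set<Integer> main, List<Set<Integer>> near) {
        Set<Integer> notNearUnion = new TreeSet<>();
        for (Set<Integer> nearI : near) {
            Set<Integer> notNearI = new TreeSet<>(main);
            notNearI.removeAll(nearI);      // inner SpanNotQuery: main spans without clause i nearby
            notNearUnion.addAll(notNearI);  // SpanOrQuery over the "not near" clauses
        }
        Set<Integer> result = new TreeSet<>(main);
        result.removeAll(notNearUnion);     // outer SpanNotQuery
        return result;
    }

    /** The same result computed directly as main intersected with every near_i. */
    static Set<Integer> intersection(Set<Integer> main, List<Set<Integer>> near) {
        Set<Integer> result = new TreeSet<>(main);
        for (Set<Integer> nearI : near) {
            result.retainAll(nearI);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> main = new TreeSet<>(List.of(0, 7));   // positions of "a"
        List<Set<Integer>> near = new ArrayList<>();
        near.add(new TreeSet<>(List.of(0)));                // "a" positions with "b" in the window
        near.add(new TreeSet<>(List.of(0)));                // "a" positions with "c" in the window
        System.out.println(nearAll(main, near));            // prints [0]
        System.out.println(intersection(main, near));       // prints [0]
    }
}
{code}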
This logic is apparently wrong, because this query:
{noformat}
mainQuery = SpanTerm("content", "a")
nearQueries = [
    SpanTerm("content", "b"),
    SpanTerm("content", "c")
]
slop = 2
inOrder = false
{noformat}
is expected to match this text:
{noformat}
a x b c x x x a
{noformat}
But instead, it does not match.
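To see where the window edges fall, the per-span test can be modelled in plain Java for single-term spans. The rejection condition below (an include span [s, e) is dropped when some exclude span has end > s - pre and start < e + post) is my reading of the Lucene 6.x SpanNotQuery source rather than something the javadoc spells out, so treat it as an assumption. Class and method names are hypothetical; positions are 0-based over the example text.

{code}
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of SpanNotQuery(include, exclude, pre, post) for single-term spans. */
class SpanNotWindowModel {

    /**
     * Returns start positions of include-term spans that survive exclusion.
     * A single-term span at position p is modelled as [p, p + 1).
     * ASSUMPTION: the rejection condition is taken from a reading of the Lucene 6.x source.
     */
    static List<Integer> notNear(String[] tokens, String include, String exclude,
                                 int pre, int post) {
        List<Integer> survivors = new ArrayList<>();
        for (int p = 0; p < tokens.length; p++) {
            if (!tokens[p].equals(include)) {
                continue;
            }
            int incStart = p;
            int incEnd = p + 1;
            boolean rejected = false;
            for (int q = 0; q < tokens.length && !rejected; q++) {
                if (tokens[q].equals(exclude)) {
                    int excStart = q;
                    int excEnd = q + 1;
                    // Exclude span reaches into [incStart - pre, incEnd + post).
                    rejected = excEnd > incStart - pre && excStart < incEnd + post;
                }
            }
            if (!rejected) {
                survivors.add(p);
            }
        }
        return survivors;
    }

    public static void main(String[] args) {
        String[] text = "a x b c x x x a".split(" ");
        int slop = 2;
        boolean inOrder = false;
        int pre = inOrder ? slop : 0;  // same choice as in the rewrite
        int post = slop;
        System.out.println(notNear(text, "a", "b", pre, post));  // prints [7]
        System.out.println(notNear(text, "a", "c", pre, post));  // prints [0, 7]
    }
}
{code}

If that reading is right, the "b" at position 2 falls inside the first "a"'s post window but the "c" at position 3 does not, since post = 2 only catches exclude spans starting before position 3, whereas SpanNearQuery with slop 2 (as far as I can tell) accepts two tokens in between. The first "a" would then land in the "not near c" clause and be subtracted. Separately, with inOrder = false the rewrite sets pre = 0, so clause spans before an "a" are never looked at.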
> Support for a "SpanAndQuery" / "SpanAllNearQuery"
> -------------------------------------------------
>
> Key: LUCENE-3371
> URL: https://issues.apache.org/jira/browse/LUCENE-3371
> Project: Lucene - Core
> Issue Type: New Feature
> Components: core/search
> Reporter: Trejkaz
>
> I would like to parse queries like this:
> {noformat}
> a WITHIN 5 WORDS OF (b AND c)
> {noformat}
> This would match cases where both a "b" span and a "c" span are within 5 of
> the same "a" span.
> The existing span query classes do not appear to be capable of this, no
> matter how they are combined. Replacing the AND with "WITHIN 10 OF"
> (the general rule is to double the first number) at least ensures that no
> hits are lost; it just returns too many.
> I'm not sure how the class would work, but it might be like this:
> {code}
> Query q = new SpanAllNearQuery(a, new SpanQuery[] { b, c }, 5, false);
> {code}
> The difference from SpanNearQuery is that SpanNearQuery considers the entire
> collection of terms as a single set to be found near each other, whereas this
> query would consider each of the terms in the array relative to the first.
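For comparison, the intended semantics can be written as a brute-force check over token positions. This is a plain-Java sketch, not a Lucene query: the names are hypothetical, it only handles single-token terms, it is unordered (like the false in the proposed constructor), and it assumes "within n" means at most n tokens in between, which as far as I can tell is how SpanNearQuery's slop treats two single-term spans. Each other term is checked against the same anchor position independently, which is the distinction from SpanNearQuery drawn above.

{code}
import java.util.ArrayList;
import java.util.List;

/** Hypothetical brute-force reference for "anchor WITHIN n WORDS OF (t1 AND t2 AND ...)". */
class AllNearReference {

    /**
     * Returns positions of anchor tokens that have every other term within maxGap tokens,
     * on either side (unordered). ASSUMPTION: "within n" means |p - q| - 1 <= n for
     * single-token terms at positions p and q.
     */
    static List<Integer> matches(String[] tokens, String anchor, List<String> others, int maxGap) {
        List<Integer> hits = new ArrayList<>();
        for (int p = 0; p < tokens.length; p++) {
            if (!tokens[p].equals(anchor)) {
                continue;
            }
            boolean allNear = true;
            for (String term : others) {
                boolean found = false;
                for (int q = 0; q < tokens.length && !found; q++) {
                    found = q != p && tokens[q].equals(term) && Math.abs(p - q) - 1 <= maxGap;
                }
                if (!found) {
                    allNear = false;  // this anchor is missing one of the terms
                    break;
                }
            }
            if (allNear) {
                hits.add(p);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] text = "a x b c x x x a".split(" ");
        System.out.println(matches(text, "a", List.of("b", "c"), 2));  // prints [0]
    }
}
{code}

On the example text from the comment (a x b c x x x a, slop 2) it reports a match at position 0, the first "a".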
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)