[ https://issues.apache.org/jira/browse/SOLR-7167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338498#comment-14338498 ]
Alexey Kozhemiakin commented on SOLR-7167:
------------------------------------------
So we called it edismaxplus:
1. This is an initial version which implements the ANY operator logic and does
not break other query parsers.
2. This is an implementation of the approach described in the previous email.
We have chosen approach number 3:
Let’s still parse queries from left to right, but remove BooleanQueries when we
have an ANY operator and introduce DisjunctionMaxQueries in their place.
The query is parsed from left to right as follows:
• NOT sets the Occurs flag of the clause to its right to MUST_NOT
• AND changes the Occurs flag of the clause to its left to MUST
unless it has already been set to MUST_NOT
• AND sets the Occurs flag of the clause to its right to MUST
• If the default operator of the query parser has been set to “And”: OR
changes the Occurs flag of the clause to its left to SHOULD unless it has
already been set to MUST_NOT
• OR sets the Occurs flag of the clause to its right to SHOULD
• ANY does not change the Occurs flag of the clause to its left, but
removes the BooleanQuery and creates a DisjunctionMaxQuery in its
place.
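To make the rules above concrete, here is a minimal Java sketch that replays them over a token stream. It is not the edismaxplus code: the class and method names are hypothetical, every term is assumed to go to a single field called `text`, queries are bare whitespace-separated terms (no parentheses or field prefixes), and NOT inside an ANY chain is not modeled. ANY keeps the left clause's flag and wraps the left and right clauses in a `(left | right)` disjunction, so chains nest to the left.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch of the left-to-right Occur rules; not the patch itself. */
class OccurSketch {
    enum Occur { MUST, SHOULD, MUST_NOT }

    static final class Clause {
        Occur occur;
        String text;
        Clause(Occur occur, String text) { this.occur = occur; this.text = text; }
    }

    static List<Clause> parse(String query, boolean defaultAnd) {
        List<Clause> clauses = new ArrayList<>();
        String pendingOp = null;     // AND / OR / ANY seen since the last term
        boolean negateNext = false;  // NOT seen since the last term
        for (String tok : query.trim().split("\\s+")) {
            if (tok.equals("AND") || tok.equals("OR") || tok.equals("ANY")) {
                pendingOp = tok;
                continue;
            }
            if (tok.equals("NOT")) {
                negateNext = true;
                continue;
            }
            String term = "(text:" + tok + ")";  // assumed single field "text"
            Clause left = clauses.isEmpty() ? null : clauses.get(clauses.size() - 1);
            if ("ANY".equals(pendingOp) && left != null) {
                // ANY: leave the left clause's Occur flag alone and replace the
                // clause with a disjunction (left | right).
                left.text = "(" + left.text + " | " + term + ")";
            } else {
                Occur occur = defaultAnd ? Occur.MUST : Occur.SHOULD;
                if (negateNext) occur = Occur.MUST_NOT;                // NOT: right clause
                else if ("AND".equals(pendingOp)) occur = Occur.MUST;   // AND: right clause
                else if ("OR".equals(pendingOp)) occur = Occur.SHOULD;  // OR: right clause
                if (left != null && left.occur != Occur.MUST_NOT) {
                    if ("AND".equals(pendingOp)) {
                        left.occur = Occur.MUST;                        // AND: left clause
                    } else if ("OR".equals(pendingOp) && defaultAnd) {
                        left.occur = Occur.SHOULD;                      // OR: left clause
                    }
                }
                clauses.add(new Clause(occur, term));
            }
            pendingOp = null;
            negateNext = false;
        }
        return clauses;
    }

    /** Renders clauses with Lucene-style +/- prefixes for MUST / MUST_NOT. */
    static String render(List<Clause> clauses) {
        return clauses.stream()
                .map(c -> (c.occur == Occur.MUST ? "+" : c.occur == Occur.MUST_NOT ? "-" : "") + c.text)
                .collect(Collectors.joining(" "));
    }
}
```

For example, `render(parse("disk ANY cd ANY dvd", false))` yields `(((text:disk) | (text:cd)) | (text:dvd))`, and `render(parse("NOT a AND b", false))` yields `-(text:a) +(text:b)`, because AND does not override an existing MUST_NOT.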
In the previous approach, things quickly got too complicated. The current
grammar does not in fact represent Boolean logic: there is no Boolean logic
grammar tree. It is read more like a stream of tokens, left to right. When
you have AND, you change the Occurs flag of the clause to its left to MUST
unless it has already been set to MUST_NOT, and you set the Occurs flag of the
clause to its right to MUST.
You can read more about it here:
https://lucidworks.com/blog/why-not-and-or-and-not/
To make it all work, we would need to define the grammar in such a way that
OR takes two operands, AND takes two operands, and there is a real tree
structure. Then we could introduce another operator, ANY:
<AnyOp> ::= <Clause(<field>)> <ANY> <Clause(<field>)> (<ANY> <Clause(<field>)>)*
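As an illustration of what "a real tree structure" would mean, here is a small recursive-descent sketch in Java in which every operator takes real operands and the `<AnyOp>` production becomes one n-ary node. The precedence (ANY binds tighter than AND, AND tighter than OR) is an assumption made only for this example; the comment above does not fix one. NOT is omitted, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

/**
 * Hypothetical tree-building parser (assumed precedence ANY > AND > OR):
 *   orExpr  ::= andExpr ("OR" andExpr)*
 *   andExpr ::= anyExpr ("AND" anyExpr)*
 *   anyExpr ::= clause ("ANY" clause)*     // the <AnyOp> production
 *   clause  ::= "(" orExpr ")" | term
 */
class AnyGrammarSketch {
    private final List<String> tokens;
    private int pos = 0;

    private AnyGrammarSketch(String query) {
        String spaced = query.replace("(", " ( ").replace(")", " ) ").trim();
        this.tokens = new ArrayList<>(Arrays.asList(spaced.split("\\s+")));
    }

    /** Returns the parse tree in prefix form, e.g. OR(a, ANY(b, c)). */
    static String parse(String query) {
        AnyGrammarSketch p = new AnyGrammarSketch(query);
        String tree = p.orExpr();
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("unexpected token: " + p.tokens.get(p.pos));
        }
        return tree;
    }

    // operand (op operand)* becomes a single n-ary node op(a, b, ...).
    private String nary(String op, Supplier<String> operand) {
        List<String> args = new ArrayList<>();
        args.add(operand.get());
        while (pos < tokens.size() && tokens.get(pos).equals(op)) {
            pos++;  // consume the operator
            args.add(operand.get());
        }
        return args.size() == 1 ? args.get(0) : op + "(" + String.join(", ", args) + ")";
    }

    private String orExpr()  { return nary("OR", this::andExpr); }
    private String andExpr() { return nary("AND", this::anyExpr); }
    private String anyExpr() { return nary("ANY", this::clause); }

    private String clause() {
        if (pos >= tokens.size()) throw new IllegalArgumentException("unexpected end of query");
        String tok = tokens.get(pos++);
        if (tok.equals("(")) {
            String inner = orExpr();
            if (pos >= tokens.size() || !tokens.get(pos).equals(")")) {
                throw new IllegalArgumentException("missing )");
            }
            pos++;  // consume ")"
            return inner;
        }
        if (tok.equals(")") || tok.equals("AND") || tok.equals("OR") || tok.equals("ANY")) {
            throw new IllegalArgumentException("expected a clause, got " + tok);
        }
        return tok;  // a term, possibly field:term
    }
}
```

With this grammar, `disk ANY cd ANY dvd` parses to the flat node `ANY(disk, cd, dvd)` rather than a left-nested chain, and `a OR b AND c ANY d` parses to `OR(a, AND(b, ANY(c, d)))`. Choosing and preserving such a precedence for existing AND/OR queries is exactly the compatibility risk described next.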
This might introduce quite a few surprises, though. We would need to make sure
that, even though parsing is different, the end result stays the same for the
AND and OR operators. It could also simply take too much time to implement
correctly.
The current patch seems to solve all the issues raised, such as different
values of the mm parameter, many query fields in edismax's qf, and incorrect
query syntax. We also don't have to deal with different coord factors, as we
are extending the edismax query parser.
With this JAR we have addressed all the issues mentioned above; no other
parsers are broken. The behavior of the ANY operator is consistent with the
behavior of the AND and OR operators in the existing parser: it is parsed from
left to right and has possessive behavior similar to the AND operator. The left
value is captured and packed into a DisjunctionMaxQuery, as in the following
example:
{!edismaxplus}disk ANY cd ANY dvd
becomes
(+(DisjunctionMaxQuery((((text:disk) | (text:cd)) | (text:dvd)))))/no_coord
Note that when there are multiple ANY operators in a chain, the lvalue
containing the DisjunctionMaxQuery is treated as a subquery. This behavior is
desired for compatibility with the various subqueries that can occur as an
rvalue or lvalue, and the scoring works as designed because
max(max(a,b),c) = max(a,b,c)
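The identity above is about the pure maximum. Lucene's DisjunctionMaxQuery actually scores a document as the maximum sub-score plus `tieBreakerMultiplier` times the sum of the other sub-scores, so the nesting is exactly equivalent only when the tie-breaker is 0. The Java sketch below reproduces that scoring rule on made-up numbers (not real Lucene output) to compare the nested and flat forms; the class and method names are hypothetical.

```java
/** Hypothetical model of DisjunctionMaxQuery score combination. */
class DisMaxNesting {
    /** max(scores) + tie * (sum of the other scores). */
    static double dismax(double tie, double... scores) {
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0.0;
        for (double s : scores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tie * (sum - max);
    }
}
```

With scores a = 2.0, b = 3.0, c = 3.5 and a tie-breaker of 0, `dismax(0.0, dismax(0.0, 2.0, 3.0), 3.5)` and `dismax(0.0, 2.0, 3.0, 3.5)` both give 3.5. With a tie-breaker of 0.5, the nested form gives 5.75 while the flat form gives 6.0, because the inner disjunction's tie-breaker bonus is then counted as part of a single sub-score.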
3. To make future maintenance easier (e.g. Solr version upgrades), the parser
plugin would require some additional work. For now it is directly based on the
existing edismax parser codebase, with minimal modifications to make it work
with our code. As a result, a lot of functionality has been copied from the
mainline into the plugin (the base edismax implementation, the whole query
parser, etc.). To improve this, we need to create extension points (there are
currently none) in the existing edismax parser and contribute them to the
community as a patch. The whole implementation of the ANY operator should then
be based solely on those extension points, with only a stub parser plugin on
top to distinguish between the base edismax and edismaxplus parsers.
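The proposed split can be pictured with a short Java sketch: the shared left-to-right loop lives in a base class with overridable hooks, and edismaxplus shrinks to a stub subclass that only adds ANY. None of these types exist in Solr; `EdismaxLikeParser`, `EdismaxPlusParser`, `isCustomOperator` and `combine` are hypothetical names, and queries are modeled as strings to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical base parser exposing extension points (not a Solr class). */
class EdismaxLikeParser {
    /** Hook: does this token introduce a custom operator? The base parser has none. */
    protected boolean isCustomOperator(String token) {
        return false;
    }

    /** Hook: combine the left clause and the right clause for a custom operator. */
    protected String combine(String operator, String left, String right) {
        throw new UnsupportedOperationException("unsupported operator: " + operator);
    }

    /** Shared left-to-right loop that stays in the base parser. */
    public final String parse(String query) {
        List<String> out = new ArrayList<>();
        String pending = null;  // custom operator waiting for its right operand
        for (String tok : query.trim().split("\\s+")) {
            if (isCustomOperator(tok)) {
                pending = tok;
                continue;
            }
            if (pending != null && !out.isEmpty()) {
                int last = out.size() - 1;
                out.set(last, combine(pending, out.get(last), tok));  // capture the lvalue
            } else {
                out.add(tok);
            }
            pending = null;
        }
        return String.join(" ", out);
    }
}

/** Stub plugin on top of the hooks: it only teaches the parser about ANY. */
class EdismaxPlusParser extends EdismaxLikeParser {
    @Override
    protected boolean isCustomOperator(String token) {
        return token.equals("ANY");
    }

    @Override
    protected String combine(String operator, String left, String right) {
        return "(" + left + " | " + right + ")";
    }
}
```

Here `new EdismaxLikeParser().parse("disk ANY cd")` leaves ANY as an ordinary token (`disk ANY cd`), while `new EdismaxPlusParser().parse("disk ANY cd ANY dvd")` yields `((disk | cd) | dvd)`. A mainline upgrade would then only touch the base class, not the plugin.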
> ANY operator synax - score only top matching term
> -------------------------------------------------
>
> Key: SOLR-7167
> URL: https://issues.apache.org/jira/browse/SOLR-7167
> Project: Solr
> Issue Type: Improvement
> Components: query parsers
> Affects Versions: 5.0
> Reporter: Alexey Kozhemiakin
>
> When we query
> (<term A> OR <term B> OR <term C> OR <term D>)
> and a document contains 2 or more of these terms, only the highest-scoring
> term should contribute to the final relevancy score; the other, lower-scoring
> terms should be discarded from the scoring algorithm.
> Ideally I'd like an operator like ANY:
> (<term A> ANY <term B> ANY <term C> ANY <term D>)
> whose purpose is to return documents sorted by the score of their
> highest-scoring term.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)