[jira] [Commented] (SOLR-7167) ANY operator synax - score only top matching term

Alexey Kozhemiakin (JIRA) Thu, 26 Feb 2015 07:07:31 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-7167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338498#comment-14338498
 ]


Alexey Kozhemiakin commented on SOLR-7167:
------------------------------------------


So we called it edismaxplus:
 
1. This is an initial version which implements the ANY operator logic and does 
not break other query parsers.
2. This is an implementation of the approach described in the previous email. 
We have chosen approach number 3:
 
Let’s still parse queries from left to right, but when we encounter an ANY 
operator, remove the BooleanQuery and introduce a DisjunctionMaxQuery in its 
place.
The query is parsed from left to right:
•         NOT sets the Occurs flag of the clause to its right to MUST_NOT
•         AND will change the Occurs flag of the clause to its left to MUST 
unless it has already been set to MUST_NOT
•         AND sets the Occurs flag of the clause to its right to MUST
•         If the default operator of the query parser has been set to “And”: OR 
will change the Occurs flag of the clause to its left to SHOULD unless it has 
already been set to MUST_NOT
•         OR sets the Occurs flag of the clause to its right to SHOULD
•         ANY will not change the Occurs flag of the clause to its left, but 
it needs to remove the Boolean query and create a DisjunctionMaxQuery in its 
place.
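
The rules above can be sketched as a single left-to-right pass over the 
tokens. This is only an illustration in Python, not the patch itself (which 
lives in the Java edismax grammar). The names `Clause` and `assign_occurs` are 
hypothetical, and treating a clause with several terms as a 
DisjunctionMaxQuery group is an assumption made for the sketch.

```python
from dataclasses import dataclass

MUST, SHOULD, MUST_NOT = "MUST", "SHOULD", "MUST_NOT"
BINARY_OPERATORS = {"AND", "OR", "ANY"}


@dataclass
class Clause:
    occur: str
    terms: list  # more than one term stands for a DisjunctionMaxQuery group


def assign_occurs(tokens, default_op="OR"):
    """Assign Occurs flags in one left-to-right pass (hypothetical helper)."""
    default_occur = MUST if default_op == "AND" else SHOULD
    clauses = []
    pending = None   # binary operator seen since the previous clause
    negate = False   # True when NOT precedes the next clause
    for tok in tokens:
        if tok == "NOT":
            negate = True
            continue
        if tok in BINARY_OPERATORS:
            pending = tok
            if clauses:
                left = clauses[-1]
                # AND and OR may rewrite the clause to their left; ANY never does.
                if tok == "AND" and left.occur != MUST_NOT:
                    left.occur = MUST
                elif tok == "OR" and default_op == "AND" and left.occur != MUST_NOT:
                    left.occur = SHOULD
            continue
        # tok is a term
        if pending == "ANY" and clauses and not negate:
            # Assumption: ANY folds the term into the group on its left.
            clauses[-1].terms.append(tok)
        else:
            if negate:
                occur = MUST_NOT
            elif pending == "AND":
                occur = MUST
            elif pending == "OR":
                occur = SHOULD
            else:
                occur = default_occur
            clauses.append(Clause(occur, [tok]))
        pending = None
        negate = False
    return clauses
```

For example, `assign_occurs("a OR b AND c".split())` leaves `a` as SHOULD but 
turns both `b` and `c` into MUST, which is the non-Boolean behaviour the rules 
describe.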
In the previous approach, things quickly got too complicated. The current 
grammar does not in fact represent Boolean logic. There is no Boolean logic 
grammar tree. It is read more like a stream of tokens, left to right: when 
you have AND, you change the Occurs flag of the clause to its left to MUST 
unless it has already been set to MUST_NOT, and you set the Occurs flag of the 
clause to its right to MUST.
You can read more about it here 
https://lucidworks.com/blog/why-not-and-or-and-not/
To make it all work, we would need to define the grammar in such a way that 
OR takes two operands and AND takes two operands, so that there is a real tree 
structure in it. Then we could introduce another operator, ANY:
<AnyOp> ::= 
<Clause(<field>)> <ANY> <Clause(<field>)> (<ANY> <Clause(<field>)>)*
This might introduce quite a few surprises, though. We would need to make sure 
that, even though the parsing is different, the end result stays the same for 
the AND and OR operators. It could also simply take too much time to implement 
correctly.
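
For comparison, here is a rough sketch of what such a tree-building grammar 
could look like, written as a small recursive-descent parser in Python. It is 
not part of the patch. The precedence chosen here (ANY binds tighter than AND, 
which binds tighter than OR) is purely an assumption for illustration, and the 
sketch has no default operator, so adjacent terms without an operator are 
rejected.

```python
OPERATORS = ("AND", "OR", "ANY")


def parse(tokens):
    """Parse a token list into a nested (operator, operands) tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def level(operator, parse_operand):
        # Operand (operator Operand)* : the same shape as the <AnyOp> rule.
        nonlocal pos
        operands = [parse_operand()]
        while peek() == operator:
            pos += 1
            operands.append(parse_operand())
        return operands[0] if len(operands) == 1 else (operator, operands)

    def term():
        nonlocal pos
        tok = peek()
        if tok is None or tok in OPERATORS:
            raise ValueError(f"expected a term at position {pos}")
        pos += 1
        return tok

    def any_expr():
        return level("ANY", term)

    def and_expr():
        return level("AND", any_expr)

    def or_expr():
        return level("OR", and_expr)

    tree = or_expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r} at position {pos}")
    return tree
```

With this shape, `parse("a OR b AND c ANY d".split())` yields a real tree, 
`("OR", ["a", ("AND", ["b", ("ANY", ["c", "d"])])])`, which is exactly where 
the results could start to diverge from the current left-to-right behaviour.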
The current patch seems to solve all the issues raised so far, such as 
different values of the mm parameter, multiple query fields in edismax's qf, 
and incorrect query syntax. We also don't have to deal with different coord 
factors, as we are extending the edismax query parser.
 
With this jar we have addressed all the issues mentioned above, and no other 
parsers are broken. The behaviour of the ANY operator is consistent with the 
behaviour of the AND and OR operators in the existing parser: it is parsed 
from left to right and, like the AND operator, it captures the clause to its 
left. The left value is packed into a DisjunctionMaxQuery, as in the following 
example:
 
{!edismaxplus}disk ANY cd ANY dvd
 
becomes
 
(+(DisjunctionMaxQuery((((text:disk) | (text:cd)) | (text:dvd)))))/no_coord
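 
The nesting in that output is what a left fold over the ANY chain produces. 
The following Python sketch only mimics the string form of the parsed query. 
`fold_any_chain` is a hypothetical helper, and the `text` field is taken from 
the example above.

```python
from functools import reduce


def fold_any_chain(terms, field="text"):
    """Render an ANY chain the way a left-to-right fold nests it."""
    clauses = [f"({field}:{t})" for t in terms]
    # Each ANY wraps everything parsed so far together with the next clause.
    nested = reduce(lambda left, right: f"({left} | {right})", clauses)
    return f"DisjunctionMaxQuery({nested})"


print(fold_any_chain(["disk", "cd", "dvd"]))
```

The printed string matches the DisjunctionMaxQuery part of the output above; 
the outer `(+(...))/no_coord` wrapper is not reproduced by the sketch.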
 
 
Note that when there are multiple ANY operators in a chain, the lvalue holding 
the DisjunctionMaxQuery is treated as a subquery. Such behavior is desired for 
compatibility with the various subqueries that can occur as an R or L value, 
and the scoring works as designed because
 
max(max(a,b),c) = max(a,b,c)
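 
A quick way to check this identity is to model the DisjunctionMaxQuery score 
as the maximum sub-score plus a tie-breaker multiple of the remaining 
sub-scores. The sketch below is a simplified Python model, not Lucene code, and 
`dismax_score` is a hypothetical name. It also shows the caveat that the 
identity relies on a tie-breaker of 0, as in the example output above. With a 
non-zero tie-breaker, the nested and flat forms can score differently.

```python
def dismax_score(scores, tie=0.0):
    """Max sub-score plus `tie` times the sum of the other sub-scores."""
    top = max(scores)
    return top + tie * (sum(scores) - top)


a, b, c = 1.0, 2.0, 3.0

# tie = 0: nesting does not matter, max(max(a, b), c) == max(a, b, c)
nested_zero = dismax_score([dismax_score([a, b]), c])
flat_zero = dismax_score([a, b, c])

# tie = 0.1: the inner group is scored first, so the results differ
nested_tie = dismax_score([dismax_score([a, b], tie=0.1), c], tie=0.1)
flat_tie = dismax_score([a, b, c], tie=0.1)
```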
 
3. To make future maintenance easier (e.g. Solr version upgrades), the parser 
plugin requires some additional work. For now it is based directly on the 
existing edismax parser codebase, with minimal modifications to make it work 
with our code. As a result, a lot of functionality has been copied out of 
mainline into the plugin (the base edismax implementation, the whole query 
parser, etc.). To improve this, we need to create extension points in the 
existing edismax parser (there are none today) and submit them to the 
community as a patch. The whole implementation of the ANY operator should then 
be based solely on those extension points, with only a stub parser plugin on 
top to distinguish between base edismax and edismaxplus.


> ANY operator synax - score only top matching term
> -------------------------------------------------
>
>                 Key: SOLR-7167
>                 URL: https://issues.apache.org/jira/browse/SOLR-7167
>             Project: Solr
>          Issue Type: Improvement
>          Components: query parsers
>    Affects Versions: 5.0
>            Reporter: Alexey Kozhemiakin
>
> When we query
> (<term A> OR <term B> OR <term C> OR <term D>)
> and in case a document contains 2 or more of these terms: only the highest 
> scoring term should contribute to the final relevancy score; possibly lower 
> scoring  terms should be discarded from the scoring algorithm.
> Ideally I'd like an operator like ANY:
> (<term A> ANY <term B> ANY <term C> ANY <term D>)
> that has the purpose: return documents, sorted by the score of the highest 
> scoring term.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
