[ 
https://issues.apache.org/jira/browse/LUCENE-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736132#comment-16736132
 ] 

Alan Woodward commented on LUCENE-8630:
---------------------------------------

Here is a patch implementing the idea.

* IntervalIterator has a boost() method added.  Term sources just return 1; a 
new Intervals.boost() source returns a configurable boost; disjunctions 
return the boost of the current top interval; conjunctions return the product 
of all their sub-interval boosts.
* The minimizing conjunctions (ordered, unordered and minimum-should-match) 
need to cache their lead sub-interval's boost up-front, so that boosts can be 
correctly calculated once the lead has moved on.  If every sub-interval is just 
returning {{1}} this ends up being quite wasteful.  So I've added a 
'hasBoost()' method to IntervalsSource, which will return {{false}} if its 
boost is 1.  The minimizing conjunctions can check this up-front, and wrap 
their sub-iterators so that calling boost() is not costly.
* Conjunctions are currently hard-coded to produce boosts that are the product 
of their sub-boosts, but we may want to make this configurable in future.
* The patch is a bit bigger than necessary because the change made a 
FilterIntervalIterator worth having, and I refactored a few other iterators to 
use it as well - I can back those changes out if it makes this easier to review.
* I haven't added payload handling yet, but the idea would be to add a new type 
of TermIntervalIterator that also takes a {{ToFloatFunction<BytesRef>}} that 
{{boost()}} delegates to, with the value of the current payload.  Users can 
decide on their own how to handle null values, etc.
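
To make the boost rules in the first two bullets concrete, here is a minimal standalone sketch. It is not taken from the patch and does not use Lucene's classes: the types BoostedSource, TermSource, BoostSource, Disjunction and Conjunction, and the setTop() method, are all hypothetical names invented for illustration, and iteration over intervals is left out entirely. It only shows how boosts might combine, and how a hasBoost() check computed up-front could let a conjunction skip the multiplication when every sub-source returns 1.

```java
import java.util.List;

// Hypothetical, simplified model; none of these types are Lucene classes.
interface BoostedSource {
    float boost();       // boost of the current interval
    boolean hasBoost();  // false when boost() is always 1
}

// A term source never boosts.
final class TermSource implements BoostedSource {
    public float boost() { return 1f; }
    public boolean hasBoost() { return false; }
}

// Returns a configurable boost. A real wrapper would also delegate
// iteration to an inner source; that part is omitted here.
final class BoostSource implements BoostedSource {
    private final float boost;
    BoostSource(float boost) { this.boost = boost; }
    public float boost() { return boost; }
    public boolean hasBoost() { return boost != 1f; }
}

// A disjunction reports the boost of whichever sub-source currently
// supplies the top interval. setTop() stands in for real iteration.
final class Disjunction implements BoostedSource {
    private final List<BoostedSource> subs;
    private int top = 0;
    Disjunction(List<BoostedSource> subs) { this.subs = subs; }
    void setTop(int index) { this.top = index; }
    public float boost() { return subs.get(top).boost(); }
    public boolean hasBoost() {
        return subs.stream().anyMatch(BoostedSource::hasBoost);
    }
}

// A conjunction reports the product of its sub-source boosts, and checks
// hasBoost() once so the unboosted case costs nothing per call.
final class Conjunction implements BoostedSource {
    private final List<BoostedSource> subs;
    private final boolean hasBoost;
    Conjunction(List<BoostedSource> subs) {
        this.subs = subs;
        this.hasBoost = subs.stream().anyMatch(BoostedSource::hasBoost);
    }
    public float boost() {
        if (!hasBoost) {
            return 1f; // no sub-source boosts, so skip the loop
        }
        float product = 1f;
        for (BoostedSource sub : subs) {
            product *= sub.boost();
        }
        return product;
    }
    public boolean hasBoost() { return hasBoost; }
}

class BoostSketch {
    public static void main(String[] args) {
        BoostedSource plain = new TermSource();
        BoostedSource doubled = new BoostSource(2f);
        BoostedSource tripled = new BoostSource(3f);

        Disjunction synonyms = new Disjunction(List.of(plain, doubled));
        synonyms.setTop(1); // pretend the boosted synonym matched

        Conjunction both = new Conjunction(List.of(synonyms, tripled));
        System.out.println(both.boost());
    }
}
```

The sketch does not model the lead-interval caching that the minimizing conjunctions need; it only shows why a cheap hasBoost() answer is useful before deciding whether to cache or wrap anything.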

> Allow boosting of particular interval sources
> ---------------------------------------------
>
>                 Key: LUCENE-8630
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8630
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>         Attachments: LUCENE-8630.patch
>
>
> In positional queries, it's common to want to promote some terms over others; 
> for example, in lists of synonyms you may want the original term to be 
> weighted more, or more specific terms to receive higher weights than less 
> specific ones.
> Span queries have the 'SpanBoostQuery', which is currently broken; and a 
> 'PayloadScoreQuery' which allows direct modification of the score based on 
> stored payloads, but which does not deal well with a mix of terms 
> with-and-without payloads, and which ends up exposing a lot of the terms API, 
> making it very difficult to customize.
> For interval queries, I'd like to try a different approach, adding a 
> float-valued 'boost()' method to IntervalIterator.  This would make it easy 
> to add simple boosts around particular terms in terms lists, and also allow 
> more fine-grained control using payloads without having to expose the 
> mechanics of the PostingsEnum.



