[ https://issues.apache.org/jira/browse/LUCENE-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736132#comment-16736132 ]
Alan Woodward commented on LUCENE-8630:
---------------------------------------
Here is a patch implementing the idea.
* IntervalIterator has a new {{boost()}} method. Term sources just return 1; a
new {{Intervals.boost()}} source returns a configurable boost; disjunctions
return the boost of their current top interval; conjunctions return the product
of all their sub-interval boosts.
* The minimizing conjunctions (ordered, unordered and minimum-should-match)
need to cache their lead sub-interval's boost up-front, so that boosts can
still be calculated correctly once the lead has moved on. If every sub-interval
just returns {{1}}, this caching is wasted work, so I've added a
{{hasBoost()}} method to IntervalsSource, which returns {{false}} if the
source's boost is always 1. The minimizing conjunctions can check this
up-front and wrap their sub-iterators so that calling {{boost()}} is cheap.
* Conjunctions are currently hard-coded to produce boosts that are the product
of their sub-boosts, but we may want to make this configurable in future.
* The patch is a bit bigger than necessary because the change made a
FilterIntervalIterator worth having, and I refactored a few other iterators to
use it as well; I can back those changes out if that makes this easier to review.
* I haven't added payload handling yet, but the idea would be to add a new type
of TermIntervalIterator that also takes a {{ToFloatFunction<BytesRef>}}, to which
{{boost()}} delegates, passing in the value of the current payload. Users can
then decide for themselves how to handle null payloads, etc.
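A toy model of the boost rules in the first three points may help review: terms report 1, a boost source reports a configured value, a disjunction reports its current top's boost, and a conjunction reports the product of its sub-boosts, plus a {{hasBoost()}}-style flag. Every type name here (BoostSketch, BoostNode, TermNode, ConstantBoostNode, DisjunctionNode, ConjunctionNode, setTop) is hypothetical; this is not the patch's IntervalIterator/IntervalsSource code, it ignores positions entirely, and it does not model the lead-boost caching in the minimizing conjunctions.

```java
import java.util.List;

// Hypothetical toy model of the boost() propagation rules described above.
// These are NOT Lucene classes; positions and iteration are deliberately omitted.
final class BoostSketch {

    /** A node in a tree of interval sources, reporting the boost of its current interval. */
    interface BoostNode {
        float boost();

        /** Returns false when boost() is always 1, so callers can skip caching it. */
        boolean hasBoost();
    }

    /** A plain term: boost is always 1. */
    static final class TermNode implements BoostNode {
        public float boost() { return 1f; }
        public boolean hasBoost() { return false; }
    }

    /** Stand-in for what an Intervals.boost()-style source reports: a fixed, configured boost. */
    static final class ConstantBoostNode implements BoostNode {
        private final float boost;
        ConstantBoostNode(float boost) { this.boost = boost; }
        public float boost() { return boost; }
        public boolean hasBoost() { return boost != 1f; }
    }

    /** Disjunction: reports the boost of whichever sub-node is currently on top. */
    static final class DisjunctionNode implements BoostNode {
        private final List<BoostNode> subs;
        private int top; // index of the current top sub-node; set by the caller in this toy
        DisjunctionNode(List<BoostNode> subs) { this.subs = subs; }
        void setTop(int index) { this.top = index; }
        public float boost() { return subs.get(top).boost(); }
        public boolean hasBoost() { return subs.stream().anyMatch(BoostNode::hasBoost); }
    }

    /** Conjunction: product of all sub-node boosts (the rule the comment says is hard-coded). */
    static final class ConjunctionNode implements BoostNode {
        private final List<BoostNode> subs;
        ConjunctionNode(List<BoostNode> subs) { this.subs = subs; }
        public float boost() {
            float product = 1f;
            for (BoostNode sub : subs) {
                product *= sub.boost();
            }
            return product;
        }
        public boolean hasBoost() { return subs.stream().anyMatch(BoostNode::hasBoost); }
    }

    public static void main(String[] args) {
        // Synonym list where the original term is weighted 2x over its synonym.
        DisjunctionNode synonyms = new DisjunctionNode(
            List.of(new ConstantBoostNode(2f), new TermNode()));
        ConjunctionNode query = new ConjunctionNode(List.of(synonyms, new TermNode()));

        synonyms.setTop(0);
        System.out.println(query.boost()); // 2.0 (matched on the original term)
        synonyms.setTop(1);
        System.out.println(query.boost()); // 1.0 (matched on the synonym)

        // A conjunction of plain terms never needs to cache boosts.
        System.out.println(new ConjunctionNode(List.of(new TermNode(), new TermNode())).hasBoost()); // false
    }
}
```

In this toy, a composite node's {{hasBoost()}} is simply "any child has a boost", which is one plausible way a conjunction could decide up-front whether caching is needed at all.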
> Allow boosting of particular interval sources
> ---------------------------------------------
>
> Key: LUCENE-8630
> URL: https://issues.apache.org/jira/browse/LUCENE-8630
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
> Attachments: LUCENE-8630.patch
>
>
> In positional queries, it's common to want to promote some terms over others;
> for example, in lists of synonyms you may want the original term to be
> weighted more, or more specific terms to receive higher weights than less
> specific ones.
> Span queries have the 'SpanBoostQuery', which is currently broken, and
> 'PayloadScoreQuery', which allows direct modification of the score based on
> stored payloads, but which handles a mix of terms with and without payloads
> poorly and ends up exposing a lot of the terms API, making it very difficult
> to customize.
> For interval queries, I'd like to try a different approach: adding a
> float-valued 'boost()' method to IntervalIterator. This would make it easy
> to add simple boosts around particular terms in term lists, and also allow
> more fine-grained control using payloads without having to expose the
> mechanics of the PostingsEnum.
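The payload plan in the comment above (a term iterator that takes a {{ToFloatFunction<BytesRef>}} and delegates {{boost()}} to it with the current payload) could look roughly like the sketch below. This is an assumption-laden illustration, not Lucene code: {{byte[]}} stands in for BytesRef, the hypothetical PayloadToFloat interface stands in for ToFloatFunction<BytesRef>, and PayloadTermIterator, floatOrOne and encode are invented names. The "4-byte float, else 1" policy is just one example of the null handling the comment leaves to users.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of a payload-driven boost. Not Lucene code:
// byte[] replaces BytesRef, and PayloadToFloat replaces ToFloatFunction<BytesRef>.
final class PayloadBoostSketch {

    /** Hypothetical stand-in for ToFloatFunction<BytesRef>. */
    @FunctionalInterface
    interface PayloadToFloat {
        float apply(byte[] payload);
    }

    /** Hypothetical term iterator whose boost() delegates to the user's payload function. */
    static final class PayloadTermIterator {
        private final byte[][] payloads; // payload per position; null means "no payload"
        private final PayloadToFloat toBoost;
        private int pos = -1;

        PayloadTermIterator(byte[][] payloads, PayloadToFloat toBoost) {
            this.payloads = payloads;
            this.toBoost = toBoost;
        }

        /** Advances to the next position; returns false when exhausted. */
        boolean nextInterval() {
            return ++pos < payloads.length;
        }

        /** Only valid after nextInterval() has returned true. */
        float boost() {
            return toBoost.apply(payloads[pos]);
        }
    }

    /** One user-chosen policy: decode a 4-byte big-endian float, or use 1 if absent or malformed. */
    static float floatOrOne(byte[] payload) {
        if (payload == null || payload.length != Float.BYTES) {
            return 1f;
        }
        return ByteBuffer.wrap(payload).getFloat();
    }

    /** Encodes a float in the layout floatOrOne expects. */
    static byte[] encode(float value) {
        return ByteBuffer.allocate(Float.BYTES).putFloat(value).array();
    }

    public static void main(String[] args) {
        // Mixed positions: two carry a payload, one does not.
        PayloadTermIterator it = new PayloadTermIterator(
            new byte[][] { encode(3f), null, encode(0.5f) },
            PayloadBoostSketch::floatOrOne);
        while (it.nextInterval()) {
            System.out.println(it.boost()); // 3.0, then 1.0, then 0.5
        }
    }
}
```

Because the default for a missing payload lives in the user's function, a mix of terms with and without payloads needs no special casing in the iterator itself, and no PostingsEnum details leak into user code.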