[jira] [Commented] (LUCENE-8630) Allow boosting of particular interval sources

Jim Ferenczi (JIRA) Mon, 07 Jan 2019 16:44:16 -0800


    [ 
https://issues.apache.org/jira/browse/LUCENE-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736523#comment-16736523
 ]


Jim Ferenczi commented on LUCENE-8630:
--------------------------------------

Setting a boost value on a leaf seems difficult since it will also depend on 
the length of each top-level interval. Moreover we don't have any evidence that 
the current scoring for intervals makes sense so I am reluctant to add another 
factor in the formula. I also think that we need to make the scoring more 
intuitive for intervals in general. The way we mix field statistics and 
proximity in the current scoring is misleading IMO: it implies that it's a good 
idea to mix interval query scores with boolean query scores, even though the 
scores are not comparable (we sum the IDF in the intervals). 
Maybe we should compute a score that only takes the interval lengths 
(1 / (1 + len)) into account and not the field statistics? I don't think it's 
realistic to use an interval query to compute a score that mixes field 
statistics and proximity. We should try to decorrelate these signals and add a 
way to mix them correctly, like the feature query does. It should be natural, 
for instance, to use 
a simple boolean query to select a subset of documents in a first pass and then 
use a rescorer with an interval query to re-rank based on proximity.
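
To make the two ideas above concrete, here is a rough sketch in plain Java of 
a proximity-only score of the form 1 / (1 + len) and a second pass that 
re-ranks first-pass hits by that score alone. It does not use any Lucene 
classes: Hit, proximityScore and rescore are hypothetical names made up for 
illustration, and the sketch assumes that len is a non-negative integer length 
of the best matching interval in a document (how that length would actually be 
measured is not settled here).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: none of these types exist in Lucene.
final class ProximityRescore {

    /** Hypothetical first-pass hit selected by a boolean query. */
    static final class Hit {
        final int docId;
        /** Length of the best matching interval, or -1 if no interval matched. */
        final int intervalLength;

        Hit(int docId, int intervalLength) {
            this.docId = docId;
            this.intervalLength = intervalLength;
        }
    }

    /** Score that depends only on the interval length, not on field statistics. */
    static float proximityScore(int len) {
        if (len < 0) {
            return 0f; // assumption: documents without a matching interval score 0
        }
        return 1f / (1f + len);
    }

    /** Second pass: re-rank the first-pass hits by proximity score, best first. */
    static List<Hit> rescore(List<Hit> firstPass) {
        List<Hit> reranked = new ArrayList<>(firstPass);
        // List.sort is stable, so ties keep their first-pass order.
        reranked.sort(
            Comparator.comparingDouble((Hit h) -> proximityScore(h.intervalLength))
                .reversed());
        return reranked;
    }
}
```

With hits whose interval lengths are 3, 0 and 1, the proximity scores are 
0.25, 1.0 and 0.5, so the second pass puts the length-0 hit first regardless 
of how the boolean query scored it. Keeping the two signals in separate passes 
is the point of the sketch: the first pass decides which documents are 
candidates, the second only orders them by proximity.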


> Allow boosting of particular interval sources
> ---------------------------------------------
>
>                 Key: LUCENE-8630
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8630
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>         Attachments: LUCENE-8630.patch
>
>
> In positional queries, it's common to want to promote some terms over others; 
> for example, in lists of synonyms you may want the original term to be 
> weighted more, or more specific terms to receive higher weights than less 
> specific ones.
> Span queries have the 'SpanBoostQuery', which is currently broken; and a 
> 'PayloadScoreQuery' which allows direct modification of the score based on 
> stored payloads, but which does not deal well with a mix of terms 
> with-and-without payloads, and which ends up exposing a lot of the terms API, 
> making it very difficult to customize.
> For interval queries, I'd like to try a different approach, adding a 
> float-valued 'boost()' method to IntervalIterator.  This would make it easy 
> to add simple boosts around particular terms in terms lists, and also allow 
> more fine-grained control using payloads without having to expose the 
> mechanics of the PostingsEnum.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

