[ https://issues.apache.org/jira/browse/LUCENE-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066199#comment-13066199 ]
Andrzej Bialecki commented on LUCENE-3320:
-------------------------------------------
One concept worth considering under this topic is sentence-level proximity
scoring. It rests on the assumption that terms occurring within a single
sentence are often associated more strongly than average. When sentence
boundaries are known, term positions can therefore be reduced to sentence
numbers (i.e. all postings from the same sentence share one position, which is
the sentence number).
This is a middle ground between no proximity data (omitPositions) and full
proximity data. Some of the literature indicates that this approach is
promising: http://www.springerlink.com/content/t5355418276v7115 . It is also
mentioned in the papers on static index pruning.
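
To make the idea concrete, here is a minimal sketch in plain Java. It is not
Lucene API and not a proposed patch: the class SentencePositions and its
methods index and coOccurInSentence are hypothetical names, and the sentence
splitter is a deliberately naive assumption (a sentence ends at '.', '!' or
'?' followed by whitespace). Every token receives the number of its sentence
as its position, so two terms are "close" exactly when they share a position.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration only: sentence numbers used as term positions. */
final class SentencePositions {

    /**
     * Builds term -> sorted set of sentence numbers (0-based).
     * Assumption: sentences end at '.', '!' or '?' followed by whitespace.
     */
    static Map<String, Set<Integer>> index(String text) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        String[] sentences = text.trim().split("(?<=[.!?])\\s+");
        for (int sentenceNo = 0; sentenceNo < sentences.length; sentenceNo++) {
            String lower = sentences[sentenceNo].toLowerCase(Locale.ROOT);
            for (String token : lower.split("\\W+")) {
                if (token.isEmpty()) {
                    continue; // split() can yield a leading empty string
                }
                // All tokens of one sentence share the same position.
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(sentenceNo);
            }
        }
        return postings;
    }

    /** True if both terms occur in at least one common sentence. */
    static boolean coOccurInSentence(Map<String, Set<Integer>> postings,
                                     String a, String b) {
        Set<Integer> pa = postings.getOrDefault(a, Collections.emptySet());
        Set<Integer> pb = postings.getOrDefault(b, Collections.emptySet());
        for (Integer position : pa) {
            if (pb.contains(position)) {
                return true;
            }
        }
        return false;
    }
}
```

For the text "Lucene stores term positions. Scoring can use proximity.
Positions cost space.", the term "positions" ends up with the sentence numbers
0 and 2, and coOccurInSentence reports a match for "lucene" and "positions"
but not for "lucene" and "proximity". A term that occurs several times in one
sentence contributes only one position here, which is where the space saving
over full positions would come from in this sketch.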
> Explore Proximity Scoring
> --------------------------
>
> Key: LUCENE-3320
> URL: https://issues.apache.org/jira/browse/LUCENE-3320
> Project: Lucene - Java
> Issue Type: Sub-task
> Components: core/search
> Affects Versions: Positions Branch
> Reporter: Simon Willnauer
> Fix For: Positions Branch
>
>
> Positions will be first-class citizens sooner rather than later. We should
> explore proximity scoring possibilities as well as collection / scoring
> algorithms such as those proposed in LUCENE-2878 (2-phase collection).
> This paper might provide a basis for an actual scoring implementation:
> http://plg.uwaterloo.ca/~claclark/sigir2006_term_proximity.pdf