[ https://issues.apache.org/jira/browse/SOLR-2304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Smiley updated SOLR-2304:
-------------------------------
Fix Version/s: 4.8 (was: 4.7)
> MoreLikeThis: Apply field level boosts before query terms are selected
> ----------------------------------------------------------------------
>
> Key: SOLR-2304
> URL: https://issues.apache.org/jira/browse/SOLR-2304
> Project: Solr
> Issue Type: Improvement
> Components: MoreLikeThis
> Affects Versions: 1.4.2
> Reporter: Mike Mattozzi
> Priority: Minor
> Fix For: 4.8
>
> Attachments: SOLR-2304.patch
>
>
> MoreLikeThis lets you set field-level boosts to weight the importance of
> fields when selecting similar documents. Currently, in trunk, these boosts
> are applied only after the query terms have been selected from
> MoreLikeThis's priority queue of interesting terms. This can give unexpected
> results when combined with mlt.maxqt, which limits the number of query terms.
> For example, suppose you use fields fieldA and fieldB boosted as
> "fieldA^0.5 fieldB^2.0" with mlt.maxqt=20. If the terms in fieldA have
> relatively higher tf-idf scores than those in fieldB, only fieldA terms (20 of
> them) will be selected as the basis for the MoreLikeThis query, even if,
> after boosting, some fieldB terms would have a higher overall score.
> I encountered this while using descriptive document text and document tags
> (comedy, action, etc.) as the basis for MoreLikeThis. I wanted to boost the
> tags higher; however, the less common document-text terms were always
> selected as the query terms, while the more common tag terms were eliminated
> by the maxqt cutoff before their scores were boosted.
> I believe the code was originally written this way so that the bulk of the
> work could be done in MoreLikeThisHandler without modifying the
> MoreLikeThis class in the Lucene project. Now that the projects are merged, I
> think this change makes sense. I will attach a simple patch against
> trunk.
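To make the ordering problem concrete, here is a minimal Python sketch of the two selection orders described in the issue. It is not Lucene's or Solr's MoreLikeThis code: the `(field, term, score)` tuples, the precomputed tf-idf scores, the term strings "alpha", "beta", "gamma", and the helper names `select_then_boost` / `boost_then_select` are all hypothetical, and `heapq.nlargest` stands in for MoreLikeThis's priority queue of interesting terms. The boosts mirror the "fieldA^0.5 fieldB^2.0" example, with maxqt shrunk to 2 for readability.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical stand-in for an "interesting term": (field, term, raw tf-idf score).
Term = Tuple[str, str, float]


def select_then_boost(terms: List[Term], boosts: Dict[str, float], maxqt: int) -> List[Term]:
    """Order described as current behaviour: keep the top maxqt terms by RAW
    score, and only afterwards multiply each survivor by its field boost."""
    top = heapq.nlargest(maxqt, terms, key=lambda t: t[2])
    return [(field, term, score * boosts.get(field, 1.0)) for field, term, score in top]


def boost_then_select(terms: List[Term], boosts: Dict[str, float], maxqt: int) -> List[Term]:
    """Order proposed in the issue: apply each field boost to every candidate
    first, then keep the top maxqt terms by BOOSTED score."""
    boosted = [(field, term, score * boosts.get(field, 1.0)) for field, term, score in terms]
    return heapq.nlargest(maxqt, boosted, key=lambda t: t[2])


if __name__ == "__main__":
    # Made-up scores: fieldA terms are rarer (higher raw tf-idf) than the fieldB tags.
    terms = [
        ("fieldA", "alpha", 3.0),
        ("fieldA", "beta", 2.5),
        ("fieldA", "gamma", 2.0),
        ("fieldB", "comedy", 1.5),
        ("fieldB", "action", 1.2),
    ]
    boosts = {"fieldA": 0.5, "fieldB": 2.0}
    print([t[1] for t in select_then_boost(terms, boosts, 2)])  # ['alpha', 'beta']
    print([t[1] for t in boost_then_select(terms, boosts, 2)])  # ['comedy', 'action']
```

With select-then-boost, the maxqt cutoff sees only raw scores, so the fieldB tags never survive even though their boosted scores (3.0 and 2.4) exceed every boosted fieldA score; applying the boost before the cutoff lets the boosted terms compete for the limited query slots.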
--
This message was sent by Atlassian JIRA
(v6.2#6252)