[
https://issues.apache.org/jira/browse/SOLR-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Smiley closed SOLR-7326.
------------------------------
Resolution: Duplicate
> Reduce hl.maxAnalyzedChars budget for multi-valued fields in the default
> highlighter
> ------------------------------------------------------------------------------------
>
> Key: SOLR-7326
> URL: https://issues.apache.org/jira/browse/SOLR-7326
> Project: Solr
> Issue Type: Improvement
> Components: highlighter
> Reporter: David Smiley
> Assignee: David Smiley
>
> in DefaultSolrHighlighter, the hl.maxAnalyzedChars figure is used to
> constrain how much text is analyzed before the highlighter stops, in the
> interests of performance. For a multi-valued field, it effectively treats
> each value anew, no matter how much text it was previously analyzed for other
> values for the same field for the current document. The PostingsHighlighter
> doesn't work this way -- hl.maxAnalyzedChars is effectively the total budget
> for a field for a document, no matter how many values there might be. It's
> not reset for each value. I think this makes more sense. When we loop over
> the values, we should subtract from hl.maxAnalyzedChars the length of the
> value just checked. The motivation here is consistency with
> PostingsHighlighter, and to allow for hl.maxAnalyzedChars to be pushed down
> to term vector uninversion, which wouldn't be possible for multi-valued
> fields based on the current way this parameter is used.
> Interestingly, I noticed Solr's use of FastVectorHighlighter doesn't honor
> hl.maxAnalyzedChars as the FVH doesn't have a knob for that. It does have
> hl.phraseLimit which is a limit that could be used for a similar purpose,
> albeit applied differently.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]