[
https://issues.apache.org/jira/browse/SOLR-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195001#comment-14195001
]
David Smiley commented on SOLR-4656:
------------------------------------
bq. It's a little different sense than maxAnalyzedChars in that the unit of
measurement is the number of MV entries rather than the number of characters
analyzed, but I could argue either way.
Sure... but was there per-value overhead that was a bit heavy for the
particular client you did this for (i.e. a massive number of values), or was it
just a matter of not accumulating value lengths?
bq. Although it seems kind of late to take away this parameter, should we
deprecate it instead?
If there are a large number of values, I guess it has some value.
In my last comment on SOLR-6680 I said that I think multi-value handling should
be done a bit differently: the values should be virtually concatenated/iterated
via a CharSequence wrapper and handed to the highlighter. Likewise, the
TokenStreams of each value could be wrapped in a concatenating wrapper. If that
were done, I think these parameters would be completely obsolete, since that
approach would handle the case of a massive number of values.
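As a rough illustration of the virtual concatenation idea (a sketch only, not
code from any patch): the class name ConcatenatedValues and the single
separator character are assumptions made for this example. A real version
would have to agree with the field's position/offset gaps, and whether the
highlighter can consume a CharSequence directly is not settled here.

```java
import java.util.List;

/** Hypothetical read-only view that presents several field values as one CharSequence. */
final class ConcatenatedValues implements CharSequence {
    private final List<String> values;
    private final int[] starts;   // starts[i] = offset of values.get(i) in the virtual text
    private final int length;
    private final char separator; // assumed stand-in for the gap between values

    ConcatenatedValues(List<String> values, char separator) {
        this.values = values;
        this.separator = separator;
        this.starts = new int[values.size()];
        int offset = 0;
        for (int i = 0; i < values.size(); i++) {
            starts[i] = offset;
            offset += values.get(i).length() + 1; // +1 for the separator slot
        }
        this.length = values.isEmpty() ? 0 : offset - 1; // no trailing separator
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        // Binary search for the last value whose start offset is <= index.
        int lo = 0;
        int hi = starts.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (starts[mid] <= index) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        int local = index - starts[lo];
        String value = values.get(lo);
        // One position past the end of a value is its separator slot.
        return local < value.length() ? value.charAt(local) : separator;
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException("range " + start + ".." + end);
        }
        StringBuilder sb = new StringBuilder(end - start);
        for (int i = start; i < end; i++) {
            sb.append(charAt(i));
        }
        return sb;
    }

    @Override
    public String toString() {
        return subSequence(0, length).toString();
    }
}
```

Nothing is copied until subSequence or toString is called, so a consumer that
stops reading early never touches the later values.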
I'll create a separate issue to accumulate maxAnalyzedChars per value and exit
early.
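
One possible reading of "accumulate per value and exit early", combined with a
cap on the number of values, is sketched below. This is not the Solr
highlighter code: ValueBudget and selectValues are hypothetical names, and
truncating the value that crosses the character budget is an assumption of
this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: decides which values (and how much of each) get examined. */
final class ValueBudget {
    static List<String> selectValues(Iterable<String> values,
                                     int maxValuesToExamine,
                                     int maxAnalyzedChars) {
        List<String> selected = new ArrayList<>();
        int charsSoFar = 0; // running total across values, not reset per value
        for (String value : values) {
            // Exit early once either limit is reached; later values are never touched.
            if (selected.size() >= maxValuesToExamine || charsSoFar >= maxAnalyzedChars) {
                break;
            }
            int remaining = maxAnalyzedChars - charsSoFar;
            // Assumption: the value that crosses the budget is truncated, not dropped.
            String examined = value.length() > remaining ? value.substring(0, remaining) : value;
            selected.add(examined);
            charsSoFar += examined.length();
        }
        return selected;
    }
}
```

Because the loop consumes an Iterable lazily, the limits also bound how many
values are ever fetched, which matters when the field has a massive number of
values.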
> Add hl.maxMultiValuedToExamine to limit the number of multiValued entries
> examined while highlighting
> -----------------------------------------------------------------------------------------------------
>
> Key: SOLR-4656
> URL: https://issues.apache.org/jira/browse/SOLR-4656
> Project: Solr
> Issue Type: Improvement
> Components: highlighter
> Affects Versions: 4.3, Trunk
> Reporter: Erick Erickson
> Assignee: Erick Erickson
> Priority: Minor
> Fix For: 4.3, Trunk
>
> Attachments: SOLR-4656-4x.patch, SOLR-4656-4x.patch,
> SOLR-4656-trunk.patch, SOLR-4656.patch
>
>
> I'm looking at an admittedly pathological case of many, many entries in a
> multiValued field, and trying to implement a way to limit the number
> examined, analogous to maxAnalyzedChars, see the patch.
> Along the way, I noticed that we do what looks like unnecessary copying of
> the fields to be examined. We call Document.getFields, which copies all of
> the fields and values to the returned array. Then we copy all of those to
> another array, converting them to Strings. Then we actually examine them. a>
> this doesn't seem very efficient and b> reduces the benefit from limiting the
> number of mv values examined.
> So the attached does two things:
> 1> attempts to fix this
> 2> implements hl.maxMultiValuedToExamine
> I'd _really_ love it if someone who knows the highlighting code takes a peek
> at the fix to see if I've messed things up, the changes are actually pretty
> minimal.