[ 
https://issues.apache.org/jira/browse/SOLR-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195001#comment-14195001
 ] 

David Smiley commented on SOLR-4656:
------------------------------------

bq. It's a little different sense than maxAnalyzedChars in that the unit of 
measurement is the number of MV entries rather than the number of characters 
analyzed, but I could argue either way.

Sure... but was the per-value overhead itself too heavy for the particular 
client you did this for (i.e. a massive number of values), or was it just a 
matter of value lengths not being accumulated?

bq. Although it seems kind of late to take away this parameter, should we 
deprecate it instead?

If there are a large number of values, I guess it has some value.

In my last comment on SOLR-6680 I said I think multi-value handling should be 
done a bit differently: each value should be virtually concatenated/iterated 
via a CharSequence wrapper and handed to the highlighter. Likewise, the 
TokenStreams of each value could be wrapped in a concatenating wrapper. If 
that were done, I think these parameters would be completely obsolete, since 
the wrapper approach would handle the case of a massive number of values.
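
To make the CharSequence wrapper idea concrete, here is a minimal sketch. It is 
not code from Solr or from any attached patch: the class name 
ConcatenatedCharSequence and the single-space separator are assumptions made 
purely for illustration, and a real version would presumably need to account 
for the analyzer's offset gap between values.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch (not Solr code): exposes several field values as one virtual text. */
final class ConcatenatedCharSequence implements CharSequence {
    private static final char SEPARATOR = ' '; // assumed one-char gap between values

    private final List<? extends CharSequence> values;
    private final int[] starts; // starts[i] = virtual offset where values.get(i) begins
    private final int length;

    ConcatenatedCharSequence(List<? extends CharSequence> values) {
        this.values = values;
        this.starts = new int[values.size()];
        int offset = 0;
        for (int i = 0; i < starts.length; i++) {
            starts[i] = offset;
            offset += values.get(i).length() + 1; // the value plus one separator
        }
        this.length = Math.max(0, offset - 1); // no separator after the last value
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        int i = Arrays.binarySearch(starts, index);
        if (i < 0) {
            i = -i - 2; // last value whose start offset is <= index
        }
        CharSequence value = values.get(i);
        int local = index - starts[i];
        // Past the end of the value means we are on the separator slot.
        return local < value.length() ? value.charAt(local) : SEPARATOR;
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        // Copies only the requested range, never the whole virtual text.
        return new StringBuilder().append(this, start, end).toString();
    }

    @Override
    public String toString() {
        return new StringBuilder(length).append(this, 0, length).toString();
    }
}
```

The point of the sketch is that no value is copied up front; characters are 
looked up in the underlying values on demand, so a consumer that stops reading 
early never touches the remaining values.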

I'll create a separate issue to accumulate maxAnalyzedChars per value and exit 
early.
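
Roughly what I have in mind, as a hedged sketch only: the class and method 
names below (AnalysisBudget, valuesWithinBudget) are hypothetical and do not 
exist in Solr, and the sketch works on plain Strings rather than on the 
highlighter's actual field objects. It charges each value's length against a 
single maxAnalyzedChars budget and stops iterating once the budget is spent.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not Solr code): one character budget shared across all values. */
final class AnalysisBudget {
    private AnalysisBudget() {}

    /**
     * Returns the values that fit within maxAnalyzedChars, truncating the value
     * that crosses the limit and skipping everything after it.
     */
    static List<String> valuesWithinBudget(Iterable<String> values, int maxAnalyzedChars) {
        List<String> selected = new ArrayList<>();
        int remaining = maxAnalyzedChars;
        for (String value : values) {
            if (remaining <= 0) {
                break; // budget spent: exit early, later values are never examined
            }
            String slice = value.length() <= remaining ? value : value.substring(0, remaining);
            selected.add(slice);
            remaining -= slice.length(); // accumulate across values, not per value
        }
        return selected;
    }
}
```

With values "abcd", "efgh", "ijkl" and a budget of 6, this yields "abcd" and 
"ef", and the third value is never looked at.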

> Add hl.maxMultiValuedToExamine to limit the number of multiValued entries 
> examined while highlighting
> -----------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-4656
>                 URL: https://issues.apache.org/jira/browse/SOLR-4656
>             Project: Solr
>          Issue Type: Improvement
>          Components: highlighter
>    Affects Versions: 4.3, Trunk
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>            Priority: Minor
>             Fix For: 4.3, Trunk
>
>         Attachments: SOLR-4656-4x.patch, SOLR-4656-4x.patch, 
> SOLR-4656-trunk.patch, SOLR-4656.patch
>
>
> I'm looking at an admittedly pathological case of many, many entries in a 
> multiValued field, and trying to implement a way to limit the number 
> examined, analogous to maxAnalyzedChars, see the patch.
> Along the way, I noticed that we do what looks like unnecessary copying of 
> the fields to be examined. We call Document.getFields, which copies all of 
> the fields and values to the returned array. Then we copy all of those to 
> another array, converting them to Strings. Then we actually examine them. a> 
> this doesn't seem very efficient and b> reduces the benefit from limiting the 
> number of mv values examined.
> So the attached does two things:
> 1> attempts to fix this
> 2> implements hl.maxMultiValuedToExamine
> I'd _really_ love it if someone who knows the highlighting code takes a peek 
> at the fix to see if I've messed things up, the changes are actually pretty 
> minimal.



