[ 
https://issues.apache.org/jira/browse/SOLR-11240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Toke Eskildsen updated SOLR-11240:
----------------------------------
    Attachment: SOLR-11240.patch

Patch for master. Running {{ant test}} reported failing unit-tests for Cdcr & 
Cloud, but those areas are pretty far from the patch and the tests also fail 
when running without the patch.

Note the addition of the slow {{testTriggerUnInvertLimit}} in 
{{TestDocTermOrds}}. It takes about 10-15 seconds to run on a modern machine 
with SSD. I find that to be problematic, but I don't know of any way to 
quickly build an index with high enough term-cardinality to reach the old limit.

Barring errors, the fix should be complete and potential back-porting to 7x (or 
6) seems trivial. I invite anyone to review the patch.

> Raise UnInvertedField internal limit
> ------------------------------------
>
>                 Key: SOLR-11240
>                 URL: https://issues.apache.org/jira/browse/SOLR-11240
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: faceting
>    Affects Versions: 5.5.4, 6.6
>            Reporter: Toke Eskildsen
>            Assignee: Toke Eskildsen
>            Priority: Minor
>              Labels: easyfix
>             Fix For: master (8.0)
>
>         Attachments: SOLR-11240.patch
>
>
> {{UnInvertedField}} has via {{DocTermOrds}} an internal limitation of 2^24 
> bytes for byte-arrays holding term ordinals. For String faceting on 
> high-cardinality Text fields, this can trigger the exception with "Too many 
> values for UnInvertedField". A search for that phrase shows that the 
> exception is encountered in the wild.
> The limitation is due to the packing being a combination of values and 
> pointers: If the values (term ordinals) for a given document-ID can fit in an 
> integer, they are stored directly. If the value of the first 8 bits in the 
> integer is 1, it signals that the following 3 bytes (24 bits) are a pointer 
> into a byte-array, limiting the array-size to 16M (2^24).
> Solution: Due to the values being packed as vInts, bit 31 (the last bit) of 
> the integer will never be 1 if the integer contains values. This means that 
> this bit can be used for signalling whether or not the preceding bits 
> should be parsed as values or a pointer. The effective pointer size is thus 
> 2^31, which matches the array-length limit in Java. Changing the signalling 
> mechanism does not affect space requirements and should not affect 
> performance.
> Note that this is only a 128-fold increase over the 2^24 limit, not an 
> elimination: Performing uninverted Text field faceting on 100M documents with 
> 5K terms each will still raise an exception.
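
The signalling change described in the quoted issue can be illustrated with a small sketch. This is not the code from the attached patch: the class and method names ({{PointerFlagSketch}}, {{vInt}}, {{packInline}}, {{encodePointer}} and so on) are made up for illustration, and the byte order used for inline packing is an assumption. It only shows why bit 31 is free to act as a pointer flag: every vInt byte except the last one in a value carries a continuation bit, so the highest byte of a fully used integer always has its top bit clear.

```java
class PointerFlagSketch {
    // Bit 31 of the packed int: set means "pointer", clear means "inline values".
    static final int POINTER_FLAG = 0x80000000;

    /** Encodes a non-negative value as a vInt: 7 data bits per byte,
     *  most significant group first, high bit set on all but the last byte. */
    static byte[] vInt(int value) {
        if (value < 0) throw new IllegalArgumentException("value must be non-negative");
        int n = 1;
        for (int v = value >>> 7; v != 0; v >>>= 7) n++;
        byte[] out = new byte[n];
        for (int i = n - 1; i >= 0; i--) {
            out[i] = (byte) ((value & 0x7F) | (i == n - 1 ? 0 : 0x80));
            value >>>= 7;
        }
        return out;
    }

    /** Packs up to 4 vInt bytes into one int, first byte in the lowest 8 bits
     *  (assumed layout). The highest used byte is a vInt terminator, so bit 31 stays 0. */
    static int packInline(byte[] vIntBytes) {
        if (vIntBytes.length > 4) throw new IllegalArgumentException("does not fit in an int");
        int packed = 0;
        for (int i = 0; i < vIntBytes.length; i++) {
            packed |= (vIntBytes[i] & 0xFF) << (i * 8);
        }
        return packed;
    }

    /** Marks a non-negative byte-array offset as a pointer by setting bit 31. */
    static int encodePointer(int offset) {
        if (offset < 0) throw new IllegalArgumentException("offset must be non-negative");
        return offset | POINTER_FLAG;
    }

    static boolean isPointer(int packed) {
        return (packed & POINTER_FLAG) != 0;
    }

    /** Recovers the 31-bit offset from a packed pointer. */
    static int pointerOffset(int packed) {
        return packed & ~POINTER_FLAG;
    }

    public static void main(String[] args) {
        int inline = packInline(vInt(1 << 21));   // a 4-byte vInt, all bytes used
        int pointer = encodePointer(1 << 30);     // far beyond the old 2^24 limit
        System.out.println("inline is pointer:  " + isPointer(inline));
        System.out.println("pointer is pointer: " + isPointer(pointer));
        System.out.println("pointer offset:     " + pointerOffset(pointer));
    }
}
```

With 31 bits available for the offset, the pointer can address any valid index of a Java byte array, which is where the 2^31 figure in the description comes from.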



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)