[ https://issues.apache.org/jira/browse/SOLR-11240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129240#comment-16129240 ]
Dawid Weiss edited comment on SOLR-11240 at 8/16/17 6:43 PM:
-------------------------------------------------------------
Just looking around casually, not verifying in-depth.
{code}
+ * A single entry is thus either 0b0xxxxxxxx_xxxxxxxx_xxxxxxxx_xxxxxxxx holding 0-4 vInts or
+ * 0b0xxxxxxxx_xxxxxxxx_xxxxxxxx_xxxxxxxx holding a 31-bit pointer.
{code}
Somewhere in the above bitmasks the highest bit should be set :)
{code}
+ // TODO: Why is indexedTermsArray not part of this?
/** Returns total bytes used. */
public long ramBytesUsed() {
{code}
I'd piggyback that in and correct it in this issue.
{code}
+ @Slow
+ public void testTriggerUnInvertLimit() throws IOException {
{code}
Make it Nightly instead of Slow if it's such a resource-hog?
> Raise UnInvertedField internal limit
> ------------------------------------
>
> Key: SOLR-11240
> URL: https://issues.apache.org/jira/browse/SOLR-11240
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Components: faceting
> Affects Versions: 5.5.4, 6.6
> Reporter: Toke Eskildsen
> Assignee: Toke Eskildsen
> Priority: Minor
> Labels: easyfix
> Fix For: master (8.0)
>
> Attachments: SOLR-11240.patch
>
>
> {{UnInvertedField}} has, via {{DocTermOrds}}, an internal limit of 2^24
> bytes for the byte-arrays holding term ordinals. For String faceting on
> high-cardinality Text fields, this can trigger an exception with the message
> "Too many values for UnInvertedField". A search for that phrase shows that
> the exception is encountered in the wild.
> The limitation is due to the packing being a combination of values and
> pointers: If the values (term ordinals) for a given document-ID fit in an
> integer, they are stored directly. If the value of the first 8 bits in the
> integer is 1, it signals that the following 3 bytes (24 bits) are a pointer
> into a byte-array, limiting the array-size to 16M (2^24).
> Solution: Because the values are packed as vInts, bit 31 (the highest bit)
> of the integer will never be 1 if the integer contains values. This means
> that this bit can be used to signal whether the remaining 31 bits should be
> parsed as values or as a pointer. The effective pointer range is thus 2^31,
> which matches the array-length limit in Java. Changing the signalling
> mechanism does not affect space requirements and should not affect
> performance.
> Note that this is only a roughly 100-fold (2^7 = 128x) increase over the
> 2^24 limit, not an elimination: Performing uninverted Text field faceting on
> 100M documents with 5K terms each will still raise an exception.
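
The signalling scheme described above can be sketched roughly as follows. This is only an illustration, not the code from SOLR-11240.patch: the class name {{PointerTagSketch}}, its methods and the {{POINTER_FLAG}} constant are hypothetical, and the inline case is simplified to single-byte vInts (values 0-127), whereas the real structure also holds multi-byte vInts.

```java
// Hypothetical sketch of the bit-31 signalling idea; not the actual DocTermOrds code.
final class PointerTagSketch {
    // Assumption: bit 31 set means "the low 31 bits are a pointer into the byte-array".
    public static final int POINTER_FLAG = 0x80000000;

    /** Packs up to four values in 0..127 as single-byte vInts, first value in the low byte. */
    public static int packSmallValues(int... values) {
        if (values.length > 4) {
            throw new IllegalArgumentException("at most 4 single-byte vInts fit in an int");
        }
        int entry = 0;
        for (int i = 0; i < values.length; i++) {
            if (values[i] < 0 || values[i] > 0x7F) {
                throw new IllegalArgumentException("sketch only handles single-byte vInts");
            }
            // The high bit of every byte stays clear, so bit 31 of the entry stays 0.
            entry |= values[i] << (8 * i);
        }
        return entry;
    }

    /** Stores a non-negative 31-bit offset and marks the entry as a pointer. */
    public static int packPointer(int offset) {
        if (offset < 0) {
            throw new IllegalArgumentException("offset must fit in 31 bits");
        }
        return offset | POINTER_FLAG;
    }

    /** An int with bit 31 set is negative in two's complement. */
    public static boolean isPointer(int entry) {
        return entry < 0;
    }

    /** Strips the flag bit to recover the offset. */
    public static int pointerOf(int entry) {
        return entry & 0x7FFFFFFF;
    }

    public static void main(String[] args) {
        int inline = packSmallValues(3, 17, 42);
        int pointer = packPointer(20_000_000); // beyond the old 2^24 = 16,777,216 limit

        System.out.println(isPointer(inline));        // false
        System.out.println(isPointer(pointer));       // true
        System.out.println(pointerOf(pointer));       // 20000000
        System.out.println((1L << 31) / (1L << 24));  // 128
    }
}
```

The last line prints the ratio between the new and the old addressable range, which is where the "roughly 100-fold" figure comes from.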