[
https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13657491#comment-13657491
]
David Smiley commented on LUCENE-4583:
--------------------------------------
Aha; thanks for the clarification. I see it now. And I see that after I
commented out the limit check, the assertion was hit. I didn't hit this
assertion with Barakat's patch when I last ran it; weird, but whatever.
BTW, ByteBlockPool doesn't really have this limit, notwithstanding the bug that
Barakat fixed in his patch. It's not a hard limit because BBP.append() and
readBytes() conveniently loop for you, whereas code that uses PagedBytes has to
loop on fillSlice() itself to support big values. It is a bona fide bug in
ByteBlockPool that it didn't implement that loop correctly, and it should be
fixed, if not in this issue then in another.
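To illustrate the kind of loop I mean, here's a minimal, self-contained sketch. It is NOT the actual Lucene PagedBytes API; it's a toy block-paged store whose fillSlice() serves at most one block per call, plus the chunked-read loop that lets a caller read a value spanning arbitrarily many blocks:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: a tiny stand-in for a paged byte store.
public class PagedBytesSketch {
  static final int BLOCK_SIZE = 8; // tiny block size to exercise the loop

  final List<byte[]> blocks = new ArrayList<>();
  int upto = BLOCK_SIZE; // write offset within the current block

  // Append data, allocating new fixed-size blocks as needed.
  void append(byte[] data) {
    for (byte b : data) {
      if (upto == BLOCK_SIZE) {
        blocks.add(new byte[BLOCK_SIZE]);
        upto = 0;
      }
      blocks.get(blocks.size() - 1)[upto++] = b;
    }
  }

  // Analogous to a single fillSlice() call: copies bytes starting at
  // `start`, but never crosses a block boundary; returns bytes copied.
  int fillSlice(byte[] dest, int destOff, long start, int len) {
    int blockIndex = (int) (start / BLOCK_SIZE);
    int blockOff = (int) (start % BLOCK_SIZE);
    int n = Math.min(len, BLOCK_SIZE - blockOff);
    System.arraycopy(blocks.get(blockIndex), blockOff, dest, destOff, n);
    return n;
  }

  // The proposed fix in miniature: loop fillSlice() until the whole
  // (possibly multi-block) value has been copied out.
  byte[] readBigValue(long start, int totalLen) {
    byte[] result = new byte[totalLen];
    int copied = 0;
    while (copied < totalLen) {
      copied += fillSlice(result, copied, start + copied, totalLen - copied);
    }
    return result;
  }

  public static void main(String[] args) {
    PagedBytesSketch pb = new PagedBytesSketch();
    byte[] big = new byte[100]; // far larger than one 8-byte block
    for (int i = 0; i < big.length; i++) big[i] = (byte) i;
    pb.append(big);
    byte[] back = pb.readBigValue(0, big.length);
    System.out.println(java.util.Arrays.equals(big, back)); // prints "true"
  }
}
{code}

The point is only that the per-call block limit need not surface as a per-value limit; the caller-side loop hides it, which is what BBP.append()/readBytes() already do internally.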
So a DocValues codec that supports large binary values could be nearly
identical to the current codec but call fillSlice() in a loop, and only for
variable-sized binary values (just like BBP's algorithm); that would basically
be the only change. Do you support such a change? If not, then why not (a
technical reason, please)? If you can't support such a change, would you also
object to the addition of a new codec that simply lifted this limit, as I
proposed? Note that it would potentially include a bunch of duplicated code
just to call fillSlice() in a loop; I propose it would be simpler and more
maintainable to not limit binary doc values to 32k.
> StraightBytesDocValuesField fails if bytes > 32k
> ------------------------------------------------
>
> Key: LUCENE-4583
> URL: https://issues.apache.org/jira/browse/LUCENE-4583
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/index
> Affects Versions: 4.0, 4.1, 5.0
> Reporter: David Smiley
> Priority: Critical
> Fix For: 4.4
>
> Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch,
> LUCENE-4583.patch, LUCENE-4583.patch
>
>
> I didn't observe any limitations on the size of a bytes based DocValues field
> value in the docs. It appears that the limit is 32k, although I didn't get
> any friendly error telling me that was the limit. 32k is kind of small IMO;
> I suspect this limit is unintended and as such is a bug. The following
> test fails:
> {code:java}
> public void testBigDocValue() throws IOException {
>   Directory dir = newDirectory();
>   IndexWriter writer = new IndexWriter(dir, writerConfig(false));
>   Document doc = new Document();
>   BytesRef bytes = new BytesRef((4+4)*4097); // 4096 works
>   bytes.length = bytes.bytes.length; // byte data doesn't matter
>   doc.add(new StraightBytesDocValuesField("dvField", bytes));
>   writer.addDocument(doc);
>   writer.commit();
>   writer.close();
>   DirectoryReader reader = DirectoryReader.open(dir);
>   DocValues docValues = MultiDocValues.getDocValues(reader, "dvField");
>   // FAILS IF BYTES IS BIG!
>   docValues.getSource().getBytes(0, bytes);
>   reader.close();
>   dir.close();
> }
> {code}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]