[https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655522#comment-13655522]
Robert Muir commented on LUCENE-4583:
-------------------------------------
{quote}
Are you also against just fixing the limit in the core code
(IndexWriter/BinaryDocValuesWriter) and leaving the limit enforced in
the existing DVFormats (my patch)?
I thought that was a good compromise ...
This way at least users can still build their own / use DVFormats that
don't have the limit.
{quote}
I'm worried about a few things:
* I think the limit is OK, because in my eyes it's the limit of a single term. I
feel that anyone arguing for increasing the limit has only abuse cases (not use
cases) in mind. I'm worried about making DV more complicated for no good
reason.
* I'm worried about opening up the possibility of bugs and index corruption
(e.g. clearly MULTIPLE people on this issue don't understand why you cannot just
remove IndexWriter's limit without causing corruption).
* I'm really worried about the precedent: once these abuse-case fans have their
way and increase this limit, they will next argue that we should do the same
for SORTED, maybe SORTED_SET, maybe even inverted terms. They will argue that
it's the same as BINARY, just with sorting, so why should sorting bring in
additional limits? I can easily see this all spinning out of control.
* I think that most people hitting the limit are abusing docvalues as stored
fields, so the limit is actually doing something useful today: it tells them
they are doing something wrong.
The only argument I have *for* removing the limit is that by expanding BINARY's
possible abuse cases (in my opinion, that's pretty much all it's useful for), we
might prevent additional complexity from being added elsewhere to DV in the
long term.
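The compromise proposed in the quoted question (no limit in the core writer, each DV format enforcing its own) can be sketched roughly as follows. This is a hypothetical illustration only: BinaryValuesFormat, CappedFormat, UncappedFormat and CoreBinaryWriter are invented names, not Lucene classes, and the 32 KB cap is taken from the issue title rather than from Lucene's source. It shows only where such a check would live; it says nothing about whether an existing on-disk format can actually store larger values, which is the corruption risk raised above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types are Lucene classes.
class CompromiseDemo {
  // Each format declares the largest single value (in bytes) it can encode.
  interface BinaryValuesFormat {
    int maxValueLength();
  }

  // An existing-style format that keeps a 32 KB cap (assumed figure).
  static class CappedFormat implements BinaryValuesFormat {
    public int maxValueLength() {
      return 32 * 1024;
    }
  }

  // A user-supplied format with no cap beyond Java's array size.
  static class UncappedFormat implements BinaryValuesFormat {
    public int maxValueLength() {
      return Integer.MAX_VALUE;
    }
  }

  // Core writer: enforces no limit of its own and defers to the field's format.
  static class CoreBinaryWriter {
    private final BinaryValuesFormat format;
    private final List<byte[]> pending = new ArrayList<>();

    CoreBinaryWriter(BinaryValuesFormat format) {
      this.format = format;
    }

    // Rejects an oversized value up front, with a message naming the limit.
    void addValue(String field, byte[] value) {
      int max = format.maxValueLength();
      if (value.length > max) {
        throw new IllegalArgumentException("field \"" + field + "\": value is "
            + value.length + " bytes, but this format accepts at most " + max);
      }
      pending.add(value);
    }
  }

  // Returns true if a value of the given length is accepted for the format.
  static boolean accepts(BinaryValuesFormat format, int length) {
    try {
      new CoreBinaryWriter(format).addValue("dvField", new byte[length]);
      return true;
    } catch (IllegalArgumentException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    int big = (4 + 4) * 4097; // 32776 bytes, the size used in the failing test
    System.out.println(accepts(new CappedFormat(), big));   // false
    System.out.println(accepts(new UncappedFormat(), big)); // true
  }
}
```

Running main prints false and then true: the 32776-byte value from the failing test is rejected by the capped format with a message that names the limit, and buffered by the uncapped one.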
> StraightBytesDocValuesField fails if bytes > 32k
> ------------------------------------------------
>
> Key: LUCENE-4583
> URL: https://issues.apache.org/jira/browse/LUCENE-4583
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/index
> Affects Versions: 4.0, 4.1, 5.0
> Reporter: David Smiley
> Priority: Critical
> Fix For: 4.4
>
> Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch
>
>
> I didn't observe any limitations on the size of a bytes based DocValues field
> value in the docs. It appears that the limit is 32k, although I didn't get
> any friendly error telling me that was the limit. 32k is kind of small IMO;
> I suspect this limit is unintended and as such is a bug. The following
> test fails:
> {code:java}
> public void testBigDocValue() throws IOException {
>   Directory dir = newDirectory();
>   IndexWriter writer = new IndexWriter(dir, writerConfig(false));
>   Document doc = new Document();
>   BytesRef bytes = new BytesRef((4+4)*4097); // 4096 works
>   bytes.length = bytes.bytes.length; // byte data doesn't matter
>   doc.add(new StraightBytesDocValuesField("dvField", bytes));
>   writer.addDocument(doc);
>   writer.commit();
>   writer.close();
>   DirectoryReader reader = DirectoryReader.open(dir);
>   DocValues docValues = MultiDocValues.getDocValues(reader, "dvField");
>   // FAILS IF BYTES IS BIG!
>   docValues.getSource().getBytes(0, bytes);
>   reader.close();
>   dir.close();
> }
> {code}
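The reporter located the boundary by trial: a value of (4+4)*4096 = 32768 bytes works, while (4+4)*4097 = 32776 bytes fails. Without a friendly error naming the limit, the exact cutoff can be found by bisecting over value sizes, as in this sketch. It assumes the accept/reject check is monotone (everything up to some size succeeds, everything above fails); the IntPredicate in main is a hypothetical stand-in for "index a value of n bytes and read it back", with a 32768-byte cap chosen only to match the observed behaviour, not Lucene's actual constant.

```java
import java.util.function.IntPredicate;

class FindValueLimit {
  // Returns the largest n in [lo, hi] with accepts.test(n) == true,
  // assuming accepts is monotone and accepts.test(lo) is true.
  static int largestAccepted(IntPredicate accepts, int lo, int hi) {
    while (lo < hi) {
      // Round the midpoint up so lo always advances and the loop terminates.
      int mid = lo + (hi - lo + 1) / 2;
      if (accepts.test(mid)) {
        lo = mid;      // mid fits, so the limit is at least mid
      } else {
        hi = mid - 1;  // mid is too big, so the limit is below it
      }
    }
    return lo;
  }

  public static void main(String[] args) {
    // Hypothetical stand-in for a real write-then-read round trip.
    IntPredicate simulatedWriter = n -> n <= 32 * 1024;
    System.out.println(largestAccepted(simulatedWriter, 0, 1 << 20)); // 32768
  }
}
```

Each probe halves the remaining range, so searching up to 1 MB takes about 20 round trips instead of many manual edits to the test.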