[jira] [Commented] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k

Robert Muir (JIRA) Sun, 12 May 2013 05:11:24 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655522#comment-13655522 ]


Robert Muir commented on LUCENE-4583:
-------------------------------------

{quote}
Are you also against just fixing the limit in the core code
(IndexWriter/BinaryDocValuesWriter) and leaving the limit enforced in
the existing DVFormats (my patch)?

I thought that was a good compromise ...

This way at least users can still build their own / use DVFormats that
don't have the limit.
{quote}

I'm worried about a few things:
* I think the limit is ok, because in my eyes it's the limit of a single term. I 
feel that anyone arguing for increasing the limit only has abuse cases (not use 
cases) in mind. I'm worried about making dv more complicated for no good 
reason. 
* I'm worried about opening up the possibility of bugs and index corruption 
(e.g. clearly MULTIPLE people on this issue don't understand why you cannot just 
remove IndexWriter's limit without causing corruption).
* I'm really worried about the precedent: once these abuse-case-fans have their 
way and increase this limit, they will next argue that we should do the same 
for SORTED, maybe SORTED_SET, maybe even inverted terms. They will make 
arguments that it's the same as binary, just with sorting, and why should 
sorting bring in additional limits. I can easily see this all spinning out of 
control.
* I think that most people hitting the limit are abusing docvalues as stored 
fields, so the limit is actually doing something really useful today: 
telling them they are doing something wrong.

The only argument I have *for* removing the limit is that by expanding BINARY's 
possible abuse cases (in my opinion, that's pretty much all it's useful for), we 
might prevent additional complexity from being added elsewhere to DV in the 
long-term.
                
> StraightBytesDocValuesField fails if bytes > 32k
> ------------------------------------------------
>
>                 Key: LUCENE-4583
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4583
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 4.0, 4.1, 5.0
>            Reporter: David Smiley
>            Priority: Critical
>             Fix For: 4.4
>
>         Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch
>
>
> I didn't observe any limitations on the size of a bytes-based DocValues field 
> value in the docs. It appears that the limit is 32k, although I didn't get 
> any friendly error telling me that was the limit. 32k is kind of small IMO; 
> I suspect this limit is unintended and as such is a bug. The following 
> test fails:
> {code:java}
>   public void testBigDocValue() throws IOException {
>     Directory dir = newDirectory();
>     IndexWriter writer = new IndexWriter(dir, writerConfig(false));
>     Document doc = new Document();
>     BytesRef bytes = new BytesRef((4+4)*4097);//4096 works
>     bytes.length = bytes.bytes.length;//byte data doesn't matter
>     doc.add(new StraightBytesDocValuesField("dvField", bytes));
>     writer.addDocument(doc);
>     writer.commit();
>     writer.close();
>     DirectoryReader reader = DirectoryReader.open(dir);
>     DocValues docValues = MultiDocValues.getDocValues(reader, "dvField");
>     //FAILS IF BYTES IS BIG!
>     docValues.getSource().getBytes(0, bytes);
>     reader.close();
>     dir.close();
>   }
> {code}
