[ 
https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13657345#comment-13657345
 ] 

David Smiley commented on LUCENE-4583:
--------------------------------------

I like the new test, Mike -- in particular it doesn't mandate a failure if the 
codec accepts > 32k.

I want to make sure the logic behind the decisions Mike & Rob are making on 
this thread is clear, specifically regarding the limits for binary doc values 
(not other things).  First, the Lucene42Consumer has no intrinsic technical 
limitation on these values, up to perhaps 2GB (I'm not sure of the exact 
figure, but it's "big").  Yet the decision is to artificially neuter it to 
32k.  I don't see anything in this thread that establishes a particular 
_intended use_ for binary DocValues; I see them as being as general purpose as 
stored values, just with different performance characteristics (they're 
clearly column-stride, for example).  The particular use I described earlier 
would totally suck if it had to use stored values.  As for the reason for this 
limit, I'm struggling to find the arguments in this thread, but it appears to 
be that, hypothetically, newer clever encodings might evolve in the future 
that simply can't handle more than 32k.  If that's it, then wouldn't such a 
new implementation simply have its own, different limit, leaving both as 
reasonable choices for the application?  If that isn't it, then what is the 
reason?
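
To make the alternative in the last question concrete, here is a minimal 
sketch of what "each implementation has its own limit" could look like.  It is 
not Lucene code: BinaryFormat, format(), accepts() and checkLength() are 
hypothetical names invented for illustration, and both limits (a "big" 
Integer.MAX_VALUE and a 32k cap) are assumptions rather than values taken from 
any codec.

```java
public class PerFormatLimits {

    /** Hypothetical stand-in for a binary doc values format; not a Lucene type. */
    interface BinaryFormat {
        String name();
        /** Largest value, in bytes, that this format accepts. */
        int maxValueLength();
    }

    /** Builds a format with the given name and limit (both illustrative). */
    static BinaryFormat format(final String name, final int maxValueLength) {
        return new BinaryFormat() {
            @Override public String name() { return name; }
            @Override public int maxValueLength() { return maxValueLength; }
        };
    }

    /** True if a value of the given length fits within the format's own limit. */
    static boolean accepts(BinaryFormat format, int length) {
        return length <= format.maxValueLength();
    }

    /** Rejects an oversized value with a message that names the limit. */
    static void checkLength(BinaryFormat format, String field, int length) {
        if (!accepts(format, length)) {
            throw new IllegalArgumentException("field \"" + field + "\": value of " + length
                + " bytes exceeds the " + format.maxValueLength()
                + "-byte limit of format " + format.name());
        }
    }

    public static void main(String[] args) {
        BinaryFormat general = format("general", Integer.MAX_VALUE); // assumed "big" limit
        BinaryFormat compact = format("compact", 32 * 1024);         // assumed 32k limit
        int length = (4 + 4) * 4097; // value size used by the test in the issue description

        checkLength(general, "dvField", length); // within the general format's limit
        try {
            checkLength(compact, "dvField", length);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // explicit error that names the limit
        }
    }
}
```

Under these assumptions the application picks the format whose limit suits its 
data, and a format that cannot handle large values says so up front.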
                
> StraightBytesDocValuesField fails if bytes > 32k
> ------------------------------------------------
>
>                 Key: LUCENE-4583
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4583
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 4.0, 4.1, 5.0
>            Reporter: David Smiley
>            Priority: Critical
>             Fix For: 4.4
>
>         Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, 
> LUCENE-4583.patch, LUCENE-4583.patch
>
>
> I didn't find any documented limit on the size of a bytes-based DocValues 
> field value.  The limit appears to be 32k, although I didn't get any 
> friendly error telling me so.  32k is kind of small IMO; I suspect this 
> limit is unintended and as such is a bug.  The following test fails:
> {code:java}
>   public void testBigDocValue() throws IOException {
>     Directory dir = newDirectory();
>     IndexWriter writer = new IndexWriter(dir, writerConfig(false));
>     Document doc = new Document();
>     BytesRef bytes = new BytesRef((4+4)*4097);//4096 works
>     bytes.length = bytes.bytes.length;//byte data doesn't matter
>     doc.add(new StraightBytesDocValuesField("dvField", bytes));
>     writer.addDocument(doc);
>     writer.commit();
>     writer.close();
>     DirectoryReader reader = DirectoryReader.open(dir);
>     DocValues docValues = MultiDocValues.getDocValues(reader, "dvField");
>     //FAILS IF BYTES IS BIG!
>     docValues.getSource().getBytes(0, bytes);
>     reader.close();
>     dir.close();
>   }
> {code}

