I opened
https://issues.apache.org/jira/browse/ARROW-1785
https://issues.apache.org/jira/browse/ARROW-1786
I can take the liberty of removing the metadata per ARROW-1785 in the
next few days if there are no objections. We will want to add
documentation to indicate which buffers must accompany eac
However, this is currently broken in the Java refactor branch. I am fixing this
in https://issues.apache.org/jira/browse/ARROW-1779
On Thu, Nov 9, 2017 at 12:32 PM, Li Jin wrote:
> If null count is 0, the java library sets the validity vectors to all 1s.
>
> https://github.com/apache/arrow/blob/mast
If null count is 0, the Java library sets the validity vectors to all 1s.
https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/BitVector.java#L61
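To illustrate the behaviour described above, here is a minimal Python sketch of a reader that treats a zero-length validity bitmap as "all valid" when the null count is 0. This is not the Java library's code; the function names are hypothetical and the bitmap is assumed to use least-significant-bit numbering.

```python
def all_valid_bitmap(length: int) -> bytes:
    """Build a bitmap whose first `length` bits are set (LSB-first)."""
    full, rest = divmod(length, 8)
    tail = bytes([(1 << rest) - 1]) if rest else b""
    return b"\xff" * full + tail


def resolve_validity(validity: bytes, null_count: int, length: int) -> bytes:
    """Hypothetical helper, not an Arrow API: return a usable bitmap."""
    if null_count == 0 and len(validity) == 0:
        # The producer did not send a bitmap, so every slot is valid.
        return all_valid_bitmap(length)
    return validity


def is_valid(bitmap: bytes, i: int) -> bool:
    """Check bit i of the bitmap, LSB-first within each byte."""
    return bool((bitmap[i // 8] >> (i % 8)) & 1)
```

A reader written this way gives the same answers whether the producer sent a fully populated bitmap or none at all.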
On Thu, Nov 9, 2017 at 12:23 PM, Wes McKinney wrote:
> Yep, see https://github.com/apache/arrow/blob/master/
Yep, see
https://github.com/apache/arrow/blob/master/format/Layout.md#null-bitmaps
"Arrays having a 0 null count may choose to not allocate the null bitmap."
I do not know what the Java library will do in the event of 0 null
count and 0-length validity bitmap -- in theory this should be
accounte
Ah! It didn't occur to me that a producer could just send a length-0
buffer since the reader implementations should ignore it anyway. I don't
mind the 16 byte cost of the metadata - I was referring to the bloat of
a 100% valid vector, which could be substantial.
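To put rough numbers on that comparison, here is a back-of-the-envelope sketch in Python. The helper name and the 8-byte padding are assumptions for illustration only; the 16-byte figure is the one quoted above.

```python
BUFFER_METADATA_BYTES = 16  # per-buffer metadata cost quoted in this thread


def validity_bitmap_bytes(length: int, alignment: int = 8) -> int:
    """Bytes needed for a validity bitmap of `length` slots.

    One bit per slot, rounded up to a whole byte, then padded to
    `alignment` bytes (the padding value is an assumption).
    """
    raw = (length + 7) // 8
    return (raw + alignment - 1) // alignment * alignment
```

For a 1,000,000-element vector with no nulls, validity_bitmap_bytes(1_000_000) is 125000 bytes of all-1 bits, against 16 bytes of metadata for an empty buffer.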
Part of me wants to argue that
> So I'll go after the other validity vector - maybe producers should be
> allowed to omit the validity vector in the index? I just think if the goal is
> to reduce bloat then redundant validity vectors seems like a logical place to
> trim.
Well, the cost of the additional buffer metadata is on
Good point. It's a nice feature of the format that a dictionary batch and
a record batch with a single column look exactly the same when they
represent the same logical type.
So I'll go after the other validity vector - maybe producers should be
allowed to omit the validity vector in the index
The dictionary batches simply wrap a record batch with one “column”. There
should be no code difference (e.g. buffer layouts are the same) between the
code handling the data in a dictionary and in a normal record batch. In
general, a dictionary may contain a null.
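The point above can be sketched in Python as follows. The classes are simplified stand-ins, not the real Arrow structures: a column is modelled as a plain list with None for a null, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RecordBatch:
    # Simplified stand-in: each column is a list, None marks a null slot.
    columns: List[List[Optional[str]]]


@dataclass
class DictionaryBatch:
    dictionary_id: int
    data: RecordBatch  # a record batch with exactly one "column"


def read_column(batch: RecordBatch, index: int) -> List[Optional[str]]:
    """The same reader is used for ordinary batches and dictionaries."""
    return batch.columns[index]


def decode(indices: List[Optional[int]],
           dictionary: DictionaryBatch) -> List[Optional[str]]:
    """Map dictionary indices back to values; a null index stays null."""
    values = read_column(dictionary.data, 0)
    return [None if i is None else values[i] for i in indices]
```

Because the dictionary is read through read_column like any other batch, a null inside the dictionary needs no special handling: an index that points at it simply decodes to None.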
On Wed, Nov 8, 2017 at 4:05 PM Bri
Agreed, that sounds like a great solution to this problem - the layout
information is redundant and it doesn't make sense to include it in
every schema.
Although I would argue we should write down exactly what buffers are
supposed to go on the wire in the dictionary batches (i.e. value
vector
Per Jacques' comment in ARROW-1693
https://issues.apache.org/jira/browse/ARROW-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16244812#comment-16244812,
I think we should remove the buffer layout from the metadata. It would
be a good idea to do this for
We've been having some integration issues with reading Dictionary
Vectors in the JS implementation - our current implementation can read
arrow files and streams generated by Java, but not by C++. Most of this
discussion is captured in ARROW-1693 [1].
It looks like ultimately the issue is that