hi Brian,

In the record batch IPC formats (stream and file), each buffer is
supposed to be padded to a multiple of 8 bytes at minimum, so that all
buffers start at an 8-byte-aligned offset.
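
As a rough illustration (a TypeScript sketch, not code from any Arrow library; `padTo` and `layoutBuffers` are hypothetical names), laying buffers out back to back and padding each one to a multiple of 8 keeps every start offset aligned, assuming the message body itself starts on an aligned offset:

```typescript
// Hypothetical helper: round n up to the next multiple of `alignment`.
function padTo(n: number, alignment: number): number {
  return Math.ceil(n / alignment) * alignment;
}

// Lay buffers out back to back in a message body, padding each one so the
// next buffer starts on an `alignment`-byte boundary (8 = the minimum).
function layoutBuffers(
  lengths: number[],
  alignment = 8,
): { offsets: number[]; bodyLength: number } {
  const offsets: number[] = [];
  let cursor = 0;
  for (const len of lengths) {
    offsets.push(cursor); // this buffer starts at the current aligned cursor
    cursor += padTo(len, alignment); // skip the data plus its padding
  }
  return { offsets, bodyLength: cursor };
}

// Buffers of 5, 12 and 3 bytes start at offsets 0, 8 and 24; body is 32 bytes.
const { offsets, bodyLength } = layoutBuffers([5, 12, 3]);
console.log(offsets, bodyLength);
```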

We should revisit this aspect of the format documents -- ideally
buffers would be padded to 64 bytes so that AVX-512 code paths can be
used more often. I think it would be better for the specification
to say: 64-byte padding is preferred, but 8-byte alignment (of start
offsets) and padding in IPC are the minimum requirement. In the C++
library, for example, we round all our allocations up to a
multiple of 64 bytes.
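
Rounding a length up to a multiple of 64 (or 8) can be sketched as follows, assuming power-of-two alignments. This is an illustrative TypeScript helper, not the C++ library's allocator; `roundUpPow2` and `paddingBytes` are hypothetical names. Since 64 is a multiple of 8, anything padded to 64 bytes also meets the 8-byte minimum:

```typescript
// Round n up to a multiple of `alignment`, which must be a power of two.
// JavaScript bitwise operators work on 32-bit signed integers, so this is
// only valid while n + alignment - 1 stays below 2^31.
function roundUpPow2(n: number, alignment: number): number {
  return (n + alignment - 1) & ~(alignment - 1);
}

// Bytes of padding added to a buffer of n bytes at the given alignment.
function paddingBytes(n: number, alignment: number): number {
  return roundUpPow2(n, alignment) - n;
}

// Compare the 8-byte minimum with the preferred 64-byte padding.
for (const n of [1, 100, 128]) {
  console.log(n, roundUpPow2(n, 8), roundUpPow2(n, 64), paddingBytes(n, 64));
}
```

The trade-off is a little extra space per buffer (at most 63 bytes) in exchange for start offsets that suit 64-byte SIMD loads.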

It's possible there's a missing alignment step in the Java writer, so if
you can find a reproducible case where the IPC payload has a
misaligned buffer start offset, we should definitely fix that as soon
as possible.
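
A check for such a case could look roughly like this. It assumes the per-buffer offset/length pairs have already been read out of the record batch metadata (parsing the flatbuffer is left out), and `BufferSpec` and `misalignedBuffers` are hypothetical names. It only checks offsets relative to the message body, so the body's own position in the stream would need checking separately:

```typescript
// Hypothetical buffer descriptor: the offset (relative to the start of the
// message body) and length recorded for each buffer in a record batch.
interface BufferSpec {
  offset: number;
  length: number;
}

// Return the indices of buffers whose start offset is not a multiple of
// `alignment` (8 bytes being the minimum the IPC format requires).
function misalignedBuffers(buffers: BufferSpec[], alignment = 8): number[] {
  const bad: number[] = [];
  buffers.forEach((b, i) => {
    if (b.offset % alignment !== 0) bad.push(i);
  });
  return bad;
}

// The third buffer starts at byte 20, which is not 8-byte aligned.
const specs: BufferSpec[] = [
  { offset: 0, length: 5 },
  { offset: 8, length: 12 },
  { offset: 20, length: 3 },
];
console.log(misalignedBuffers(specs));
```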

- Wes

On Sun, Jul 16, 2017 at 9:05 AM, bhulette <bhule...@ccri.com> wrote:
> Emilio and I ran into some byte alignment issues last week. We're generating
> data in the streaming format with the java lib, but the javascript lib is
> failing to read it because some of the buffers don't appear to be aligned.
>
> It's not clear to us which end is implemented incorrectly - the spec
> (https://arrow.apache.org/docs/memory_layout.html) says buffers should be
> padded to 64-byte boundaries - does that extend to record batches in the IPC
> formats?
>
> The javascript implementation currently uses typed arrays to create views
> for each buffer, which need to be aligned. We're looking into using a
> DataView or a flatbuffers ByteBuffer to get around this issue for now, but
> I'm wondering if this is a bug in the java implementation.
>
> Brian
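
On the typed-array point: a JavaScript typed-array view requires its byteOffset to be a multiple of the element size, while a DataView has no such requirement. The TypeScript sketch below shows the failure and the DataView workaround; copying the bytes into a fresh, aligned buffer is an extra alternative not proposed in the thread, and the copy assumes a little-endian host (Arrow data is little-endian, and typed arrays use the platform's byte order):

```typescript
// A 16-byte buffer holding a float64 at byte 4, which is not 8-byte aligned.
const buf = new ArrayBuffer(16);
const dv = new DataView(buf);
dv.setFloat64(4, 3.5, true); // little-endian write at an unaligned offset

// Typed-array views require byteOffset % BYTES_PER_ELEMENT === 0,
// so constructing this view throws a RangeError.
let threw = false;
try {
  new Float64Array(buf, 4, 1);
} catch (e) {
  threw = e instanceof RangeError;
}

// DataView reads at any offset, at the cost of one call per element.
const viaDataView = dv.getFloat64(4, true);

// Alternative: copy the bytes into a new (aligned) ArrayBuffer and view that.
const copy = new Float64Array(buf.slice(4, 12));

console.log(threw, viaDataView, copy[0]);
```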
