Hi All,

Further context on Felipe's comment can be found in [1]. Note that arrow-rs can align on read, and in fact does so by default in its IPC reader and, by extension, its Flight client. The objection was solely to always doing this over FFI, where it should at the very least be configurable.

As elaborated on in [2], the core issue is that the C API spec currently states:

> Consumers MAY decide not to support unaligned memory.

This naturally implies that it is valid for implementations to choose not to handle unaligned memory. Unfortunately, the reality is that arrow-cpp, in particular its Flight implementation [3], often produces unaligned buffers that then get sent over FFI. This presents arrow-rs with a tricky choice: either we copy to align by default, as we already do for IPC, accepting that, as with flatbuffers, we can't rely on implementations to align buffers correctly, or we view this as a bug/limitation in whatever is producing the data in the first place.
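To make the copy-to-align option concrete, here is a minimal std-only sketch of "check on ingest, copy only if misaligned". It is not arrow-rs code: `is_aligned_for`, `ingest_i32` and `AlignedBytes` are hypothetical names chosen for illustration, and it assumes a buffer of native-endian `i32` values.

```rust
use std::borrow::Cow;
use std::mem::{align_of, size_of};

/// Hypothetical helper: true if `bytes` starts at an address aligned for `T`.
fn is_aligned_for<T>(bytes: &[u8]) -> bool {
    (bytes.as_ptr() as usize) % align_of::<T>() == 0
}

/// Hypothetical ingest step: view `bytes` as native-endian i32 values.
/// Borrows (zero-copy) when the buffer is naturally aligned, and
/// otherwise copies the values into a freshly allocated, aligned Vec.
fn ingest_i32(bytes: &[u8]) -> Cow<'_, [i32]> {
    assert!(bytes.len() % size_of::<i32>() == 0, "length must be a multiple of 4");
    let len = bytes.len() / size_of::<i32>();
    if is_aligned_for::<i32>(bytes) {
        // SAFETY: the pointer is aligned for i32 (checked above), the region
        // covers exactly `len` i32 values, and every bit pattern is a valid i32.
        let values = unsafe { std::slice::from_raw_parts(bytes.as_ptr() as *const i32, len) };
        Cow::Borrowed(values)
    } else {
        // Misaligned: decode 4 bytes at a time into a new allocation.
        let values = bytes
            .chunks_exact(size_of::<i32>())
            .map(|c| i32::from_ne_bytes([c[0], c[1], c[2], c[3]]))
            .collect();
        Cow::Owned(values)
    }
}

/// Hypothetical backing store with a known 8-byte alignment, used only to
/// build aligned and deliberately misaligned byte slices for experiments.
#[repr(align(8))]
struct AlignedBytes([u8; 16]);
```

Slicing `AlignedBytes` at offset 0 takes the borrowed path, while slicing at offset 1 forces the copy. The copy costs one allocation per misaligned buffer, which is why it matters whether this happens unconditionally over FFI or once at the point of IO.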

The current ambiguity, however, makes it hard to set reasonable defaults, as it isn't clear whether FFI should be zero-copy, and therefore have alignment restrictions, or not. IMO it makes the most sense to require at least natural alignment and push the alignment step to where the IO occurs, i.e. the IPC reader, as that way it is in many cases possible to avoid copying the data twice. Even where alignment is not enforced, unaligned buffers have potential performance problems regardless.

That all being said, IMO it is a bug, not a feature, of the arrow-cpp Flight client that it can produce unaligned buffers. I can understand the desire to provide zero-copy, but then it should look to do this in a way that preserves alignment. I accept this is complicated by the way the Flight protocol is designed, but the design of Flight in general is not amenable to zero-copy, so perhaps this doesn't really matter.

Kind Regards,

Raphael Taylor-Davies

[1]: https://github.com/apache/arrow-rs/pull/7137
[2]: https://github.com/apache/arrow-adbc/issues/2526
[3]: https://github.com/apache/arrow/issues/32276

On 27 March 2025 16:00:04 GMT, Felipe Oliveira Carvalho <felipe...@gmail.com> wrote:

   Hi,

   All this complexity everywhere when arrow-rs could simply check
   the alignment when they ingest external buffers and re-allocate to
   ensure alignment. I'm in favor of producers of Arrow arrays like a
   Flight client ensuring alignment as early as possible (when buffers
   are allocated for arrays decoded from the payloads) but doing that
   BY DEFAULT (no option needed) instead of checking and reallocating
   after the fact -- Rust code could do that: (1) it costs the same and
   (2) would be robust against users that forget to set the option or
   turn it off.

   --
   Felipe

   On Thu, Mar 27, 2025 at 6:43 AM Rusty Conover
   <ru...@conover.me.invalid> wrote:

       Hi,

       This seems like a sensible approach and an improvement to
       developer/user experience.

       Rusty
