Hi All,
Further context on Felipe's comment can be found in [1]. You will note
that arrow-rs can align on read, and in fact does so by default in its IPC
reader and, by extension, its flight client. The objection was solely to
always doing this over FFI, where it should at the very least be
configurable.
As elaborated in [2], the core issue is that the C API spec currently states:
> Consumers MAY decide not to support unaligned memory.
This naturally implies that it is valid for implementations to choose
not to handle unaligned memory. Unfortunately, the reality is that
arrow-cpp, in particular its flight implementation [3], often produces
unaligned buffers that then get sent over FFI. This presents arrow-rs
with a tricky choice: on the one hand, we can just copy to align by
default, as we already do for IPC, accepting that, as with flatbuffers, we
can't rely on implementations to align buffers correctly; on the other,
we can view this as a bug or limitation in whatever is producing the data
in the first place.
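To make the trade-off concrete, here is a minimal, std-only Rust sketch of a consumer that borrows a buffer zero-copy when it is naturally aligned and, when it is not, either copies it into aligned memory or rejects it, depending on a configurable policy. `import_i64`, `AlignmentPolicy`, and `ImportError` are hypothetical names for illustration, not arrow-rs APIs, and the sketch assumes a single native-endian `i64` buffer.

```rust
use std::borrow::Cow;
use std::mem::{align_of, size_of};

/// Hypothetical knob for how a consumer treats unaligned input.
#[derive(Clone, Copy, Debug, PartialEq)]
enum AlignmentPolicy {
    /// Copy unaligned buffers into a freshly allocated (hence aligned) Vec.
    CopyToAlign,
    /// Treat unaligned input as a bug in the producer.
    Reject,
}

#[derive(Debug, PartialEq)]
enum ImportError {
    Unaligned,
    BadLength,
}

/// Hypothetical helper: interpret `bytes` as native-endian i64 values.
fn import_i64(bytes: &[u8], policy: AlignmentPolicy) -> Result<Cow<'_, [i64]>, ImportError> {
    if bytes.len() % size_of::<i64>() != 0 {
        return Err(ImportError::BadLength);
    }
    let len = bytes.len() / size_of::<i64>();
    if len == 0 {
        return Ok(Cow::Owned(Vec::new()));
    }
    if bytes.as_ptr() as usize % align_of::<i64>() == 0 {
        // SAFETY: the pointer is aligned for i64, exactly `len` i64s fit in
        // `bytes`, and every bit pattern is a valid i64.
        let values = unsafe { std::slice::from_raw_parts(bytes.as_ptr() as *const i64, len) };
        return Ok(Cow::Borrowed(values)); // zero-copy path
    }
    match policy {
        AlignmentPolicy::Reject => Err(ImportError::Unaligned),
        AlignmentPolicy::CopyToAlign => {
            // Copy-to-align: decode each 8-byte chunk into an aligned Vec<i64>.
            let values = bytes
                .chunks_exact(size_of::<i64>())
                .map(|chunk| {
                    let mut word = [0u8; 8];
                    word.copy_from_slice(chunk);
                    i64::from_ne_bytes(word)
                })
                .collect();
            Ok(Cow::Owned(values))
        }
    }
}

fn main() {
    let values: [i64; 2] = [1, -2];

    // Aligned case: a Vec<i64> is always naturally aligned, so this is zero-copy.
    let owned: Vec<i64> = values.to_vec();
    // SAFETY: the byte view covers exactly the 16 bytes owned by `owned`.
    let aligned = unsafe { std::slice::from_raw_parts(owned.as_ptr() as *const u8, 16) };
    let borrowed = import_i64(aligned, AlignmentPolicy::Reject).unwrap();
    assert!(matches!(borrowed, Cow::Borrowed(_)));

    // Unaligned case: start one byte into u64 storage (which is 8-byte aligned).
    let mut storage = [0u64; 3];
    // SAFETY: the byte view covers exactly the 24 bytes owned by `storage`.
    let bytes: &mut [u8] =
        unsafe { std::slice::from_raw_parts_mut(storage.as_mut_ptr() as *mut u8, 24) };
    for (i, v) in values.iter().enumerate() {
        bytes[1 + 8 * i..9 + 8 * i].copy_from_slice(&v.to_ne_bytes());
    }
    let unaligned = &bytes[1..17];

    assert_eq!(import_i64(unaligned, AlignmentPolicy::Reject), Err(ImportError::Unaligned));
    let copied = import_i64(unaligned, AlignmentPolicy::CopyToAlign).unwrap();
    assert!(matches!(copied, Cow::Owned(_)));
    assert_eq!(&*copied, &values[..]);
}
```

Note that aligned input is borrowed regardless of the policy; the policy only decides what happens to unaligned input, which is exactly the point where the spec's "MAY" leaves consumers to choose.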
The current ambiguity, however, makes it hard to set reasonable
defaults, as it isn't clear whether FFI should be zero-copy, and therefore
have alignment restrictions, or not. IMO it makes the most sense to
require at least natural alignment and push the responsibility for
aligning to where the IO occurs, i.e. the IPC reader. That way it is in
many cases possible to avoid copying the data twice, and even where
alignment is not enforced, unaligned buffers have potential performance
problems regardless.
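As a rough illustration of why aligning at the IO layer can avoid a second copy, here is a std-only Rust sketch that reads a message body straight from a `Read` source into 8-byte-aligned storage, rather than reading into a plain `Vec<u8>` (which only guarantees 1-byte alignment) and copying again afterwards. `read_aligned` and `AlignedBuffer` are hypothetical names, not arrow-rs or IPC reader APIs, and 8 bytes is assumed as the natural alignment of the widest primitive of interest; stricter alignment, such as 64 bytes, would need a custom allocation via `std::alloc::Layout`.

```rust
use std::io::{self, Read};

/// Hypothetical buffer whose bytes always start on an 8-byte boundary.
struct AlignedBuffer {
    words: Vec<u64>, // u64 backing storage guarantees 8-byte alignment
    len: usize,      // number of meaningful bytes
}

impl AlignedBuffer {
    fn as_bytes(&self) -> &[u8] {
        // SAFETY: `words` owns at least `len` initialized bytes.
        unsafe { std::slice::from_raw_parts(self.words.as_ptr() as *const u8, self.len) }
    }
}

/// Hypothetical helper: read exactly `len` bytes from `src` into aligned memory.
fn read_aligned<R: Read>(src: &mut R, len: usize) -> io::Result<AlignedBuffer> {
    // Round up to whole u64 words; the zeroed tail acts as padding.
    let mut words = vec![0u64; (len + 7) / 8];
    // SAFETY: the byte view covers exactly the memory owned by `words`.
    let bytes = unsafe {
        std::slice::from_raw_parts_mut(words.as_mut_ptr() as *mut u8, words.len() * 8)
    };
    // The only copy: from the IO source straight into aligned memory.
    src.read_exact(&mut bytes[..len])?;
    Ok(AlignedBuffer { words, len })
}

fn main() -> io::Result<()> {
    // Simulate a message body arriving from the network; `&[u8]` implements `Read`.
    let payload: Vec<u8> = (1..=12).collect();
    let buf = read_aligned(&mut payload.as_slice(), payload.len())?;
    assert_eq!(buf.as_bytes(), &payload[..]);
    assert_eq!(buf.as_bytes().as_ptr() as usize % 8, 0);
    Ok(())
}
```

Because the alignment is decided when the destination is allocated, any zero-copy slicing done later over FFI inherits it for free, provided the offsets of individual buffers within the body are themselves multiples of 8.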
That all being said, IMO it is a bug, not a feature, of the arrow-cpp
flight client that it can produce unaligned buffers. I can understand
the desire to provide zero-copy, but then it should look to do this in a
way that preserves alignment. I accept this is complicated by the way
the flight protocol is designed, but the design of flight in general is
not amenable to zero-copy, so perhaps this doesn't really matter.
Kind Regards,
Raphael Taylor-Davies
[1]: https://github.com/apache/arrow-rs/pull/7137
[2]: https://github.com/apache/arrow-adbc/issues/2526
[3]: https://github.com/apache/arrow/issues/32276
On 27 March 2025 16:00:04 GMT, Felipe Oliveira Carvalho
<felipe...@gmail.com> wrote:
> Hi,
>
> All this complexity everywhere when arrow-rs could simply check the
> alignment when they ingest external buffers and re-allocate to ensure
> alignment.
>
> I'm in favor of producers of Arrow arrays like a Flight client ensuring
> alignment as early as possible (when buffers are allocated for arrays
> decoded from the payloads) but doing that BY DEFAULT (no option needed)
> instead of checking and reallocating after the fact -- Rust code could
> do that: (1) it costs the same and (2) would be robust against users
> that forget to set the option or turn it off.
>
> --
> Felipe
>
> On Thu, Mar 27, 2025 at 6:43 AM Rusty Conover <ru...@conover.me.invalid>
> wrote:
>
> > Hi,
> >
> > This seems like a sensible approach and an improvement to
> > developer/user experience.
> >
> > Rusty