The email chain concerns IPC, in particular that arrow-cpp's IPC reader can
return unaligned buffers. Unless compiled with sanitizers, this will not
visibly cause issues so long as the data never leaves arrow-cpp. However, when
the data is sent over FFI to other Arrow implementations, including arrow-rs
and I believe arrow-java, they complain.

We frequently get people running into this, to the point where our Python
bindings now just copy to align, as there seemed to be little forward motion
on fixing the underlying issue in arrow-cpp. Making arrow-cpp's IPC reader,
particularly arrow-flight, align by default would resolve the vast majority of
sources of unaligned buffers.
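
To make "copy to align" concrete, here is a rough standard-library sketch of
what a consumer might do when it is handed a buffer. It is not the arrow-rs or
pyarrow implementation: `import_buffer`, `misaligned_view`, the `realign` flag
and the 8-byte `ALIGNMENT` are all hypothetical names and values chosen for
illustration. With `realign=False` an unaligned buffer is rejected; with
`realign=True` it is copied into aligned memory.

```python
import ctypes

ALIGNMENT = 8  # assumed for illustration; a real consumer may require more


def address_of(view: memoryview) -> int:
    """Address of the first byte of a writable, non-empty, contiguous view."""
    return ctypes.addressof(ctypes.c_char.from_buffer(view))


def is_aligned(view: memoryview, alignment: int = ALIGNMENT) -> bool:
    # An empty buffer has nothing to misalign.
    return view.nbytes == 0 or address_of(view) % alignment == 0


def import_buffer(view: memoryview, alignment: int = ALIGNMENT,
                  realign: bool = False):
    """Hypothetical consumer-side import of a 1-D byte view.

    Returns (view, copied). Aligned input is passed through untouched.
    Unaligned input is either rejected or copied, depending on `realign`.
    """
    if is_aligned(view, alignment):
        return view, False  # zero-copy path
    if not realign:
        raise ValueError(f"buffer is not aligned to {alignment} bytes")
    # Over-allocate, then slice at the first aligned address in the backing.
    backing = bytearray(view.nbytes + alignment)
    whole = memoryview(backing)
    pad = (-address_of(whole)) % alignment
    out = whole[pad:pad + view.nbytes]
    out[:] = view  # the (normally hidden) realignment copy
    return out, True


def misaligned_view(payload: bytes, alignment: int = ALIGNMENT) -> memoryview:
    """Build a view starting one byte past an aligned address (demo helper)."""
    backing = bytearray(len(payload) + alignment + 1)
    whole = memoryview(backing)
    pad = (-address_of(whole)) % alignment + 1
    out = whole[pad:pad + len(payload)]
    out[:] = payload
    return out
```

Whether `realign` defaults to on or off is exactly the choice between the two
options in the quoted message below; the sketch defaults to erroring only to
keep the copy visible.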


On 28 March 2025 05:33:50 GMT, Weston Pace <weston.p...@gmail.com> wrote:
>First, the subject of this email is IPC, which is confusing.  From the
>discussion it sounds like we are primarily talking about FFI.  It sounds
>like the options here are:
>
>1. Silently realign unless user opts out of realignment
>
>The user will not get any runtime errors (easier to use) but a method which
>users would assume is zero-copy is silently making copies (hidden
>performance cost)
>
>2. Error on unaligned unless user opts in to realignment
>
>The user could potentially get runtime errors (harder to use) but the
>user's expectations will not be silently ignored.
>
>---
>
>Between the two options I would vote for #2.  An explicit goal of the C
>data interface (from the docs) is:
>
>> Allow zero-copy sharing of Arrow data between independent runtimes and
>> components running in the same process.
>
>I would expect the default behavior of an FFI method to be zero-copy.  Let
>the user make the choice between accepting the realignment and chasing down
>why the data was unaligned in the first place and fixing it (e.g. by moving
>away from protobuf to some other transport or by filing a downstream bug on
>the go library to change how it is allocating buffers).  Personally I would
>just toggle on the "always realign" option 90% of the time, realignment
>copies are cheap, but I do think the user should be forced to make the
>choice.
>
>On Thu, Mar 27, 2025 at 10:21 AM Raphael Taylor-Davies
><r.taylordav...@googlemail.com.invalid> wrote:
>
>> If that is the eventual outcome of this discussion, I would be happy to,
>> however, I'd like to avoid fragmenting the discussion just yet.
>>
>> On 27/03/2025 17:18, Antoine Pitrou wrote:
>> >
>> > Le 27/03/2025 à 18:14, Raphael Taylor-Davies a écrit :
>> >>> It's obviously preferable to be zero-copy but it's certainly not
>> >>> mandatory, especially as the data being shared is assumed to be
>> >>> read-only in most use cases.
>> >> In which case we should probably remove the comment about alignment from
>> >> the C interface specification, and highlight that implementations may
>> >> copy when needed. When I have suggested this in the past I have gotten
>> >> push back, implying at least some contingent beyond myself feels FFI
>> >> should be zero-copy.
>> >
>> > Can you submit a PR so that we can discuss concrete wording changes?
>> >
>> > Regards
>> >
>> > Antoine.
>>
