> The ArrowArray struct is not allowed to change, as it would break the ABI:
https://arrow.apache.org/docs/format/CDataInterface.html#updating-this-specification

I was referring more to the future case where we might need to introduce an
`ArrowArrayV2` or something similar precisely because the ArrowArray struct
is not allowed to change to avoid breaking the ABI. The problem being that
it becomes difficult to communicate *per-buffer* device info if we are
locked into only the existing ArrowArray struct which doesn't provide any
way to communicate that information. That said, you did bring up a good
point in the PR of the question of ownership of that memory. So it might
make more sense to just embed it and if we have to evolve the ABI we can
use the reserved bytes.

> How exactly would this be communicated? Is the information actually
useful? I got the impression that the CUDA programming model allows you
to access exactly the right amount of data, regardless of HW parallelism.

>From my research and discussions, it looks like there's still the case of
handling vectorization in GPU programming. If you know the padding of the
buffer you know whether or not it's safe to use bytes after the length of
the data for the purposes of having equal length buffers that you can
perform vectorized operations on (similar to what you might do for CPU
programming). For now I've left this out and punted on it as it would be
per-buffer information that we aren't communicating. But I wanted to
mention the concern in case anyone on the mailing list that is more
knowledgeable than I (or has particular use cases in mind) has an opinion
on this and thinks we should add this info to be communicated somehow
(maybe another struct that we embed, or otherwise manage in the
ArrowDeviceArray struct).

> The C++ Device class already exists for this. You can get a Buffer's
device pretty easily (by going through the MemoryManager, IIRC).

I completely missed the Device class, good point! So when I get to
implementing helpers for the C++ lib for these format changes, I'd likely
create an implementation of that Device class / MemoryManager class for
handling devices described by the ArrowDeviceArray's information of the
DeviceType + Device ID. Alongside a child class from Buffer for managing
the buffers and making sure they return false for `is_cpu`.

-Matt

On Sat, Apr 8, 2023 at 4:41 AM Antoine Pitrou <anto...@python.org> wrote:

>
> Hi Matt,
>
> I've posted comments on the PR. Besides:
>
> >     * The ArrowDeviceArray contains a pointer to an ArrowArray alongside
> > the device information related to allocation. The reason for using a
> > pointer is so that future modifications of the ArrowArray struct do not
> > cause the size of this struct to change (as it would still just be a
> > pointer to the ArrowArray struct).
>
> The ArrowArray struct is not allowed to change, as it would break the ABI:
>
> https://arrow.apache.org/docs/format/CDataInterface.html#updating-this-specification
>
> > Remaining Concerns that I can think of:
> >      * Alignment and padding of allocations can have a larger impact when
> > dealing with non-cpu devices than with CPUs, and this design provides no
> > way to communicate potential extra padding on a per-buffer basis. We
> could
> > attempt to codify a convention that allocations should have a specific
> > alignment and a particular padding, but that doesn't actually enforce
> > anything nor allow communicating if for some reason those conventions
> > weren't followed. Should we add some way of passing this info or punt
> this
> > for a future modification?
>
> How exactly would this be communicated? Is the information actually
> useful? I got the impression that the CUDA programming model allows you
> to access exactly the right amount of data, regardless of HW parallelism.
>
> > This is part of a wider effort I'm attempting to address to
> > improve the non-cpu memory support in the Arrow libraries, such as
> enhanced
> > Buffer types in the C++ library that will have the device_id and
> > device_type information in addition to the `is_cpu` flag that currently
> > exists.
>
> The C++ Device class already exists for this. You can get a Buffer's
> device pretty easily (by going through the MemoryManager, IIRC).
>
> Regards
>
> Antoine.
>

Reply via email to