Sorry, I meant: I am *now* a solid +1
On Mon, Apr 10, 2023 at 1:26 PM Weston Pace <weston.p...@gmail.com> wrote:

> I am not a solid +1 and I can see the usefulness. Matt and I spoke on this externally and I think Matt has written a great summary. There were a few more points that came up in the discussion that I think are particularly compelling.
>
> * Ignoring the device location is generally fatal
>
> In other cases when we have generic metadata it is typically ok for libraries to ignore this and just make sure they forward it on properly. For example, when writing Parquet files or IPC files we need to be sure we persist the generic metadata and load it from disk. However, if you ignore the device location, and it is not CPU, and you attempt to access the buffer (e.g. to persist the data), you will probably crash.
>
> This is not true for most of the examples I gave (one could argue that accessing a NUMA buffer from the wrong socket is a bad idea, but at least it is not fatal). It's possible to contrive other cases that meet this criterion (e.g. a custom allocator that can temporarily persist buffers to disk and requires checkout before using the pointers, or, arguably, the NUMA node example given above), but they can probably be mapped to the device id pattern and they aren't actively encountered today.
>
> * We need this internally
>
> As mentioned in Matt's previous email, we could make use of this field today to avoid doing things (e.g. validating arrays) in Flight / IPC when we know the data is not on the CPU.
>
> * Others are actively working around this problem today
>
> There are libraries today that have encountered this problem and have proposed similar workarounds.
>
> * The changes to the stream interface are more than just "metadata"
>
> I had not looked closely enough to realize that these changes are more substantial than just switching to ArrowDeviceArray. They introduce a concept of queues to the stream, which mirrors concepts found in mainstream GPU libraries (e.g. CUDA's "streams").
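For readers less familiar with the queue concept Weston describes, a rough sketch of how a queue pointer might surface in a device-aware stream interface is shown below. It is purely illustrative; the member names and the exact get_next signature are assumptions, not the proposal's actual definition.

    #include <cstdint>

    struct ArrowSchema;       // from the existing Arrow C Data Interface
    struct ArrowDeviceArray;  // the device-aware array under discussion

    // Hypothetical device-aware stream: the consumer supplies an opaque
    // queue/stream handle (e.g. a pointer to a cudaStream_t) and the producer
    // uses it to make the returned data safe to access on that queue.
    struct ArrowDeviceArrayStream {
      // Device type shared by all arrays produced by this stream.
      int32_t device_type;

      int (*get_schema)(struct ArrowDeviceArrayStream* self,
                        struct ArrowSchema* out);

      // queue is a device-specific synchronization handle; NULL for CPU data.
      int (*get_next)(struct ArrowDeviceArrayStream* self, void* queue,
                      struct ArrowDeviceArray* out);

      const char* (*get_last_error)(struct ArrowDeviceArrayStream* self);
      void (*release)(struct ArrowDeviceArrayStream* self);
      void* private_data;
    };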
> On Mon, Apr 10, 2023 at 12:51 PM Matt Topol <zotthewiz...@gmail.com> wrote:
>
>> > There's nothing in the spec today that prevents users from creating `ArrowDeviceArray` and `ArrowDeviceArrayStream` themselves
>>
>> True, but third-party applications aren't going to be the only downstream users of this API. We also want to build on this within Arrow itself to enable easier usage of non-CPU memory in the higher-level interfaces (IPC, Flight). Right now there are the buffer and memory manager classes, but there are plenty of areas that still inspect the buffers instead of checking the `is_cpu` flag on them first. Plus we would want to be able to expose the device_type and device_id information at those higher-level interfaces too.
>>
>> Even if we don't standardize on the device type list from dlpack, having a standardized way for libraries to pass this device information alongside the Arrow Arrays themselves is still beneficial, and I think it's better for us to define it rather than wait for others to do so. The next step after agreeing on this format change is going to be building helpers (similar to the existing C Data helpers) around this to ensure safe usage and conversion to C++ Arrow Arrays and buffers, etc.
>>
>> > Would we rather come up with a mechanism for arbitrary key/value metadata (e.g. UTF-8 encoded JSON string) to accompany arrays? Something similar to record batch metadata in the IPC format?
>>
>> A generic key/value metadata layer at the Array level for the C Data API could work in most cases *except* for the change to `get_next` for passing the stream/queue pointer object that the producer needs in order to make the data safe to access. If generic metadata at the array level is needed in the future, the reserved 128 bytes could easily be evolved into a const char* plus a release callback, or whatever else we need in order to provide other generic metadata in the struct for the Array.
>>
>> My concerns with using a generic metadata layer to pass this information are:
>> 1. There would be no strong type enforcement on the device_id/device_type members, which would require additional error checking on consumers for invalid values.
>> 2. We'd need to add a release callback or something to the ArrowDeviceArray struct itself in order to clarify who owns the `const char*` for the metadata.
>>
>> > 1. Agreeing that dlpack has a good list of device ids
>> >    * how is this list managed? how hard is it for a device to get added to the list?
>> >    * how will updates to dlpack be mirrored into arrow?
>> >    * instead of just repeating/vendoring the enum can we simply refer to it and treat this as an opaque integer?
>>
>> I'm not opposed to just referring to the dlpack enum and treating this as an opaque integer if that would be preferable. I definitely agree with the difficulties in vendoring/repeating the dlpack enum values here and ensuring they stay up to date. Does anyone else have strong feelings one way or the other on this?
>>
>> --Matt
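For reference, the dlpack list Matt refers to is dlpack's DLDeviceType enum. An abridged excerpt is shown below; dlpack's dlpack.h is the authoritative source, and the set of entries may have changed since this thread.

    // Abridged from dlpack's DLDeviceType; see dlpack.h for the full list.
    typedef enum {
      kDLCPU = 1,
      kDLCUDA = 2,
      kDLCUDAHost = 3,   // CUDA pinned host memory
      kDLOpenCL = 4,
      kDLVulkan = 7,
      kDLMetal = 8,
      kDLVPI = 9,
      kDLROCM = 10,
      kDLROCMHost = 11,
      kDLExtDev = 12,    // reserved for extension devices
      kDLCUDAManaged = 13,
      kDLOneAPI = 14,
      // ...
    } DLDeviceType;

Vendoring an enum like this into Arrow is exactly what raises the maintenance questions quoted above; treating the field as an opaque integer that happens to match dlpack's values sidesteps them.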
>> On Mon, Apr 10, 2023 at 1:34 PM Weston Pace <weston.p...@gmail.com> wrote:
>>
>> > I suppose I'm a weak +1, in the sense of "I don't entirely believe this will be useful but it doesn't seem harmful". There's nothing in the spec today that prevents users from creating `ArrowDeviceArray` and `ArrowDeviceArrayStream` themselves, and I'm not familiar enough with the systems producing / consuming these arrays to know if it makes sense for Arrow to standardize on a list, e.g. picking dlpack's list of device ids.
>> >
>> > Arrays and streams could also be labeled with (just brainstorming):
>> > * NUMA node number
>> > * worker id (e.g. in distributed execution)
>> > * thread id (e.g. in some kind of work-stealing producer/consumer algorithm)
>> > * source device MAC (e.g. in some kind of IoT environment)
>> > * etc.
>> >
>> > Would we rather come up with a mechanism for arbitrary key/value metadata (e.g. UTF-8 encoded JSON string) to accompany arrays? Something similar to record batch metadata in the IPC format?
>> >
>> > Otherwise, it seems what is proposed is:
>> >
>> > 1. Agreeing that dlpack has a good list of device ids
>> >    * how is this list managed? how hard is it for a device to get added to the list?
>> >    * how will updates to dlpack be mirrored into arrow?
>> >    * instead of just repeating/vendoring the enum can we simply refer to it and treat this as an opaque integer?
>> > 2. Providing an example of how you can tag arrays with metadata
>> >
>> > On Mon, Apr 10, 2023 at 9:30 AM Matt Topol <zotthewiz...@gmail.com> wrote:
>> >
>> > > > The ArrowArray struct is not allowed to change, as it would break the ABI:
>> > > > https://arrow.apache.org/docs/format/CDataInterface.html#updating-this-specification
>> > >
>> > > I was referring more to the future case where we might need to introduce an `ArrowArrayV2` or something similar, precisely because the ArrowArray struct is not allowed to change to avoid breaking the ABI. The problem is that it becomes difficult to communicate *per-buffer* device info if we are locked into only the existing ArrowArray struct, which doesn't provide any way to communicate that information. That said, you did bring up a good point in the PR about the question of ownership of that memory. So it might make more sense to just embed it, and if we have to evolve the ABI we can use the reserved bytes.
>> > >
>> > > > How exactly would this be communicated? Is the information actually useful? I got the impression that the CUDA programming model allows you to access exactly the right amount of data, regardless of HW parallelism.
>> > >
>> > > From my research and discussions, it looks like there's still the case of handling vectorization in GPU programming. If you know the padding of the buffer, you know whether or not it's safe to use bytes after the length of the data for the purposes of having equal-length buffers that you can perform vectorized operations on (similar to what you might do for CPU programming). For now I've left this out and punted on it, as it would be per-buffer information that we aren't communicating. But I wanted to mention the concern in case anyone on the mailing list who is more knowledgeable than I am (or has particular use cases in mind) has an opinion on this and thinks we should add this info to be communicated somehow (maybe another struct that we embed, or otherwise manage in the ArrowDeviceArray struct).
>> > >
>> > > > The C++ Device class already exists for this. You can get a Buffer's device pretty easily (by going through the MemoryManager, IIRC).
>> > >
>> > > I completely missed the Device class, good point! So when I get to implementing helpers for the C++ lib for these format changes, I'd likely create an implementation of that Device class / MemoryManager class for handling devices described by the ArrowDeviceArray's DeviceType + Device ID information, alongside a child class of Buffer for managing the buffers and making sure they return false for `is_cpu`.
>> > >
>> > > -Matt
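As a minimal sketch of the existing C++ pieces mentioned here (the Device, MemoryManager, and Buffer classes), the snippet below checks `is_cpu` before touching a buffer's pointer and otherwise asks the memory manager which device owns the allocation. It assumes the standard Arrow C++ headers and is illustrative only, not part of the proposal.

    #include <iostream>
    #include <memory>

    #include "arrow/buffer.h"
    #include "arrow/device.h"

    // Only dereference the buffer's data pointer when it lives in CPU memory;
    // otherwise report which Device owns the allocation instead of crashing.
    void DescribeBuffer(const std::shared_ptr<arrow::Buffer>& buffer) {
      if (buffer->is_cpu()) {
        std::cout << "CPU buffer of " << buffer->size() << " bytes" << std::endl;
      } else {
        std::shared_ptr<arrow::Device> device = buffer->memory_manager()->device();
        std::cout << "non-CPU buffer on device: " << device->type_name() << std::endl;
      }
    }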
>> > > On Sat, Apr 8, 2023 at 4:41 AM Antoine Pitrou <anto...@python.org> wrote:
>> > >
>> > > > Hi Matt,
>> > > >
>> > > > I've posted comments on the PR. Besides:
>> > > >
>> > > > > * The ArrowDeviceArray contains a pointer to an ArrowArray alongside the device information related to allocation. The reason for using a pointer is so that future modifications of the ArrowArray struct do not cause the size of this struct to change (as it would still just be a pointer to the ArrowArray struct).
>> > > >
>> > > > The ArrowArray struct is not allowed to change, as it would break the ABI:
>> > > > https://arrow.apache.org/docs/format/CDataInterface.html#updating-this-specification
>> > > >
>> > > > > Remaining Concerns that I can think of:
>> > > > > * Alignment and padding of allocations can have a larger impact when dealing with non-CPU devices than with CPUs, and this design provides no way to communicate potential extra padding on a per-buffer basis. We could attempt to codify a convention that allocations should have a specific alignment and a particular padding, but that doesn't actually enforce anything, nor does it allow communicating if for some reason those conventions weren't followed. Should we add some way of passing this info, or punt this to a future modification?
>> > > >
>> > > > How exactly would this be communicated? Is the information actually useful? I got the impression that the CUDA programming model allows you to access exactly the right amount of data, regardless of HW parallelism.
>> > > >
>> > > > > This is part of a wider effort I'm undertaking to improve the non-CPU memory support in the Arrow libraries, such as enhanced Buffer types in the C++ library that will have the device_id and device_type information in addition to the `is_cpu` flag that currently exists.
>> > > >
>> > > > The C++ Device class already exists for this. You can get a Buffer's device pretty easily (by going through the MemoryManager, IIRC).
>> > > >
>> > > > Regards
>> > > >
>> > > > Antoine.
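To make the layout under discussion concrete, below is a rough sketch of an ArrowDeviceArray as described in the quoted proposal text: a pointer to the ArrowArray plus the device information and reserved space. The member names, types, and ordering are illustrative assumptions; the layout ultimately adopted may differ, in particular around embedding the ArrowArray rather than pointing to it, as discussed above.

    #include <cstdint>

    struct ArrowArray;  // unchanged struct from the Arrow C Data Interface

    // Illustrative sketch only, following the description quoted in this thread.
    struct ArrowDeviceArray {
      // The actual Arrow data, allocated on the device identified below.
      // (The later discussion leans toward embedding this instead of a pointer.)
      struct ArrowArray* array;
      // Which device the buffers live on, e.g. a GPU ordinal.
      int64_t device_id;
      // Device type, e.g. a dlpack-style DLDeviceType value.
      int32_t device_type;
      // The "reserved 128 bytes" mentioned above, left for future evolution
      // (e.g. generic metadata plus a release callback).
      int64_t reserved[16];
    };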