Hi Pearu,

Yes, I think it would be a good idea to develop some tools to make interacting with device memory through the existing data structures work seamlessly.
This is all closely related to https://issues.apache.org/jira/browse/ARROW-2447.

I would say step 1 is defining the device abstraction. Then we can add methods or properties to the data structures in pyarrow to expose the location of the memory, whether CUDA, host RAM, etc. We could also have a memory-mapped device so that memory maps can communicate that their data is on disk.

We could then define virtual APIs for host-side data access to ensure that memory is copied to the host when needed (e.g. in the case of indexing into the values of an array).

There are some small details around handling devices in the case of hierarchical memory references. If we call `buffer->GetDevice()`, then even if the buffer is a sliced buffer (which will be the case after using any of the IPC reader APIs), it needs to return the right device. This means we probably need to define a SlicedBuffer type that delegates GetDevice() calls to the parent buffer.

Let me know if what I'm saying makes sense. Kou and Antoine probably have some thoughts about this as well.

- Wes

On Fri, Sep 28, 2018 at 5:34 AM Pearu Peterson
<pearu.peter...@quansight.com> wrote:
>
> Hi,
>
> Consider the following use case:
>
> schema = <pa.Schema instance>
> cbuf = <pa.cuda.CudaBuffer instance>
> cbatch = pa.cuda.read_record_batch(schema, cbuf)
>
> Note that cbatch is a pa.RecordBatch instance whose data pointers are
> device pointers.
>
> for col in cbatch.columns:
>     # here col is, say, a FloatArray whose data pointer is a device pointer
>     # as a result, accessing col data, say, taking a slice, leads to segfaults
>     print(col[0])
>
> The aim of this message is to establish a user-friendly way to access,
> say, a slice of the device data so that only the requested data is copied
> to the host.
>
> Or, more generally, should there be a CUDA-specific RecordBatch that
> implements the RecordBatch API and can be used from the host?
>
> For instance, this would be similar to DeviceNDArray in Numba, which
> basically implements the ndarray API for device data while the API can be
> used from the host.
>
> What do you think? What would be the proper approach? (I can do the
> implementation.)
>
> Best regards,
> Pearu
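
To make the SlicedBuffer delegation described in the reply above concrete, here is a minimal C++ sketch. None of this is existing Arrow code: Device, CpuDevice, CudaDevice, CudaBuffer, SlicedBuffer, and GetDevice() are hypothetical names chosen only to match the wording of the reply, and the real design would be worked out under ARROW-2447.

#include <cstdint>
#include <iostream>
#include <memory>
#include <string>
#include <utility>

// Hypothetical device abstraction: identifies where a buffer's memory lives.
class Device {
 public:
  virtual ~Device() = default;
  virtual std::string type_name() const = 0;  // e.g. "cpu", "cuda", "mmap"
  virtual bool is_cpu() const = 0;
};

class CpuDevice : public Device {
 public:
  std::string type_name() const override { return "cpu"; }
  bool is_cpu() const override { return true; }
};

class CudaDevice : public Device {
 public:
  explicit CudaDevice(int device_number) : device_number_(device_number) {}
  std::string type_name() const override { return "cuda"; }
  bool is_cpu() const override { return false; }
  int device_number() const { return device_number_; }

 private:
  int device_number_;
};

// Hypothetical buffer base class; in this sketch a plain Buffer is assumed
// to live in host RAM.
class Buffer {
 public:
  Buffer(const uint8_t* data, int64_t size) : data_(data), size_(size) {}
  virtual ~Buffer() = default;

  virtual std::shared_ptr<Device> GetDevice() const {
    return std::make_shared<CpuDevice>();
  }

  const uint8_t* data() const { return data_; }
  int64_t size() const { return size_; }

 protected:
  const uint8_t* data_;
  int64_t size_;
};

class CudaBuffer : public Buffer {
 public:
  CudaBuffer(const uint8_t* device_data, int64_t size, int device_number)
      : Buffer(device_data, size), device_number_(device_number) {}

  std::shared_ptr<Device> GetDevice() const override {
    return std::make_shared<CudaDevice>(device_number_);
  }

 private:
  int device_number_;
};

// The key piece: a slice keeps a reference to its parent and answers
// GetDevice() by asking the parent, so a zero-copy slice of a CUDA buffer
// (e.g. one produced by an IPC reader) still reports the CUDA device
// instead of silently looking like host memory.
class SlicedBuffer : public Buffer {
 public:
  SlicedBuffer(std::shared_ptr<Buffer> parent, int64_t offset, int64_t length)
      : Buffer(parent->data() + offset, length), parent_(std::move(parent)) {}

  std::shared_ptr<Device> GetDevice() const override {
    return parent_->GetDevice();
  }

 private:
  std::shared_ptr<Buffer> parent_;
};

int main() {
  // Pretend this buffer wraps device memory; it is never dereferenced here.
  auto cuda_buf = std::make_shared<CudaBuffer>(nullptr, 1024, /*device_number=*/0);
  SlicedBuffer slice(cuda_buf, 0, 128);
  std::cout << slice.GetDevice()->type_name() << std::endl;  // prints "cuda"
  return 0;
}

The point of the delegation is that host-side accessors (e.g. indexing or slicing a column in pyarrow) can ask any buffer, sliced or not, where its memory actually lives and trigger a device-to-host copy when required, rather than dereferencing a device pointer and segfaulting as in the use case quoted above.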