Hi Pearu,

Yes, I think it would be a good idea to develop some tools to make interacting with device memory through the existing data structures work seamlessly.
This is all closely related to https://issues.apache.org/jira/browse/ARROW-2447.

I would say step 1 is defining the device abstraction. Then we can add methods or properties to the data structures in pyarrow to expose the location of the memory, whether CUDA, host RAM, etc. We could also have a memory-mapped device so that memory maps can communicate that their data is on disk.

We could then define virtual APIs for host-side data access to ensure that memory is copied to the host when needed (e.g. in the case of indexing into the values of an array).

There are some small details around handling devices in the case of hierarchical memory references. If we call `buffer->GetDevice()`, then even if the buffer is a sliced buffer (which will be the case after using any of the IPC reader APIs), it needs to return the right device. This means we probably need to define a SlicedBuffer type that delegates GetDevice() calls to the parent buffer.

Let me know if what I'm saying makes sense. Kou and Antoine probably have some thoughts about this as well.

- Wes

On Fri, Sep 28, 2018 at 5:34 AM Pearu Peterson
<pearu.peter...@quansight.com> wrote:
>
> Hi,
>
> Consider the following use case:
>
> schema = <pa.Schema instance>
> cbuf = <pa.cuda.CudaBuffer instance>
> cbatch = pa.cuda.read_record_batch(schema, cbuf)
>
> Note that cbatch is a pa.RecordBatch instance whose data pointers are
> device pointers.
>
> for col in cbatch.columns:
>     # here col is, say, a FloatArray whose data pointer is a device pointer
>     # as a result, accessing col data, say, taking a slice, leads to segfaults
>     print(col[0])
>
> The aim of this message is to establish a user-friendly way to access,
> say, a slice of the device data so that only the requested data is copied
> to the host.
>
> Or, more generally, should there be a CUDA-specific RecordBatch that
> implements the RecordBatch API and can be used from the host?
>
> For instance, this would be similar to DeviceNDArray in Numba, which
> basically implements the ndarray API for device data while the API can be
> used from the host.
>
> What do you think? What would be the proper approach? (I can do the
> implementation.)
>
> Best regards,
> Pearu
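
To make the SlicedBuffer delegation described in the reply above concrete, here is a minimal C++ sketch. None of this is existing Arrow code: Device, CpuDevice, CudaDevice, CudaBuffer, SlicedBuffer, and GetDevice() are hypothetical names chosen only to match the wording of the reply, and the real design would be worked out under ARROW-2447.

#include <cstdint>
#include <iostream>
#include <memory>
#include <string>
#include <utility>

// Hypothetical device abstraction: identifies where a buffer's memory lives.
class Device {
 public:
  virtual ~Device() = default;
  virtual std::string type_name() const = 0;  // e.g. "cpu", "cuda", "mmap"
  virtual bool is_cpu() const = 0;
};

class CpuDevice : public Device {
 public:
  std::string type_name() const override { return "cpu"; }
  bool is_cpu() const override { return true; }
};

class CudaDevice : public Device {
 public:
  explicit CudaDevice(int device_number) : device_number_(device_number) {}
  std::string type_name() const override { return "cuda"; }
  bool is_cpu() const override { return false; }
  int device_number() const { return device_number_; }

 private:
  int device_number_;
};

// Hypothetical buffer base class; in this sketch a plain Buffer is assumed
// to live in host RAM.
class Buffer {
 public:
  Buffer(const uint8_t* data, int64_t size) : data_(data), size_(size) {}
  virtual ~Buffer() = default;

  virtual std::shared_ptr<Device> GetDevice() const {
    return std::make_shared<CpuDevice>();
  }

  const uint8_t* data() const { return data_; }
  int64_t size() const { return size_; }

 protected:
  const uint8_t* data_;
  int64_t size_;
};

class CudaBuffer : public Buffer {
 public:
  CudaBuffer(const uint8_t* device_data, int64_t size, int device_number)
      : Buffer(device_data, size), device_number_(device_number) {}

  std::shared_ptr<Device> GetDevice() const override {
    return std::make_shared<CudaDevice>(device_number_);
  }

 private:
  int device_number_;
};

// The key piece: a slice keeps a reference to its parent and answers
// GetDevice() by asking the parent, so a zero-copy slice of a CUDA buffer
// (e.g. one produced by an IPC reader) still reports the CUDA device
// instead of silently looking like host memory.
class SlicedBuffer : public Buffer {
 public:
  SlicedBuffer(std::shared_ptr<Buffer> parent, int64_t offset, int64_t length)
      : Buffer(parent->data() + offset, length), parent_(std::move(parent)) {}

  std::shared_ptr<Device> GetDevice() const override {
    return parent_->GetDevice();
  }

 private:
  std::shared_ptr<Buffer> parent_;
};

int main() {
  // Pretend this buffer wraps device memory; it is never dereferenced here.
  auto cuda_buf = std::make_shared<CudaBuffer>(nullptr, 1024, /*device_number=*/0);
  SlicedBuffer slice(cuda_buf, 0, 128);
  std::cout << slice.GetDevice()->type_name() << std::endl;  // prints "cuda"
  return 0;
}

The point of the delegation is that host-side accessors (e.g. indexing or slicing a column in pyarrow) can ask any buffer, sliced or not, where its memory actually lives and trigger a device-to-host copy when required, rather than dereferencing a device pointer and segfaulting as in the use case quoted above.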