Hi,

Consider the following use case:

schema = <pa.Schema instance>
cbuf = <pa.cuda.CudaBuffer instance>
cbatch = pa.cuda.read_record_batch(schema, cbuf)

Note that cbatch is a pa.RecordBatch instance whose data pointers are
device pointers.

for col in cbatch.columns:
    # here col is, say, a FloatArray whose data pointer is a device pointer;
    # as a result, accessing col data from the host, say, taking a slice
    # or an element, leads to segfaults
    print(col[0])

The aim of this message is to establish a user-friendly way to access,
say, a slice of the device data so that only the requested data is
copied to host.
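
As a rough sketch of what "only the requested data is copied" could mean,
the helper below copies just the bytes for elements [start, stop) of a
float64 data buffer. It relies on the position/nbytes arguments of
CudaBuffer.copy_to_host. The helper name copy_slice_to_host, the float64
assumption, and the FakeDeviceBuffer stand-in (which lets the sketch run
without a GPU) are illustrative only; validity bitmaps and array offsets
are ignored.

```python
import struct

ITEMSIZE = 8  # assumption: the column holds little-endian float64 values


class FakeDeviceBuffer:
    """Hypothetical host-backed stand-in for pa.cuda.CudaBuffer (no GPU needed)."""

    def __init__(self, data):
        self._data = bytes(data)
        self.copied_bytes = 0  # counts the bytes "transferred" to host

    def copy_to_host(self, position=0, nbytes=-1):
        # Mirrors the position/nbytes arguments of CudaBuffer.copy_to_host.
        if nbytes < 0:
            nbytes = len(self._data) - position
        self.copied_bytes += nbytes
        return self._data[position:position + nbytes]


def copy_slice_to_host(device_buf, start, stop):
    """Copy only float64 elements [start, stop) from device to host."""
    if not 0 <= start <= stop:
        raise ValueError("expected 0 <= start <= stop")
    count = stop - start
    if count == 0:
        return []  # nothing requested, so nothing is copied
    host = device_buf.copy_to_host(position=start * ITEMSIZE,
                                   nbytes=count * ITEMSIZE)
    return list(struct.unpack(f"<{count}d", memoryview(host)))


buf = FakeDeviceBuffer(struct.pack("<4d", 0.5, 1.5, 2.5, 3.5))
print(copy_slice_to_host(buf, 1, 3))  # [1.5, 2.5]
print(buf.copied_bytes)               # 16 of the 32 bytes were copied
```

With a real CudaBuffer the same call pattern should apply, but I have not
checked how null bitmaps or sliced arrays would need to be handled.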

Or, more generally, should there be a CUDA-specific RecordBatch that
implements the RecordBatch API and can be used from the host?

For instance, this would be similar to DeviceNDArray in numba, which
basically implements the ndarray API for device data while remaining
usable from the host.
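
To make the analogy a bit more concrete, a host-side wrapper might look
roughly like the sketch below: indexing or slicing triggers a copy of only
the requested elements. The class name DeviceFloat64Column is hypothetical,
the column is assumed to hold float64 values without nulls or offset, and
FakeDeviceBuffer is a stand-in for pa.cuda.CudaBuffer so that the sketch
runs without a GPU (it only mimics the size attribute and the
position/nbytes arguments of copy_to_host).

```python
import struct


class FakeDeviceBuffer:
    """Hypothetical stand-in for pa.cuda.CudaBuffer (no GPU needed)."""

    def __init__(self, data):
        self._data = bytes(data)
        self.size = len(self._data)

    def copy_to_host(self, position=0, nbytes=-1):
        if nbytes < 0:
            nbytes = self.size - position
        return self._data[position:position + nbytes]


class DeviceFloat64Column:
    """Hypothetical host-side view of a float64 column stored on the device."""

    ITEMSIZE = 8  # bytes per float64 value

    def __init__(self, device_buf):
        self._buf = device_buf
        self._length = device_buf.size // self.ITEMSIZE

    def __len__(self):
        return self._length

    def __getitem__(self, key):
        if isinstance(key, slice):
            start, stop, step = key.indices(self._length)
            if step != 1:
                raise NotImplementedError("only contiguous slices are sketched")
            return self._fetch(start, max(stop, start))
        index = key + self._length if key < 0 else key
        if not 0 <= index < self._length:
            raise IndexError("column index out of range")
        return self._fetch(index, index + 1)[0]

    def _fetch(self, start, stop):
        # Copy only the bytes backing elements [start, stop) to host.
        count = stop - start
        if count == 0:
            return []
        host = self._buf.copy_to_host(position=start * self.ITEMSIZE,
                                      nbytes=count * self.ITEMSIZE)
        return list(struct.unpack(f"<{count}d", memoryview(host)))


col = DeviceFloat64Column(FakeDeviceBuffer(struct.pack("<3d", 1.0, 2.0, 3.0)))
print(len(col), col[0], col[-1], col[1:3])
```

A real implementation would presumably return Arrow arrays rather than
Python lists and cover all column types; the sketch only shows the
copy-on-access idea.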

What do you think? What would be the proper approach? (I can do the
implementation).

Best regards,
Pearu
