Hi,

Consider the following use case:
    import pyarrow as pa
    import pyarrow.cuda  # makes pa.cuda available

    schema = <pa.Schema instance>
    cbuf = <pa.cuda.CudaBuffer instance>
    cbatch = pa.cuda.read_record_batch(schema, cbuf)

Note that cbatch is a pa.RecordBatch instance whose data pointers are device pointers:

    for col in cbatch.columns:
        # here col is, say, a FloatArray whose data pointer is a device pointer;
        # as a result, accessing col data (say, taking a slice) leads to segfaults
        print(col[0])

The aim of this message is to establish a user-friendly way to access, say, a slice of the device data so that only the requested data is copied to the host. Or, more generally, should there be a CUDA-specific RecordBatch that implements the RecordBatch API and can be used from the host? For instance, this would be similar to DeviceNDArray in numba, which implements the ndarray API for device data while the API can be used from the host.

What do you think? What would be the proper approach? (I can do the implementation.)

Best regards,
Pearu
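A minimal sketch of the host-side access proposed above, in plain Python: a wrapper around a float64 column whose indexing copies only the bytes for the requested elements to the host, instead of dereferencing the device pointer. The class name HostViewOfDeviceFloat64Array and the copy_to_host(position, nbytes) callback are hypothetical. Device memory is simulated with a host bytes object so the sketch runs without a GPU. With pyarrow the callback might wrap CudaBuffer.copy_to_host(position, nbytes), but that wiring, along with null bitmaps and other types, is assumed and not shown.

```python
import array
from typing import Callable, List

# Hypothetical callback type: (byte position, nbytes) -> bytes copied to the host.
CopyFn = Callable[[int, int], bytes]


class HostViewOfDeviceFloat64Array:
    """Hypothetical host-side view of a float64 column stored in device memory.

    Indexing copies only the bytes of the requested elements to the host.
    """

    ITEMSIZE = 8  # bytes per float64 value

    def __init__(self, copy_to_host: CopyFn, length: int, offset: int = 0):
        self._copy_to_host = copy_to_host
        self._length = length
        self._offset = offset  # element offset into the device buffer

    def __len__(self) -> int:
        return self._length

    def _fetch(self, start: int, stop: int) -> List[float]:
        # Copy only elements [start, stop) from device memory to host memory.
        position = (self._offset + start) * self.ITEMSIZE
        nbytes = (stop - start) * self.ITEMSIZE
        host = array.array("d")
        host.frombytes(self._copy_to_host(position, nbytes))
        return host.tolist()

    def __getitem__(self, key):
        if isinstance(key, slice):
            start, stop, step = key.indices(self._length)
            if step != 1:
                raise NotImplementedError("only contiguous slices in this sketch")
            return self._fetch(start, max(start, stop))
        index = key + self._length if key < 0 else key
        if not 0 <= index < self._length:
            raise IndexError(key)
        return self._fetch(index, index + 1)[0]


# Assumption for this sketch: "device memory" is simulated by a host bytes object.
device_memory = array.array("d", [0.5, 1.5, 2.5, 3.5]).tobytes()
copied = []  # records how many bytes each simulated device-to-host copy moved


def fake_copy_to_host(position: int, nbytes: int) -> bytes:
    copied.append(nbytes)
    return device_memory[position:position + nbytes]


col = HostViewOfDeviceFloat64Array(fake_copy_to_host, length=4)
print(col[0])    # 0.5
print(col[1:3])  # [1.5, 2.5]
print(copied)    # [8, 16]
```

Keeping the copy routine as an injected callback keeps the indexing logic independent of the CUDA bindings; a CUDA-specific RecordBatch could then hand out one such view per column.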