Re: arrow::gpu::SerializeRecordBatch - does the result contain schema information?

Wes McKinney Mon, 27 Aug 2018 09:24:26 -0700

hi Pearu,

For the moment I recommend using arrow::ipc::SerializeSchema to
serialize the schema to host memory and then copying that memory to
the device:


https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/writer.h#L221

In some cases there isn't much benefit to putting the schema on the
device, and some applications may deal with the schema strictly in CPU
memory (e.g. using POSIX shared memory or Plasma to manage shared
schemas).

One API to copy memory to the device is

https://github.com/apache/arrow/blob/master/cpp/src/arrow/gpu/cuda_memory.h#L68

There are probably some APIs we can add to improve usability for this procedure.

- Wes

On Mon, Aug 27, 2018 at 12:05 PM, Pearu Peterson
<[email protected]> wrote:
> Hi,
>
> I have implemented a function that copies host data (by wrapping it
> in an arrow::Array object) to the GPU device using
> arrow::gpu::SerializeRecordBatch:
>
> ...
> #define MY_COLUMN_SCHEMA(DTYPE) \
>   ::arrow::schema({arrow::field("data", DTYPE)})
>
> arrow::Status ToRecordBatch(const my_column* column,
> std::shared_ptr<arrow::RecordBatch>* out) {
>   // zero-copy
>   std::shared_ptr<arrow::Array> arr;
>   std::shared_ptr<arrow::DataType> dtype = GetDataType(column);
>   ToArray(column, &arr);
>   *out = arrow::RecordBatch::Make(MY_COLUMN_SCHEMA(dtype), column->size,
> {arr});
>   return arrow::Status::OK();
> }
>
> // Use it on host
> arrow::Status ToDevice(const my_column *column,
> std::shared_ptr<arrow::gpu::CudaBuffer> *buffer) {
>   constexpr int kGpuNumber = 0;
>   arrow::gpu::CudaDeviceManager* manager_;
>   std::shared_ptr<arrow::gpu::CudaContext> context_;
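>   // Propagate failures instead of silently ignoring the returned Status
>   ARROW_RETURN_NOT_OK(arrow::gpu::CudaDeviceManager::GetInstance(&manager_));
>   ARROW_RETURN_NOT_OK(manager_->GetContext(kGpuNumber, &context_));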
>   std::shared_ptr<arrow::RecordBatch> batch;
>   auto status = ToRecordBatch(column, &batch);
>   if (!status.ok()) return status;
>   return arrow::gpu::SerializeRecordBatch(*batch, context_.get(), buffer);
> }
>
> To implement the reverse of ToDevice, a schema is needed by
> arrow::gpu::ReadRecordBatch.
>
> Is the schema included in the CudaBuffer object?
> If yes, what would be the easiest way to get it?
> If not, what is the recommended strategy for passing schema+data to the
> GPU device, and back?
>
> Best regards,
> Pearu
