Hi Antoine,

This sounds like a reasonable path forward -- thank you for working on this. I'm at a conference this week, but I will review the PR and give feedback as soon as I can, probably tomorrow (Friday).
My two main initial questions are:

* Whether it's possible to avoid extra overhead during Buffer construction to set up these abstractions. I'll look more closely at the patch.
* The consequences of making data() uncallable on non-CPU buffers. Perhaps we should simply document the recommended code for invoking CUDA kernels and other CUDA functions on GPU buffers (or non-CPU buffers generally). Should there be a data_unsafe() method or similar (sorry if this is already addressed in the PR)?

- Wes

On Tue, Jan 28, 2020 at 5:51 AM Antoine Pitrou <anto...@python.org> wrote:
>
> Hello,
>
> I've submitted a PR which exposes a new C++ abstraction layer:
> https://github.com/apache/arrow/pull/6295
>
> The goal is to allow safe handling of buffers residing on different
> devices (the CPU, a GPU...). The layer exposes two interfaces:
>
> * the Device interface exposes information about a particular
>   memory-holding device
> * the MemoryManager allows allocating, copying, reading or writing
>   memory located on a particular device
>
> The Buffer API is modified so that calling data() fails on non-CPU
> buffers. A separate address() method returns the buffer address as an
> integer, and is allowed on any buffer.
>
> The API provides convenience functions to view or copy a buffer from
> one device to the other. For example, an on-GPU buffer can be copied
> to the CPU, and in some situations a zero-copy CPU view can also be
> created (depending on the GPU capabilities and how the GPU memory was
> allocated).
>
> An example use in the PR is IPC. On the write side, a new
> SerializeRecordBatch overload takes a MemoryManager argument and is
> able to serialize data to any kind of memory (CPU or GPU). On the
> read side, ReadRecordBatch now works on any kind of input buffer, and
> returns record batches backed by either CPU or GPU memory.
>
> It introduces a slight complexity in the CUDA namespace, since there
> are both `CudaContext` and `CudaMemoryManager` classes. We could
> solve this by merging the two concepts (but doing so may break
> compatibility for existing users of CUDA).
>
> Regards
>
> Antoine.