On 01/08/2022 at 19:13, Wes McKinney wrote:
> If we start placing restrictions on how the out-of-line string buffers are managed and externalized, it risks undermining the zero-copy interoperability benefits that we're trying to achieve with this.
But embedded pointers in turn undermine zero-copy for IPC and Flight. They also probably make transferring data between CPU and GPU more difficult and more expensive, unless the embedded pointers happen to fall into a part of the address space shared between CPU and GPU, which you cannot ensure if, say, you got those pointers from a third party through the C data interface.
So the bottom line seems to be that embedded pointers enable zero-copy for specific producers, but undermine existing zero-copy qualities for everyone (and, to speak more broadly, ease of data movement).
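To make the IPC point concrete, here is a minimal sketch in plain Python (standard library only, not Arrow code). It assumes a simplified offset-based layout and a hypothetical pointer-based view layout of (length, raw address) pairs; the function names and the exact view encoding are made up for illustration and do not reflect any proposed format.

```python
import ctypes
import struct


def encode_offsets(strings):
    """Offset-based layout: an int32 offsets buffer plus one data buffer."""
    data = b"".join(strings)
    offsets = [0]
    for s in strings:
        offsets.append(offsets[-1] + len(s))
    return struct.pack(f"<{len(offsets)}i", *offsets), data


def decode_offsets(offsets_buf, data_buf):
    """Rebuild the strings from the two buffers alone."""
    n = len(offsets_buf) // 4
    offsets = struct.unpack(f"<{n}i", offsets_buf)
    return [data_buf[offsets[i]:offsets[i + 1]] for i in range(n - 1)]


def encode_pointer_views(strings, data_buf):
    """Hypothetical pointer-based layout: one (length, raw address) pair per string."""
    base = ctypes.addressof(data_buf)
    views, pos = [], 0
    for s in strings:
        views.append(struct.pack("<qQ", len(s), base + pos))
        pos += len(s)
    return b"".join(views)


def pointers_within(views_buf, data_buf):
    """Check whether every embedded address lands inside data_buf."""
    base = ctypes.addressof(data_buf)
    end = base + len(data_buf)
    for length, addr in struct.iter_unpack("<qQ", views_buf):
        if not (base <= addr and addr + length <= end):
            return False
    return True


strings = [b"arrow", b"flight", b"ipc"]

# Offset-based: the raw bytes can be shipped as-is and decoded on the other side.
offsets_buf, data = encode_offsets(strings)
received = decode_offsets(bytes(offsets_buf), bytes(data))

# Pointer-based: the views only make sense against the original allocation.
payload = b"".join(strings)
src = ctypes.create_string_buffer(payload, len(payload))
views = encode_pointer_views(strings, src)
dst = ctypes.create_string_buffer(src.raw, len(src))  # stands in for the receiver's copy

valid_at_source = pointers_within(views, src)
valid_at_destination = pointers_within(views, dst)
```

In this sketch the offset-based buffers round-trip untouched, whereas the view buffer still refers to the sender's allocation after the copy, so a receiver would have to rewrite every view before using it.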
In addition, the embedded pointers deviate from Arrow's representation philosophy, adding cognitive load for implementors who now have to account for the fact that buffers do not tell "everything about the data" but may refer to memory unknown to them. The discussions about how to support this in Go are a direct consequence of this deviation in philosophy.
Overall, my opinion is that this is not a very good strategic choice for the project.
Regards,
Antoine.