Hi,
Can Java developers please take a look at Larry's proposal below?
As for my 2 cents as a non-Java developer:
That's a detailed and well-explained proposal, thank you.
My only concern is that you're proposing to implement this first as a
set of contiguous vectors. The various communication protocols offered
by the Arrow specifications (IPC, Flight, C Stream Interface...) are all
based on the notion of a stream of batches. Minimizing the number of
copies made is one of the selling points of Arrow, so being able to
consume such streaming data without materializing a concatenation sounds
important.
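
To make the concern concrete, here is a rough sketch in plain Java of what
I mean by consuming batches without concatenating them. It does not use
any Arrow Java class: the name ChunkedIntColumn and its methods are
hypothetical and only illustrate the idea of keeping each incoming batch
as-is and translating a global row index into a (batch, offset) pair.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not part of Arrow Java or of the proposal.
// A column made of several batches that are kept by reference, never merged.
final class ChunkedIntColumn {
    private final List<int[]> batches = new ArrayList<>();
    // offsets.get(i) is the global row index of the first row of batch i.
    private final List<Integer> offsets = new ArrayList<>();
    private int rowCount = 0;

    // Appends a batch by reference; the values are not copied.
    void addBatch(int[] batch) {
        batches.add(batch);
        offsets.add(rowCount);
        rowCount += batch.length;
    }

    int rowCount() {
        return rowCount;
    }

    // Row-oriented access across all batches without materializing
    // a single contiguous array.
    int get(int row) {
        if (row < 0 || row >= rowCount) {
            throw new IndexOutOfBoundsException("row " + row);
        }
        // Binary search for the last batch whose start offset is <= row.
        int lo = 0;
        int hi = batches.size() - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (offsets.get(mid) <= row) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return batches.get(lo)[row - offsets.get(lo)];
    }
}
```

The trade-off is a logarithmic lookup per random access instead of a
constant-time one, in exchange for not copying any batch on arrival.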
Regards
Antoine.
On 18/08/2022 at 19:09, Larry White wrote:
Hi all,
I would like to propose a new Table data structure for Arrow Java that is
similar to the existing VectorSchemaRoot, but has:
- more table functionality (e.g. row-oriented operations)
- a simpler and more general mutability API
It lacks VectorSchemaRoot's buffer-like qualities, making it more like the
common understanding of a table. The hope is that it would complement
VectorSchemaRoot, with one used for batch/pipeline work, and the other a
standard 'table'.
A Google Doc describing the proposal can be found here:
https://docs.google.com/document/d/1J77irZFWNnSID7vK71z26Nw_Pi99I9Hb9iryno8B03c/edit?usp=sharing
All comments are welcome.
Best,
larry