Hi all,

Thank you, Antoine and everyone, for the feedback. It's been very helpful.
The proposal has been updated to incorporate suggested changes and clarify
as needed.

Several people have expressed support for the idea of using a Java version
of ChunkedArrays as the internal representation. I'm wondering if a
complete implementation of ChunkedArray is needed to achieve the
performance benefits that you mention in this thread. In my reading of the
API, data streamed as RecordBatches are converted to ChunkedArrays in a
One-RecordBatch-to-One-ChunkedArray fashion.  This suggests that the
complexity of managing chunks of different shapes isn't strictly required.
Is that your understanding?
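
To make the question concrete, here is a minimal sketch of the reduced case I
have in mind: each incoming batch contributes exactly one chunk, nothing is
concatenated, and a logical row index is resolved to a chunk on read. This is
plain Java with no Arrow dependency. ChunkedIntColumn, addChunk, and get are
hypothetical names rather than any existing Arrow API, and an int[] stands in
for the vector that would come from one RecordBatch.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not an Arrow API: a column that keeps one chunk per
 * incoming batch and never copies them into a single contiguous array.
 */
final class ChunkedIntColumn {
    private final List<int[]> chunks = new ArrayList<>();
    // offsets.get(i) is the logical index of the first value in chunk i.
    private final List<Long> offsets = new ArrayList<>();
    private long length = 0;

    /** Registers one batch's values as a single chunk; the array is not copied. */
    public void addChunk(int[] chunk) {
        offsets.add(length);
        chunks.add(chunk);
        length += chunk.length;
    }

    public long length() {
        return length;
    }

    public int chunkCount() {
        return chunks.size();
    }

    /** Reads the value at a logical row index by locating the owning chunk. */
    public int get(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        // Binary search for the last chunk whose starting offset is <= index.
        int lo = 0;
        int hi = chunks.size() - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (offsets.get(mid) <= index) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return chunks.get(lo)[(int) (index - offsets.get(lo))];
    }
}
```

If something this small is enough, then slicing and re-chunking across columns
with different chunk layouts could presumably be deferred.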

I don't have a sense of the effort required to produce a Java version of
ChunkedArrays, so I want to understand what the baseline requirement is.

Thanks again.

Larry



On Wed, Aug 24, 2022 at 11:58 AM Antoine Pitrou <anto...@python.org> wrote:

>
> Hi,
>
> Can Java developers please take a look at Larry's proposal below?
>
>
> As for my 2 cents as a non-Java developer:
>
> That's a detailed and well-explained proposal, thank you.
> My only concern is that you're proposing to implement this first as a
> set of contiguous vectors.  The various communication protocols offered
> by the Arrow specifications (IPC, Flight, C Stream Interface...) are all
> based on the notion of a stream of batches.  Minimizing the number of
> copies made is one of the selling points of Arrow, so being able to
> consume such streaming data without materializing a concatenation sounds
> important.
>
> Regards
>
> Antoine.
>
>
> On 18/08/2022 at 19:09, Larry White wrote:
> > Hi all,
> >
> > I would like to propose a new Table data structure for Arrow Java that is
> > similar to the existing VectorSchemaRoot, but has:
> >
> > - more table functionality (e.g. row-oriented operations)
> > - a simpler and more general mutability API
> >
> > It lacks VectorSchemaRoot's buffer-like qualities, making it more like the
> > common understanding of a table. The hope is that it would complement
> > VectorSchemaRoot, with one used for batch/pipeline work, and the other a
> > standard 'table'.
> >
> > A Google Doc describing the proposal can be found here:
> >
> > https://docs.google.com/document/d/1J77irZFWNnSID7vK71z26Nw_Pi99I9Hb9iryno8B03c/edit?usp=sharing
> >
> > All comments are welcome.
> >
> > Best,
> >
> > larry
> >
>
