On Sun, Jan 27, 2019 at 1:08 PM Neville Dipale <nevilled...@gmail.com> wrote:
> Hi Antoine,
>
> I've given your response some thought.
>
> I'm thinking more looking at the computational aspect of Arrow. I agree
> that for representing and sharing data, RecordBatches achieve the purpose.
>
> I came across ChunkedArray, Column and Table while I was trying to create a
> dataframe library in Rust. The other languages already benefit from these 3
> already implemented, but for Rust I've had to try create them myself.
> This is what led me to asking the question, because the various languages
> that I've seen so far, seem to follow the same kind of standard re. both
> the structure and methods to create/interact with chunked arrays, columns,
> and tables.
>
> [1] Go Tables:
> https://github.com/apache/arrow/blob/master/go/arrow/array/table.go

there's also this WIP dataframe package being built on top of Arrow:
- https://github.com/gonum/exp/pull/19

-s

> [2] CPP Tables:
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/table.cc
> [3] JS Tables: https://github.com/apache/arrow/blob/master/js/src/table.ts
> [4] Ruby:
> https://github.com/apache/arrow/blob/master/ruby/red-arrow/lib/arrow/table.rb
> [5] Python, pyarrow.Table
>
> While going through the source, I didn't find anything for Java, and that's
> swayed me to think that maybe Tables don't need standardising as each
> implementation would likely implement them differently (or not implement
> them).
>
> Regards
> Neville
>
> On Fri, 25 Jan 2019 at 20:56, Antoine Pitrou <anto...@python.org> wrote:
>
> > Hello Neville,
> >
> > I don't know if Tables need standardizing. Record Batches are part of
> > the spec (*), and they are the basic block for exchanging and sharing
> > tabular data. Depending on your application, you might exchange a
> > stream of Record Batches, or a fixed-length sequence thereof (in which
> > case you have a "Table").
> >
> > (*) see https://arrow.apache.org/docs/metadata.html
> >
> > (reading that spec though, it's not obvious to me why the Record Batch
> > definition doesn't reference a Schema)
> >
> > Regards
> >
> > Antoine.
> >
> >
> > Le 25/01/2019 à 19:48, Neville Dipale a écrit :
> > > Hi Arrow developers,
> > >
> > > I've been looking at the various language impls, and although a Table isn't
> > > currently part of the spec, it seems to be implemented in CPP, Python, Go,
> > > JS (and perhaps other languages).
> > >
> > > Are there plans of standardising these and adding them to the spec?
> > >
> > > I'm asking because I'm working on a dataframe implementation for Rust (
> > > https://github.com/nevi-me/rust-dataframe), and I've started trying to
> > > implement columns and tables with the intention to upstream them if I get
> > > them right.
> > >
> > > Regards
> > > Neville
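
For readers following the thread, here is a rough sketch of the relationship Antoine describes: a "Table" as a fixed-length sequence of record batches that share one schema, with each column exposed as a list of per-batch chunks. It is only an illustration in plain Python. The names RecordBatch, Table, Schema, num_rows and column below are hypothetical stand-ins chosen for this example, not the Arrow spec and not the pyarrow API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical, simplified stand-ins; these are NOT the Arrow or pyarrow classes.
Schema = Tuple[Tuple[str, type], ...]  # ordered (column name, Python type) pairs


@dataclass
class RecordBatch:
    """Equal-length columns that conform to one schema."""
    schema: Schema
    columns: Dict[str, list]

    def __post_init__(self):
        names = [name for name, _ in self.schema]
        if list(self.columns) != names:
            raise ValueError("columns do not match schema")
        lengths = {len(values) for values in self.columns.values()}
        if len(lengths) > 1:
            raise ValueError("columns differ in length")

    @property
    def num_rows(self) -> int:
        return len(next(iter(self.columns.values()), []))


@dataclass
class Table:
    """A fixed-length sequence of record batches sharing one schema."""
    schema: Schema
    batches: List[RecordBatch]

    def __post_init__(self):
        for batch in self.batches:
            if batch.schema != self.schema:
                raise ValueError("batch schema differs from table schema")

    @property
    def num_rows(self) -> int:
        return sum(batch.num_rows for batch in self.batches)

    def column(self, name: str) -> List[list]:
        # One chunk per batch: a "chunked array" in this sketch.
        return [batch.columns[name] for batch in self.batches]


schema = (("id", int), ("name", str))
table = Table(schema, [
    RecordBatch(schema, {"id": [1, 2], "name": ["a", "b"]}),
    RecordBatch(schema, {"id": [3], "name": ["c"]}),
])
```

In this sketch the table never copies or concatenates data; asking for a column just gathers the matching chunk from each batch, which is one plausible reading of why the existing implementations pair Table with ChunkedArray and Column.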