I'm also +1 on removing this class.

François
On Tue, Jul 9, 2019 at 10:57 AM Uwe L. Korn <uw...@xhochy.com> wrote:
>
> This sounds fine to me, thus I'm +1 on removing this class.
>
> On Tue, Jul 9, 2019, at 2:11 PM, Wes McKinney wrote:
> > Yes, the schema would be the point of truth for the Field. The ChunkedArray
> > type would have to be validated against the schema types as with RecordBatch
> >
> > On Tue, Jul 9, 2019, 2:54 AM Uwe L. Korn <uw...@xhochy.com> wrote:
> >
> > > Hello Wes,
> > >
> > > where do you intend the Field object living then? Would this be part of
> > > the schema of the Table object?
> > >
> > > Uwe
> > >
> > > On Mon, Jul 8, 2019, at 11:18 PM, Wes McKinney wrote:
> > > > hi folks,
> > > >
> > > > For some time now I have been uncertain about the utility provided by
> > > > the arrow::Column C++ class. Fundamentally, it is a container for two
> > > > things:
> > > >
> > > > * An arrow::Field object (name and data type)
> > > > * An arrow::ChunkedArray object for the data
> > > >
> > > > It was added to the C++ library in ARROW-23 in March 2016 as the basis
> > > > for the arrow::Table class which represents a collection of
> > > > ChunkedArray objects coming usually from multiple RecordBatches.
> > > > Sometimes a Table will have mostly columns with a single chunk while
> > > > some columns will have many chunks.
> > > >
> > > > I'm concerned about continuing to maintain the Column class as it's
> > > > spilling complexity into computational libraries and bindings alike.
> > > >
> > > > The Python Column class for example mostly forwards method calls to
> > > > the underlying ChunkedArray
> > > >
> > > > https://github.com/apache/arrow/blob/master/python/pyarrow/table.pxi#L355
> > > >
> > > > If the developer wants to construct a Table or insert a new "column",
> > > > Column objects must generally be constructed, leading to boilerplate
> > > > without clear benefit.
> > > >
> > > > Since we're discussing building a more significant higher-level
> > > > DataFrame interface per past mailing list discussions, my preference
> > > > would be to consider removing the Column class to make the user- and
> > > > developer-facing data structures simpler. I hate to propose breaking
> > > > API changes, so it may not be practical at this point, but I wanted to
> > > > at least bring up the issue to see if others have opinions after
> > > > working with the library for a few years.
> > > >
> > > > Thanks
> > > > Wes
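
For anyone reading the thread later, here is a rough sketch of the layout discussed above: the schema holds the Field objects, the Table holds one ChunkedArray per field directly, and each ChunkedArray's type is validated against the schema on construction. This is plain Python written for illustration only. The classes below are hypothetical stand-ins, not the arrow:: C++ classes or the pyarrow API, and data types are reduced to string labels as a simplifying assumption.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a field: a name and a data type."""
    name: str
    type: str  # assumption: a type is just a string label in this sketch


@dataclass
class ChunkedArray:
    """Hypothetical stand-in: several chunks that share one type label."""
    type: str
    chunks: List[list]

    def __len__(self) -> int:
        # Total number of values across all chunks.
        return sum(len(chunk) for chunk in self.chunks)


class Table:
    """A schema (list of Field) plus one ChunkedArray per field.

    There is no Column wrapper: the schema is the only place where
    field names and types are recorded.
    """

    def __init__(self, schema: Sequence[Field], columns: Sequence[ChunkedArray]):
        if len(schema) != len(columns):
            raise ValueError("schema and columns differ in length")
        for field, column in zip(schema, columns):
            # Validate each ChunkedArray's type against the schema.
            if column.type != field.type:
                raise TypeError(
                    f"column {field.name!r}: expected {field.type}, got {column.type}"
                )
        if len({len(column) for column in columns}) > 1:
            raise ValueError("columns differ in total length")
        self.schema = list(schema)
        self.columns = list(columns)

    def field(self, i: int) -> Field:
        return self.schema[i]

    def column(self, i: int) -> ChunkedArray:
        return self.columns[i]


# One column with two chunks, one column with a single chunk.
schema = [Field("id", "int64"), Field("name", "string")]
table = Table(
    schema,
    [
        ChunkedArray("int64", [[1, 2], [3]]),
        ChunkedArray("string", [["a", "b", "c"]]),
    ],
)
```

With this shape, building a table only requires the schema and the chunked data. A column whose type label disagrees with the schema is rejected when the table is constructed.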