Yes, the schema would be the source of truth for the Field. Each ChunkedArray's type would have to be validated against the corresponding type in the schema, as with RecordBatch.
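To make the idea concrete, here is a rough sketch in plain Python. It is a toy model, not the Arrow API: the `Field`, `ChunkedArray`, and `Table` classes below are hypothetical stand-ins, and data types are represented as simple strings. It only illustrates a Table that holds ChunkedArrays directly and checks them against its schema at construction time.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a field: a name plus a data type."""
    name: str
    type: str  # e.g. "int64"; a placeholder for a real data type object


@dataclass
class ChunkedArray:
    """Hypothetical stand-in: a typed sequence of chunks, with no name."""
    type: str
    chunks: List[list]

    def __len__(self) -> int:
        return sum(len(chunk) for chunk in self.chunks)


class Table:
    """Toy table: the schema owns the fields, the columns are bare ChunkedArrays."""

    def __init__(self, schema: List[Field], columns: List[ChunkedArray]):
        if len(schema) != len(columns):
            raise ValueError("schema and columns differ in length")
        # Validate each column's type against the schema.
        for field, column in zip(schema, columns):
            if column.type != field.type:
                raise TypeError(
                    f"column {field.name!r}: expected {field.type}, got {column.type}"
                )
        # All columns must have the same total length, however they are chunked.
        if len({len(column) for column in columns}) > 1:
            raise ValueError("columns differ in length")
        self.schema = schema
        self.columns = columns

    def field(self, i: int) -> Field:
        return self.schema[i]

    def column(self, i: int) -> ChunkedArray:
        return self.columns[i]


schema = [Field("id", "int64"), Field("label", "string")]
table = Table(
    schema,
    [
        ChunkedArray("int64", [[1, 2], [3]]),         # two chunks
        ChunkedArray("string", [["a"], ["b", "c"]]),  # chunked differently
    ],
)
```

In this sketch the name and type of a column are only ever looked up through the schema, so no separate Column wrapper is needed.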

On Tue, Jul 9, 2019, 2:54 AM Uwe L. Korn <uw...@xhochy.com> wrote:

> Hello Wes,
>
> where do you intend the Field object living then? Would this be part of
> the schema of the Table object?
>
> Uwe
>
> On Mon, Jul 8, 2019, at 11:18 PM, Wes McKinney wrote:
> > hi folks,
> >
> > For some time now I have been uncertain about the utility provided by
> > the arrow::Column C++ class. Fundamentally, it is a container for two
> > things:
> >
> > * An arrow::Field object (name and data type)
> > * An arrow::ChunkedArray object for the data
> >
> > It was added to the C++ library in ARROW-23 in March 2016 as the basis
> > for the arrow::Table class which represents a collection of
> > ChunkedArray objects coming usually from multiple RecordBatches.
> > Sometimes a Table will have mostly columns with a single chunk while
> > some columns will have many chunks.
> >
> > I'm concerned about continuing to maintain the Column class as it's
> > spilling complexity into computational libraries and bindings alike.
> >
> > The Python Column class for example mostly forwards method calls to
> > the underlying ChunkedArray
> >
> > https://github.com/apache/arrow/blob/master/python/pyarrow/table.pxi#L355
> >
> > If the developer wants to construct a Table or insert a new "column",
> > Column objects must generally be constructed, leading to boilerplate
> > without clear benefit.
> >
> > Since we're discussing building a more significant higher-level
> > DataFrame interface per past mailing list discussions, my preference
> > would be to consider removing the Column class to make the user- and
> > developer-facing data structures simpler. I hate to propose breaking
> > API changes, so it may not be practical at this point, but I wanted to
> > at least bring up the issue to see if others have opinions after
> > working with the library for a few years.
> >
> > Thanks
> > Wes