I'll try to spend a little time soon refactoring to see how disruptive
the change would be, and also to help persuade others about the
benefits.
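
To make the idea concrete, here is a rough sketch of a Table without a
Column wrapper. It is plain Python, not the Arrow API: the Field,
ChunkedArray, and Table classes below are hypothetical stand-ins, and types
are modeled as simple strings. It only illustrates the schema acting as the
point of truth, with each ChunkedArray's type checked against it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for arrow::Field: a name and a data type."""
    name: str
    type: str


@dataclass
class ChunkedArray:
    """Hypothetical stand-in for arrow::ChunkedArray: typed chunks of values."""
    type: str
    chunks: List[list]


class Table:
    """A schema (list of Fields) plus one ChunkedArray per field."""

    def __init__(self, schema: List[Field], columns: List[ChunkedArray]):
        if len(schema) != len(columns):
            raise ValueError("schema and columns differ in length")
        for field, column in zip(schema, columns):
            # The schema is the point of truth: each array's type must match
            # the type of the field at the same position.
            if column.type != field.type:
                raise TypeError(
                    f"column {field.name!r}: expected {field.type}, "
                    f"got {column.type}"
                )
        self.schema = schema
        self.columns = columns

    def field(self, i: int) -> Field:
        # Name and type come from the schema, not from the data.
        return self.schema[i]

    def column(self, i: int) -> ChunkedArray:
        # The data is returned directly, with no wrapper object.
        return self.columns[i]
```

With this shape, building a table takes only a schema and the arrays, for
example Table([Field("id", "int64")], [ChunkedArray("int64", [[1, 2], [3]])]),
and a mismatched type is rejected at construction time.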

On Tue, Jul 9, 2019 at 9:57 AM Uwe L. Korn <uw...@xhochy.com> wrote:
>
> This sounds fine to me, thus I'm +1 on removing this class.
>
> On Tue, Jul 9, 2019, at 2:11 PM, Wes McKinney wrote:
> > Yes, the schema would be the point of truth for the Field. The ChunkedArray
> > type would have to be validated against the schema types, as with RecordBatch.
> >
> > On Tue, Jul 9, 2019, 2:54 AM Uwe L. Korn <uw...@xhochy.com> wrote:
> >
> > > Hello Wes,
> > >
> > > Where do you intend the Field object to live then? Would it be part of
> > > the schema of the Table object?
> > >
> > > Uwe
> > >
> > > On Mon, Jul 8, 2019, at 11:18 PM, Wes McKinney wrote:
> > > > hi folks,
> > > >
> > > > For some time now I have been uncertain about the utility provided by
> > > > the arrow::Column C++ class. Fundamentally, it is a container for two
> > > > things:
> > > >
> > > > * An arrow::Field object (name and data type)
> > > > * An arrow::ChunkedArray object for the data
> > > >
> > > > It was added to the C++ library in ARROW-23 in March 2016 as the basis
> > > > for the arrow::Table class, which represents a collection of
> > > > ChunkedArray objects usually coming from multiple RecordBatches.
> > > > Sometimes a Table will have mostly columns with a single chunk while
> > > > some columns will have many chunks.
> > > >
> > > > I'm concerned about continuing to maintain the Column class as it's
> > > > spilling complexity into computational libraries and bindings alike.
> > > >
> > > > The Python Column class, for example, mostly forwards method calls to
> > > > the underlying ChunkedArray:
> > > >
> > > > https://github.com/apache/arrow/blob/master/python/pyarrow/table.pxi#L355
> > > >
> > > > If the developer wants to construct a Table or insert a new "column",
> > > > Column objects must generally be constructed, leading to boilerplate
> > > > without clear benefit.
> > > >
> > > > Since we're discussing building a more significant higher-level
> > > > DataFrame interface per past mailing list discussions, my preference
> > > > would be to consider removing the Column class to make the user- and
> > > > developer-facing data structures simpler. I hate to propose breaking
> > > > API changes, so it may not be practical at this point, but I wanted to
> > > > at least bring up the issue to see if others have opinions after
> > > > working with the library for a few years.
> > > >
> > > > Thanks
> > > > Wes
> > > >
> > >
> >
