With apologies for not reviewing this earlier, I've reviewed it now!

I am +0 in its current state just because of the title (the proposal
seems to be about abstract arrays and not necessarily the C data
interface except for the title). Happy to keep up with reviews to get
this merged soon!

Cheers,

-dewey

On Thu, Dec 5, 2024 at 11:03 AM Antoine Pitrou <anto...@python.org> wrote:
>
>
> I don't think a second implementation is strictly necessary because this
> is just defining a schema and some conventions around it. Though of
> course a second implementation is always better to have.
>
> Regards
>
> Antoine.
>
>
> On 05/12/2024 at 17:47, Matt Topol wrote:
> >> * I implemented this proposal only in C++. The
> >>    implementation is already merged into apache/arrow. Should
> >>    we have one more implementation like format specification
> >>    change?
> >
> > http://crossbow.voltrondata.com/pr_docs/43553/format/Changing.html#at-least-two-reference-implementations
> >
> > Sorry to be that guy, but I would prefer having one more implementation as
> > this qualifies as a format change IMHO. So I'm +0 (binding) on this without
> > a second implementation. I won't oppose it getting merged, but that would
> > be my preference.
> >
> > On Thu, Dec 5, 2024 at 12:48 AM Gang Wu <ust...@gmail.com> wrote:
> >
> >> +1 (binding)
> >>
> >> I've left a minor comment to solicit concrete examples of data
> >> in the statistics array if this is reasonable.
> >>
> >> Best,
> >> Gang
> >>
> >> On Thu, Dec 5, 2024 at 11:17 AM wish maple <maplewish...@gmail.com> wrote:
> >>
> >>> +1 (non-binding)
> >>>
> >>> Best,
> >>> Xuwei Fu
> >>>
> >>> On Thu, Dec 5, 2024 at 10:58, Sutou Kouhei <k...@clear-code.com> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I would like to propose standardizing how to pass statistics
> >>>> through the C data interface.
> >>>>
> >>>> Motivation:
> >>>>
> >>>> * We want to pass not only Apache Arrow data but also
> >>>>    its statistics through the C data interface for query
> >>>>    planning.
> >>>>
> >>>> Approach:
> >>>>
> >>>> * Define a standardized schema for statistics.
> >>>> * Represent statistics as an Apache Arrow array that uses
> >>>>    the schema.
> >>>> * Pass the statistics Apache Arrow array through the C data
> >>>>    interface like a normal Apache Arrow array.
> >>>>
> >>>> Note that we don't define a new interface for statistics. We
> >>>> just use the existing C data interface. A statistics Apache
> >>>> Arrow array is passed through a separate API call.
> >>>>
> >>>> See also:
> >>>>
> >>>> * The discussion of this:
> >>>>    https://lists.apache.org/thread/z0jz2bnv61j7c6lbk7lympdrs49f69cx
> >>>> * The PR of this proposal that includes the statistics
> >>>>    schema definition:
> >>>>    https://github.com/apache/arrow/pull/43553
> >>>> * The preview URL of the PR:
> >>>>    http://crossbow.voltrondata.com/pr_docs/43553/format/CDataInterfaceStatistics.html
> >>>>
> >>>> Note:
> >>>>
> >>>> * I implemented this proposal only in C++. The
> >>>>    implementation is already merged into apache/arrow. Should
> >>>>    we have one more implementation like format specification
> >>>>    change?
> >>>>
> >>>>    http://crossbow.voltrondata.com/pr_docs/43553/format/Changing.html#at-least-two-reference-implementations
> >>>>
> >>>>
> >>>> The vote will be open for at least 72 hours.
> >>>>
> >>>> [ ] +1 Accept this proposal
> >>>> [ ] +0
> >>>> [ ] -1 Do not accept this proposal because...
> >>>>
> >>>>
> >>>> Thanks,
> >>>> --
> >>>> kou
> >>>>
> >>>
> >>
> >
>
