With apologies for not reviewing this earlier, I've reviewed it now! I am +0 in its current state just because of the title (the proposal seems to be about abstract arrays and not necessarily the C data interface except for the title). Happy to keep up with reviews to get this merged soon!
Cheers,

-dewey

On Thu, Dec 5, 2024 at 11:03 AM Antoine Pitrou <anto...@python.org> wrote:
>
>
> I don't think a second implementation is strictly necessary because this
> is just defining a schema and some conventions around it. Though of
> course a second implementation is always better to have.
>
> Regards
>
> Antoine.
>
>
> On 05/12/2024 at 17:47, Matt Topol wrote:
> >> * I implemented this proposal only in C++. The
> > implementation is already merged into apache/arrow. Should
> > we have one more implementation, as with a format specification
> > change?
> >
> > http://crossbow.voltrondata.com/pr_docs/43553/format/Changing.html#at-least-two-reference-implementations
> >
> > Sorry to be that guy, but I would prefer having one more implementation as
> > this qualifies as a format change IMHO. So I'm +0 (binding) on this without
> > a second implementation. I won't oppose it getting merged, but that would
> > be my preference.
> >
> > On Thu, Dec 5, 2024 at 12:48 AM Gang Wu <ust...@gmail.com> wrote:
> >
> >> +1 (binding)
> >>
> >> I've left a minor comment to solicit concrete examples of data
> >> in the statistics array if this is reasonable.
> >>
> >> Best,
> >> Gang
> >>
> >> On Thu, Dec 5, 2024 at 11:17 AM wish maple <maplewish...@gmail.com> wrote:
> >>
> >>> +1 (non-binding)
> >>>
> >>> Best,
> >>> Xuwei Fu
> >>>
> >>> On Thu, Dec 5, 2024 at 10:58, Sutou Kouhei <k...@clear-code.com> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I would like to propose standardizing how to pass statistics
> >>>> through the C data interface.
> >>>>
> >>>> Motivation:
> >>>>
> >>>> * We want to pass not only Apache Arrow data but also
> >>>>   statistics about that data through the C data interface
> >>>>   for query planning.
> >>>>
> >>>> Approach:
> >>>>
> >>>> * Define a standardized schema for statistics.
> >>>> * Represent statistics as an Apache Arrow array that uses
> >>>>   the schema.
> >>>> * Pass the statistics Apache Arrow array through the C data
> >>>>   interface like a normal Apache Arrow array.
> >>>>
> >>>> Note that we don't define a new interface for statistics. We
> >>>> just use the existing C data interface. A statistics Apache
> >>>> Arrow array is passed through a separate API call.
> >>>>
> >>>> See also:
> >>>>
> >>>> * The discussion of this:
> >>>>   https://lists.apache.org/thread/z0jz2bnv61j7c6lbk7lympdrs49f69cx
> >>>> * The PR of this proposal that includes the statistics
> >>>>   schema definition:
> >>>>   https://github.com/apache/arrow/pull/43553
> >>>> * The preview URL of the PR:
> >>>>   http://crossbow.voltrondata.com/pr_docs/43553/format/CDataInterfaceStatistics.html
> >>>>
> >>>> Note:
> >>>>
> >>>> * I implemented this proposal only in C++. The
> >>>>   implementation is already merged into apache/arrow. Should
> >>>>   we have one more implementation, as with a format
> >>>>   specification change?
> >>>>   http://crossbow.voltrondata.com/pr_docs/43553/format/Changing.html#at-least-two-reference-implementations
> >>>>
> >>>>
> >>>> The vote will be open for at least 72 hours.
> >>>>
> >>>> [ ] +1 Accept this proposal
> >>>> [ ] +0
> >>>> [ ] -1 Do not accept this proposal because...
> >>>>
> >>>>
> >>>> Thanks,
> >>>> --
> >>>> kou