I think the feedback is more along the lines of: we can just standardize a representation of statistics, without referencing where it's used (C Data Interface or otherwise). So people are free to use it wherever they want, whether in the C Data Interface, IPC, or somewhere else. At the same time, we are not saying that we are going to (for example) embed this into IPC files as the official way to have Parquet-like statistics. It is simply an agreed-upon schema for interoperability, and the details of how it is passed around are up to the application. (At least for now.)
Or in other words, we can just say, "this is the canonical schema to represent statistics about an Arrow dataset as Arrow data", without defining anything about how or where to use it. I think it's still useful to have context/examples of why we are motivated to define this (and note that these are examples only), which may use C Data Interface or something as an example, but others may disagree.

On Thu, Dec 12, 2024, at 10:27, Sutou Kouhei wrote:
> Hi,
>
> I want to discuss Arrow array representation of statistics
> and usable contexts of it.
>
> Background:
>
> We discussed how to pass statistics through the C data
> interface:
>
> * [DISCUSS] Statistics through the C data interface
>   https://lists.apache.org/thread/z0jz2bnv61j7c6lbk7lympdrs49f69cx
> * [VOTE] Statistics through the C data interface
>   https://lists.apache.org/thread/rsw3wsyj68dksc98s5rpdp6dn8hfk0yd
> * GH-38837: [Format] Add the specification to pass
>   statistics through the Arrow C data interface
>   https://github.com/apache/arrow/pull/43553
>
> The latest proposal is that we standardize schema for Arrow
> array that represents statistics. See the above PR for
> details.
>
> I think that the proposed approach is the best approach for
> the C data interface. But I'm not sure whether the approach
> is the best approach for other contexts such as IPC format,
> Flight, ADBC and so on. So the latest proposal limits its
> target to only the C data interface.
>
> But there are comments that can we standardize this approach
> for all contexts including the C data interface?
> I want to discuss this in this thread.
>
> Here are related comments so far:
>
> * https://github.com/apache/arrow/pull/43553/files#r1871749972
> * https://github.com/apache/arrow/pull/43553/files#r1704373291
> * https://github.com/apache/arrow/pull/43553/files#r1871757604
>
>
> Could you share your opinions?
>
>
> If we can remove the C data interface only limitation, I'll
> open a new PR for it.
>
>
> Thanks,
> --
> kou