Hi,

I'd like to discuss the Arrow array representation of
statistics and the contexts in which it can be used.

Background:

We discussed how to pass statistics through the C data
interface:

* [DISCUSS] Statistics through the C data interface
  https://lists.apache.org/thread/z0jz2bnv61j7c6lbk7lympdrs49f69cx
* [VOTE] Statistics through the C data interface
  https://lists.apache.org/thread/rsw3wsyj68dksc98s5rpdp6dn8hfk0yd
* GH-38837: [Format] Add the specification to pass
  statistics through the Arrow C data interface
  https://github.com/apache/arrow/pull/43553

The latest proposal is to standardize the schema of an Arrow
array that represents statistics. See the above PR for
details.

I think the proposed approach is the best one for the C
data interface, but I'm not sure whether it is also the best
one for other contexts such as the IPC format, Flight, ADBC
and so on. That is why the latest proposal limits its target
to the C data interface.

However, some comments ask whether we can standardize this
approach for all contexts, not only the C data interface.
I'd like to discuss that question in this thread.

Here are related comments so far:

* https://github.com/apache/arrow/pull/43553/files#r1871749972
* https://github.com/apache/arrow/pull/43553/files#r1704373291
* https://github.com/apache/arrow/pull/43553/files#r1871757604


Could you share your opinions?


If we can remove the limitation to the C data interface,
I'll open a new PR for it.


Thanks,
-- 
kou
