Hi,

I would like to propose standardizing how to represent
statistics as an Apache Arrow array.

Motivation:

* We want to pass not only Apache Arrow data but also its
  statistics through the C data interface, for use in query
  planning.

Approach:

* Define a standardized schema for statistics.
* Represent statistics as an Apache Arrow array that uses
  the schema.
* Pass the statistics Apache Arrow array through the C data
  interface like a normal Apache Arrow array.

Note that we don't define a new interface for statistics. We
just use the existing C data interface. A statistics Apache
Arrow array is passed through a separate API call.
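
For illustration only, here is a minimal plain-Python sketch of
the idea: statistics are ordinary records with a fixed shape,
and the producer exposes them through a call separate from the
one that exports the data. It does not use Apache Arrow or the
C data interface. HypotheticalProducer, export_data(),
export_statistics(), StatisticsRecord and the statistic key
names are all assumptions made for this sketch, as is the use
of None for "the whole table". The actual schema is defined in
the PR linked below.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StatisticsRecord:
    # Assumption for this sketch: None means "the whole table",
    # otherwise this is a column index.
    column: Optional[int]
    # Statistic name -> value. The key names are illustrative only.
    statistics: dict = field(default_factory=dict)


class HypotheticalProducer:
    """Stand-in for a system that exports data and, separately, statistics."""

    def __init__(self, columns):
        self._columns = columns  # one list of values per column

    def export_data(self):
        # A real system would export this through the C data interface.
        return self._columns

    def export_statistics(self):
        # A real system would use the same transport as export_data();
        # only the schema of the exported array differs.
        n_rows = len(self._columns[0]) if self._columns else 0
        records = [StatisticsRecord(None, {"row_count": n_rows})]
        for index, values in enumerate(self._columns):
            non_null = [v for v in values if v is not None]
            stats = {"null_count": len(values) - len(non_null)}
            if non_null:
                stats["min_value"] = min(non_null)
                stats["max_value"] = max(non_null)
            records.append(StatisticsRecord(index, stats))
        return records


producer = HypotheticalProducer([[1, None, 3], [10, 20, 30]])
data = producer.export_data()
stats = producer.export_statistics()
```

A consumer such as a query planner would call export_statistics()
only when it wants statistics. The data path is unchanged.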

Note that this proposal doesn't define anything about how or
where to use it. The motivation above just shows one use case.

This is based on the discussion of the previous, rejected vote:
https://lists.apache.org/thread/rsw3wsyj68dksc98s5rpdp6dn8hfk0yd

See also:

* The discussion of this proposal:
  https://lists.apache.org/thread/b6chzlyn95rztoybs39b6olz907g12gj
* The PR of this proposal:
  https://github.com/apache/arrow/pull/45058
* The preview URL of the PR:
  http://crossbow.voltrondata.com/pr_docs/45058/format/StatisticsSchema.html


The vote will be open for at least 72 hours.

[ ] +1 Accept this proposal
[ ] +0
[ ] -1 Do not accept this proposal because...


Thanks,
-- 
kou
