Re: [DISCUSS] Statistics through the C data interface

2024-05-22 Thread Sutou Kouhei
Hi, > Why not simply pass the statistics ArrowArray separately in your > producer API of choice It seems that we should use that approach because all the feedback said so. How about the following schema for the statistics ArrowArray? It's based on ADBC. | Field Name | Field Type

Re: [DISCUSS] Statistics through the C data interface

2024-05-22 Thread Sutou Kouhei
Hi, I agree that the proposed approach is a departure from the usual use of ArrowSchema. ADBC may be a bit too large to use only for transmitting statistics. ADBC has statistics-related APIs, but it has many other APIs as well. > It is also not the first time it has come up to encode > data-dependent information

Re: [DISCUSS] Statistics through the C data interface

2024-05-22 Thread Sutou Kouhei
Hi, > One potential challenge with encoding statistics in the schema > metadata is that some systems may consider this metadata as part of > assessing schema equivalence. It's a good point. I didn't notice it. The proposed approach makes schemas different because they have addresses of ArrowArray

Re: [VOTE] Release Apache Arrow nanoarrow 0.5.0

2024-05-22 Thread David Li
+1 (binding) Tested on Debian 12 'bookworm' On Thu, May 23, 2024, at 11:03, Sutou Kouhei wrote: > +1 (binding) > > I ran the following command line on Debian GNU/Linux sid: > > dev/release/verify-release-candidate.sh 0.5.0 0 > > with: > > * Apache Arrow C++ main > * gcc (Debian 13.2.0-23) 1

Re: [VOTE] Release Apache Arrow nanoarrow 0.5.0

2024-05-22 Thread Sutou Kouhei
+1 (binding) I ran the following command line on Debian GNU/Linux sid: dev/release/verify-release-candidate.sh 0.5.0 0 with: * Apache Arrow C++ main * gcc (Debian 13.2.0-23) 13.2.0 * R version 4.3.3 (2024-02-29) -- "Angel Food Cake" * Python 3.11.9 Thanks, -- kou In "[VOTE] Rel

Re: [VOTE] Release Apache Arrow nanoarrow 0.5.0

2024-05-22 Thread Dane Pitkin
+1 (non-binding) Verified on macOS 14 aarch64. On Wed, May 22, 2024 at 2:55 PM Bryce Mecum wrote: > +1 (non-binding) > > Verified on: > > - macOS aarch64 > - Debian 12 x86_64 inside a conda environment (note I had to install > Python 3.11 separately from the instructions, not sure I missed a >

Re: [VOTE] Release Apache Arrow nanoarrow 0.5.0

2024-05-22 Thread Bryce Mecum
+1 (non-binding) Verified on: - macOS aarch64 - Debian 12 x86_64 inside a conda environment (note I had to install Python 3.11 separately from the instructions, not sure I missed a step) On Wed, May 22, 2024 at 10:18 AM Dewey Dunnington wrote: > > Hello, > > I would like to propose the followin

[VOTE] Release Apache Arrow nanoarrow 0.5.0

2024-05-22 Thread Dewey Dunnington
Hello, I would like to propose the following release candidate (rc0) of Apache Arrow nanoarrow [0] version 0.5.0. This is an initial release consisting of 79 resolved GitHub issues from 9 contributors [1]. This release candidate is based on commit: c5fb10035c17b598e6fd688ad9eb7b874c7c631b [2] Th

Re: [DISCUSS] Statistics through the C data interface

2024-05-22 Thread Antoine Pitrou
Hi Kou, I agree with Dewey that this is overstretching the capabilities of the C Data Interface. In particular, stuffing a pointer as a metadata value and decreeing it immortal doesn't sound like a good design decision. Why not simply pass the statistics ArrowArray separately in your produce

Re: [DISCUSS] Statistics through the C data interface

2024-05-22 Thread Dewey Dunnington
I am definitely in favor of adding (or adopting an existing) ABI-stable way to transmit statistics (the one that comes up most frequently for me is just the number of values that are about to show up in an ArrowArrayStream, since the producer often knows this and the consumer often would like to pr

Re: [DISCUSS] Statistics through the C data interface

2024-05-22 Thread Raphael Taylor-Davies
Hi, One potential challenge with encoding statistics in the schema metadata is that some systems may consider this metadata as part of assessing schema equivalence. However, I think the bigger question is what the intended use case for these statistics is. Often query engines want to collect