Re: [DISCUSS] Statistics through the C data interface

2024-06-09 Thread Adam Lippai
It’s not strictly statistics, but would this also cover constraints and indexes? Table, record batch and column primary keys, unique keys, sort keys, bloom filters, HNSW index and shape (ndarray for keys xyz). Not sure which backends (DB, Parquet, Lance) expose which natively, but it might be worth consi

Re: [DISCUSS] Statistics through the C data interface

2024-06-09 Thread Sutou Kouhei
Hi,

In "Re: [DISCUSS] Statistics through the C data interface" on Sun, 9 Jun 2024 22:11:54 +0200, Antoine Pitrou wrote:

Fields:

| Name | Type | Comments |
|---|---|---|
| column | utf8

Re: [DISCUSS] Statistics through the C data interface

2024-06-09 Thread Antoine Pitrou
On 09/06/2024 at 08:33, Sutou Kouhei wrote:

Fields:

| Name | Type | Comments |
|---|---|---|
| column | utf8 | (2) |
| key | utf8 not null | (3) |

1. Should the key be

Re: [DISCUSS] Statistics through the C data interface

2024-06-09 Thread Antoine Pitrou
On 09/06/2024 at 09:01, Sutou Kouhei wrote: Hi, One thing that a plain integer makes more difficult is representing non-standard statistics. For example, some engine might want to expose elaborate quantile-based statistics even if it is not officially defined here. With a `utf8` or `dictionary(

Re: [DISCUSS][C++] How about adding arrow::ArrayStatistics?

2024-06-09 Thread Sutou Kouhei
Hi, Thanks for your comment. You may have misunderstood my motivation. This proposal doesn't change the Apache Arrow columnar format. For example, this proposal doesn't save statistics read from an Apache Parquet file to an Apache Arrow IPC file. This proposal just attaches statistics read from Apache Parq

Re: [DISCUSS][C++] How about adding arrow::ArrayStatistics?

2024-06-09 Thread Sutou Kouhei
Hi, OK. I'll propose an arrow::ArrayStatistics API that can be used as a starting point. Thanks, -- kou In "Re: [DISCUSS][C++] How about adding arrow::ArrayStatistics?" on Wed, 5 Jun 2024 22:55:25 -0700, Micah Kornfield wrote: > Generally I think this is a good idea that has been proposed

Re: [Discuss][C++] Switch to mimalloc by default?

2024-06-09 Thread Sutou Kouhei
Hi, +1 based on the benchmark results.

Questions:

1. Do we need to keep jemalloc support? Compatibility? Can we drop support for jemalloc to decrease maintenance cost?
2. Is it OK to add support for system mimalloc? We always use vendored mimalloc for now: https://github.com/apach

Re: [DISCUSS] Statistics through the C data interface

2024-06-09 Thread Sutou Kouhei
Hi,

> One thing that a plain integer makes more difficult is representing
> non-standard statistics. For example some engine might want to expose
> elaborate quantile-based statistics even if it is not officially defined
> here. With a `utf8` or `dictionary(int32, utf8)` field, that is quite
> easy w
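
For readers unfamiliar with the trade-off quoted above, here is a minimal sketch of the idea behind a `dictionary(int32, utf8)` key column: each distinct string is stored once, and rows carry small integer indices into that list. This is plain Python, not the Arrow API. The function name `dictionary_encode` and the key names (`"min"`, `"max"`, `"vendor:quantile_0.9"`) are assumptions made up for illustration and do not come from the thread.

```python
# Hypothetical sketch of dictionary-encoding string keys, in the spirit of a
# dictionary(int32, utf8) column. All names here are illustrative only.

def dictionary_encode(keys):
    """Return (dictionary, indices) for a list of string keys."""
    dictionary = []  # unique strings, in first-seen order
    positions = {}   # string -> its index in `dictionary`
    indices = []     # one small integer per input key
    for key in keys:
        if key not in positions:
            positions[key] = len(dictionary)
            dictionary.append(key)
        indices.append(positions[key])
    return dictionary, indices


# Made-up keys; "vendor:quantile_0.9" stands in for a non-standard statistic
# that an engine could add without any centrally assigned integer code.
keys = ["min", "max", "vendor:quantile_0.9", "min", "max"]
dictionary, indices = dictionary_encode(keys)
```

Under these assumptions, a new key only needs a new string in the dictionary, whereas a plain integer key would need an agreed-upon numeric code for every statistic.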