It’s not strictly statistics, but would this also cover constraints and
indexes? Table, record batch and column primary keys, unique keys, sort
keys, Bloom filters, HNSW indexes and shape (ndarray for keys xyz).
Not sure which backends (DB, Parquet, Lance) expose which natively, but it
might be worth consi
Hi,
In
"Re: [DISCUSS] Statistics through the C data interface" on Sun, 9 Jun 2024
22:11:54 +0200,
Antoine Pitrou wrote:
Fields:
| Name | Type | Comments |
|------|------|----------|
| column | utf8
On 09/06/2024 at 08:33, Sutou Kouhei wrote:
Fields:
| Name | Type | Comments |
|------|------|----------|
| column | utf8 | (2) |
| key | utf8 not null | (3) |
1. Should the key be
On 09/06/2024 at 09:01, Sutou Kouhei wrote:
Hi,
One thing that a plain integer makes more difficult is representing
non-standard statistics. For example, some engine might want to expose
elaborate quantile-based statistics even if they are not officially defined
here. With a `utf8` or `dictionary(
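To make the string-key argument concrete, here is a minimal sketch in plain Python (not the Arrow API) of statistics rows shaped like the field table quoted earlier in this thread: a nullable `column` and a non-null `utf8` `key`. The `value` field, the `ARROW:`-prefixed key names, the `myengine:` vendor prefix, and the convention that a null `column` means a table-level statistic are all hypothetical assumptions for illustration. The point it shows: with string keys, an engine can add a non-standard statistic without reserving an integer code, and a consumer can simply skip keys it does not recognize.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Hypothetical value type; the proposal's actual value field is not shown in this thread.
StatValue = Union[int, float, str, list]


@dataclass(frozen=True)
class StatisticRow:
    column: Optional[str]  # nullable utf8; None assumed to mean "whole table/batch"
    key: str               # utf8 not null: the statistic's name
    value: StatValue       # hypothetical value field


# Hypothetical names for statistics a specification might standardize.
WELL_KNOWN_KEYS = {"ARROW:row_count", "ARROW:null_count", "ARROW:distinct_count"}


def split_statistics(rows):
    """Separate standardized statistics from engine-specific ones.

    A consumer that only understands the well-known keys can ignore
    the rest instead of failing on an unknown integer code.
    """
    known, extra = [], []
    for row in rows:
        (known if row.key in WELL_KNOWN_KEYS else extra).append(row)
    return known, extra


rows = [
    StatisticRow(None, "ARROW:row_count", 1000),
    StatisticRow("price", "ARROW:null_count", 3),
    # Non-standard: quantiles exposed by some engine under its own prefix.
    StatisticRow("price", "myengine:quantiles", [1.5, 9.0, 42.0]),
]

known, extra = split_statistics(rows)
print([r.key for r in known])  # ['ARROW:row_count', 'ARROW:null_count']
print([r.key for r in extra])  # ['myengine:quantiles']
```

With an integer key, the quantile row would need either a reserved code range or an out-of-band registry; a prefixed string carries that namespacing on its own.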
Hi,
Thanks for your comment.
You may have misunderstood my motivation.
This proposal doesn't change the Apache Arrow columnar
format. For example, this proposal doesn't save statistics
read from an Apache Parquet file into an Apache Arrow IPC file. This
proposal just attaches statistics read from Apache Parq
Hi,
OK. I'll propose an arrow::ArrayStatistics API that can be used
as a starting point.
Thanks,
--
kou
In
"Re: [DISCUSS][C++] How about adding arrow::ArrayStatistics?" on Wed, 5 Jun
2024 22:55:25 -0700,
Micah Kornfield wrote:
> Generally I think this is a good idea that has been proposed
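As a rough illustration of what a per-array statistics object might hold, here is a Python sketch. The class name `ArrayStatisticsSketch`, its fields (`null_count`, `distinct_count`, `min`, `max`), and the helper `compute_statistics` are hypothetical and are not the arrow::ArrayStatistics API being proposed. Every field is optional because a producer such as a Parquet reader may know only some statistics; here `distinct_count` is assumed to exclude nulls.

```python
from dataclasses import dataclass
from typing import Any, Optional, Sequence


@dataclass
class ArrayStatisticsSketch:
    """Hypothetical per-array statistics; any field may be unknown (None)."""
    null_count: Optional[int] = None
    distinct_count: Optional[int] = None
    min: Optional[Any] = None
    max: Optional[Any] = None


def compute_statistics(values: Sequence[Optional[Any]]) -> ArrayStatisticsSketch:
    """Compute exact statistics for an in-memory sequence (None = null)."""
    non_null = [v for v in values if v is not None]
    return ArrayStatisticsSketch(
        null_count=len(values) - len(non_null),
        distinct_count=len(set(non_null)),  # nulls are not counted as a distinct value
        # min/max stay unknown for an all-null array.
        min=min(non_null) if non_null else None,
        max=max(non_null) if non_null else None,
    )


stats = compute_statistics([3, None, 7, 3, None])
print(stats)
# ArrayStatisticsSketch(null_count=2, distinct_count=2, min=3, max=7)

# A reader that only has a null count from file metadata fills in just that field.
partial = ArrayStatisticsSketch(null_count=0)
```

Keeping "unknown" distinct from zero (None vs. 0) matters here: a consumer must not assume an array has no nulls just because the producer did not report a null count.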
Hi,
+1 based on the benchmark results.
Questions:
1. Do we need to keep jemalloc support for compatibility? Can we
drop support for jemalloc to reduce maintenance cost?
2. Is it OK to add support for a system-installed mimalloc?
For now, we always use the vendored mimalloc:
https://github.com/apach
Hi,
> One thing that a plain integer makes more difficult is representing
> non-standard statistics. For example, some engine might want to expose
> elaborate quantile-based statistics even if they are not officially defined
> here. With a `utf8` or `dictionary(int32, utf8)` field, that is quite
> easy w