Re: [DISCUSS][C++] How about adding arrow::ArrayStatistics?

2024-06-05 Thread Jorge Cardoso Leitão
Hi, this is C++ specific, but IMO the question applies more broadly. I understood that the rationale for stats in compressed+encoded formats like Parquet is that computing those stats has a high cost (I/O + decompress + decode + aggregate). This motivates the materialization of aggregates. In arro

Re: [DISCUSS][C++] How about adding arrow::ArrayStatistics?

2024-06-05 Thread Micah Kornfield
Generally, I think this is a good idea that has been proposed before, but I don't think we could ever make progress on the design. On Sun, Jun 2, 2024 at 7:17 PM Sutou Kouhei wrote: > Hi, > > Related GitHub issue: > https://github.com/apache/arrow/issues/41909 > > How about adding arrow::ArrayStatisti

Re: [Discuss][C++] Switch to mimalloc by default?

2024-06-05 Thread Anja
I want to start off by acknowledging that all of the pros you listed for mimalloc are accurate. I also want to point out the times that people have been caught off guard by the perceived increased memory allocation of mimalloc compared to the alternatives, e.g. https://github.com/microsoft/mim

[Discuss][C++] Switch to mimalloc by default?

2024-06-05 Thread Antoine Pitrou
Hello, Arrow C++ features a MemoryPool abstraction that allows using different allocators interchangeably. Several MemoryPool implementations are provided with Arrow C++ (though one can also build their own): - a jemalloc-based implementation, currently the default on Linux - a mimalloc-bas
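
For readers unfamiliar with the idea of a pool abstraction that lets allocators be swapped interchangeably, here is a minimal sketch using only the C++ standard library. Everything in it (SketchPool, SystemPool, ZeroingPool, MakePool, the backend names) is hypothetical and made up for illustration; it is not Arrow's actual arrow::MemoryPool interface, and it does not wrap jemalloc or mimalloc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <string>

// Hypothetical interface for illustration only; NOT Arrow's real API.
// Callers depend on this base class, so the backing allocator can be
// swapped without touching the calling code.
class SketchPool {
 public:
  virtual ~SketchPool() = default;
  virtual void* Allocate(std::size_t size) = 0;
  virtual void Free(void* ptr, std::size_t size) = 0;
  virtual std::string backend_name() const = 0;
  // Bytes currently handed out by this pool.
  std::int64_t bytes_allocated() const { return bytes_allocated_; }

 protected:
  void Track(std::int64_t delta) {
    bytes_allocated_ += delta;
    assert(bytes_allocated_ >= 0);
  }

 private:
  std::int64_t bytes_allocated_ = 0;
};

// Backend 1: plain malloc/free.
class SystemPool : public SketchPool {
 public:
  void* Allocate(std::size_t size) override {
    void* ptr = std::malloc(size);
    if (ptr != nullptr) Track(static_cast<std::int64_t>(size));
    return ptr;
  }
  void Free(void* ptr, std::size_t size) override {
    if (ptr == nullptr) return;
    std::free(ptr);
    Track(-static_cast<std::int64_t>(size));
  }
  std::string backend_name() const override { return "system"; }
};

// Backend 2: calloc-based, returns zero-initialized memory.
class ZeroingPool : public SketchPool {
 public:
  void* Allocate(std::size_t size) override {
    void* ptr = std::calloc(1, size);
    if (ptr != nullptr) Track(static_cast<std::int64_t>(size));
    return ptr;
  }
  void Free(void* ptr, std::size_t size) override {
    if (ptr == nullptr) return;
    std::free(ptr);
    Track(-static_cast<std::int64_t>(size));
  }
  std::string backend_name() const override { return "zeroing"; }
};

// Hypothetical factory: picks a backend by name, nullptr if unknown.
std::unique_ptr<SketchPool> MakePool(const std::string& backend) {
  if (backend == "system") return std::make_unique<SystemPool>();
  if (backend == "zeroing") return std::make_unique<ZeroingPool>();
  return nullptr;
}
```

Since callers only ever see SketchPool, changing the default in a design like this comes down to changing which backend the factory returns.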