I propose posting a blog article that Daniël Heres, Raphael Tustvold-Davies, and I wrote about DataFusion grouping performance [1].
This content was originally published on the InfluxData blog [2], but we would like to repost it on the Arrow site blog [3] because the content is general and for other reasons described on the PR.

This is the same pattern we followed for [4], which seems to have been successful and uncontroversial.

Please let us know your thoughts either here or by commenting on the PR.

Andrew

[1]: https://github.com/apache/arrow-site/pull/386
[2]: https://www.influxdata.com/blog/aggregating-millions-groups-fast-apache-arrow-datafusion/
[3]: https://arrow.apache.org/blog/
[4]: https://arrow.apache.org/blog/2022/12/26/querying-parquet-with-millisecond-latency/