Hi all,

I was wondering if anyone could elaborate on why the default maximum row group length is set to 67108864 <https://github.com/apache/arrow/blob/5c936560c1da003baf714d67dc92f25670730c84/cpp/src/parquet/properties.h#L97>. Apache Parquet's documentation <https://parquet.apache.org/documentation/latest/> recommends a row group size between 512 MB and 1 GB. For a Float64Array of length 67108864, I believe the size would be approximately 545 MB, which is at the low end of that range.
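
As a rough check on that estimate, here is a minimal Python sketch of the arithmetic. It assumes 8 bytes per float64 value plus a validity bitmap of 1 bit per value, and ignores buffer padding, encoding, and compression, so the size of the row group as written to disk may differ. The helper name `float64_array_nbytes` is hypothetical and not part of Arrow.

```python
# Rough in-memory size of one Float64 column holding the default maximum
# number of rows per row group. Assumptions: 8 bytes per value, plus an
# optional validity bitmap of 1 bit per value; padding, encoding, and
# compression are ignored.

MAX_ROW_GROUP_LENGTH = 2**26  # 67108864, the default quoted above


def float64_array_nbytes(length: int, with_validity: bool = True) -> int:
    """Hypothetical helper: estimated bytes for a float64 array of `length` values."""
    data_bytes = length * 8  # 8 bytes per float64 value
    validity_bytes = (length + 7) // 8 if with_validity else 0  # 1 bit per value
    return data_bytes + validity_bytes


data_only = float64_array_nbytes(MAX_ROW_GROUP_LENGTH, with_validity=False)
with_bitmap = float64_array_nbytes(MAX_ROW_GROUP_LENGTH)

print(f"data buffer only:     {data_only / 1e6:.1f} MB ({data_only // 2**20} MiB)")
print(f"with validity bitmap: {with_bitmap / 1e6:.1f} MB")
```

Under these assumptions the data buffer alone is 512 MiB (about 537 MB), and adding the validity bitmap gives roughly the 545 MB figure mentioned above.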

Was there a particular reason for choosing 67108864 as the maximum row group length? I experimented with setting the default to larger values and noticed that pyarrow cannot read Parquet files containing row groups longer than 2147483647 rows (int32 max). However, I was able to read these files using the C++ Arrow bindings.

Best,
Sarah