Hi Samrose,

Thanks for the proposal. +1 from my side, as Iceberg should definitely leverage all the information Parquet provides. This can help with query planning (especially since joins and exchanges happen on raw data).
I have also tagged Micah on the proposal, as he worked on the same feature on the Parquet side.

Note: Iceberg currently uses Parquet 1.13.1, which depends on parquet-format 2.9.0 (https://github.com/apache/parquet-java/blob/apache-parquet-1.13.1/pom.xml#L74). So we need to bump the Parquet version to 1.14.1, which uses parquet-format 2.10.0, to leverage these stats. Fokko has an open PR for this, but it has some blockers (https://github.com/apache/iceberg/pull/10209).

- Ajantha

On Mon, Jul 15, 2024 at 1:54 PM Samrose Ahmed <samroseah...@gmail.com> wrote:
> Hello,
>
> I have added a proposal to be able to optionally track uncompressed,
> unencoded column size statistics for variable-length columns. Currently, it
> isn't possible to estimate the memory size of variable-length columns, as
> `columnSizes` only contains compressed sizes.
>
> I've created an issue (https://github.com/apache/iceberg/issues/10703)
> and a document (https://docs.google.com/document/d/189kIZxx_dUloBCDPUz2Fh0BBOZSm2fXHHXWpdpq3DrU),
> and would appreciate any feedback.
>
> Thanks,
> Samrose
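To make the gap concrete, here is a minimal Java sketch. It assumes a hypothetical `FileColumnStats` record (not an Iceberg or Parquet API) in which `columnSizes` mirrors the existing compressed sizes keyed by field ID, and `unencodedSizes` stands in for the proposed optional statistic. It shows how a planner might sum per-file unencoded sizes to estimate a variable-length column's decoded footprint, and why it gives up when any file lacks the stat rather than falling back to compressed sizes.

```java
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical per-file statistics, loosely modeled on Iceberg data file metrics.
// columnSizes: existing compressed on-disk sizes, keyed by field ID.
// unencodedSizes: assumed stand-in for the proposed uncompressed/unencoded stat.
record FileColumnStats(Map<Integer, Long> columnSizes, Map<Integer, Long> unencodedSizes) {}

final class ColumnMemoryEstimator {
    private ColumnMemoryEstimator() {}

    /**
     * Sums the unencoded size of one column across files.
     * Returns empty if any file lacks the stat, because compressed
     * sizes say little about the decoded in-memory footprint.
     */
    static OptionalLong estimateUnencodedBytes(List<FileColumnStats> files, int fieldId) {
        long total = 0L;
        for (FileColumnStats file : files) {
            Long size = file.unencodedSizes().get(fieldId);
            if (size == null) {
                return OptionalLong.empty(); // stat is optional; cannot estimate reliably
            }
            total = Math.addExact(total, size); // fail loudly on overflow
        }
        return OptionalLong.of(total);
    }
}
```

Returning an empty result instead of mixing in compressed sizes keeps the estimate honest when only some files were written with the new stat.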