Re: [DISCUSS] Relocate Parquet to Iceberg Core

2024-12-18 Gang Wu
IIUC, iceberg-parquet depends on iceberg-arrow for the vectorized reader implementation (though it is only partially supported). Should we relocate iceberg-arrow together? Since I mentioned that the vectorized reader implementation is only partially supported, is it a direction that needs to be improved? There i

Re: [DISCUSS] Relocate Parquet to Iceberg Core

2024-12-18 Renjie Liu
Hi:
> Third, the use case has expanded from stats files to other metadata.
For this reason, I would +1 moving the parquet module to the core module. But I think the right direction is to still keep interfaces in the core module so that the data file format is pluggable. Moving parquet to core

Re: [DISCUSS] Relocate Parquet to Iceberg Core

2024-12-18 rdb...@gmail.com
I was the person who originally suggested that we not move iceberg-parquet into core, so it would probably help if I gave some context on my rationale, as I remember it, and on what has changed since then. I pushed back on the original suggestion to move Parquet classes into core because it wasn't clea

Re: [DISCUSS] Relocate Parquet to Iceberg Core

2024-12-09 Ajantha Bhat
Thanks Dan for the reply.
> This is also a good time to consider adding a native parquet read/write path for use in core as the generic path in 'iceberg-data' isn't ideal.
> Parquet metadata has been brought up in relation to improving stats handling (allowing tracking of more column stats witho

Re: [DISCUSS] Relocate Parquet to Iceberg Core

2024-12-06 Daniel Weeks
Hey Ajantha, I understand it was discussed before, but I think the many recent discussions around improvements to parquet metadata, stats, etc. are good justification for revisiting the earlier discussion. Parquet metadata has been brought up in relation to improving stats handling (allowing trackin

Re: [DISCUSS] Relocate Parquet to Iceberg Core

2024-12-06 Ajantha Bhat
Hi Dan, I proposed the same last year while working on partition stats. I can revive this PR (https://github.com/apache/iceberg/pull/8500) if required. But we decided that `iceberg-data` can write these parquet stats files (metadata) and core can just register them. So, it is no longer needed for p

[DISCUSS] Relocate Parquet to Iceberg Core

2024-12-06 Daniel Weeks
Everyone, I wanted to propose moving the parquet implementation from the 'iceberg-parquet' project to the 'iceberg-core' project. The original motivation for keeping these subprojects separate was due to Iceberg relying on avro (which is included in the core project) for metadata and keeping othe