All we have to do is add the parquet module as a test dependency; I'm working on a POC now. I don't think we really need to block on any other projects, although I'll probably hold off on any work on manifest-list since I hope it won't be needed.

On Thu, May 29, 2025 at 8:37 PM Ajantha Bhat <ajanthab...@gmail.com> wrote:

> I am interested in working on this proposal.
> I would assume it is to use `InternalData` with the format as
> `parquet`. But the challenge will be the test cases, the core module cannot
> write the parquet metadata due to circular dependency. We need to abstract
> out the test cases in the core module and run them from the parquet module
> I guess.
>
> I can work on a design doc as well. So, add me as a collaborator for the
> document.
> But should this work be done after we complete the work on "single file
> commit in v4" ? because metadata structure can change?
>
> - Ajantha
>
> On Thu, May 29, 2025 at 11:37 PM Russell Spitzer <
> russell.spit...@gmail.com> wrote:
>
>> Hi Y'all
>>
>> As discussed in the last community sync, we are beginning to gather up
>> folks who are interested in various efforts for Iceberg V4. To that end,
>> I'd like to use this thread as a gathering point for folks interested in
>> the metadata file format shift to Parquet. I wrote a quick abstract to
>> describe the purpose of this group.
>>
>> Following this I'll be working on a full design document or if someone
>> has one in prod please let us know and we can start discussing/working on
>> it there.
>>
>> *Abstract: Parquet as Metadata File Format*
>>
>> Currently the Iceberg SDK and Spec use Avro file format files for all
>> Manifest Lists and Manifests. The row oriented format was selected
>> because it was assumed that most metadata would be read in its entirety.
>> This has turned out to seldom be the case and the ability to read
>> single elements of the metrics would be very useful for query planning.
>> To address this we propose switching the underlying manifest format
>> from Avro to Parquet. In V4, Avro files would still be readable but all
>> new metadata files would be written in Parquet instead of Avro.