All we have to do is add the parquet module as a test dependency; I'm
working on a PoC now. I don't think we really need to block on any other
projects, although I'll probably hold off on any work on manifest-list since
I hope it won't be needed.

On Thu, May 29, 2025 at 8:37 PM Ajantha Bhat <ajanthab...@gmail.com> wrote:

> I am interested in working on this proposal.
> I would assume the plan is to use `InternalData` with the format set to
> `parquet`. The challenge will be the test cases: the core module cannot
> write Parquet metadata because of a circular dependency. I guess we need to
> abstract the test cases out of the core module and run them from the
> parquet module.
>
> I can work on a design doc as well, so please add me as a collaborator on
> the document.
> But should this work wait until we complete the work on "single file
> commit in v4", since the metadata structure can change?
>
> - Ajantha
>
> On Thu, May 29, 2025 at 11:37 PM Russell Spitzer <
> russell.spit...@gmail.com> wrote:
>
>> Hi Y'all
>>
>> As discussed in the last community sync, we are beginning to gather up
>> folks who are interested in various efforts for Iceberg V4. To that end,
>> I'd like to use this thread as a gathering point for folks interested in
>> the metadata file format shift to Parquet. I wrote a quick abstract to
>> describe the purpose of this group.
>>
>> Following this, I'll be working on a full design document, or if someone
>> has one in prod, please let us know and we can start discussing/working on
>> it there.
>>
>> *Abstract: Parquet as Metadata File Format*
>>
>> Currently the Iceberg SDK and Spec use Avro files for all
>> Manifest Lists and Manifests. The row-oriented format was selected
>> because it was assumed that most metadata would be read in its entirety.
>> This has turned out to seldom be the case, and the ability to read
>> single elements of the metrics would be very useful for query planning.
>> To address this, we propose switching the underlying manifest format
>> from Avro to Parquet. In V4, Avro files would still be readable, but all
>> new metadata files would be written in Parquet instead of Avro.
>>
>
