Re: [DISCUSS] FileFormat API proposal

Péter Váry Thu, 13 Mar 2025 05:17:04 -0700

Hi Team,
I have rebased the File Format API proposal (
https://github.com/apache/iceberg/pull/12298) to include the new changes
needed for the Variant types. I would love to hear your feedback,
especially from Dan and Ryan, as you were the most active during our
discussions. If I can help in any way to make the review easier, please let
me know.
Thanks,
Peter


Péter Váry <peter.vary.apa...@gmail.com> wrote (on Fri, Feb 28, 2025,
17:50):

> Hi everyone,
> Thanks for all of the actionable, relevant feedback on the PR (
> https://github.com/apache/iceberg/pull/12298).
> I have updated the code to address most of it. Please check whether you
> agree with the general approach.
> If there is consensus on the general approach, I could split the PR into
> smaller pieces so that they are easier to review and merge
> step by step.
> Thanks,
> Peter
>
> Jean-Baptiste Onofré <j...@nanthrax.net> wrote (on Thu, Feb 20, 2025,
> 14:14):
>
>> Hi Peter
>>
>> Sorry for the late reply on this.
>>
>> I did a pass on the proposal; it's very interesting and well written.
>> I like the DataFile API, and it is definitely worth discussing all together.
>>
>> Maybe we can schedule a dedicated meeting to discuss the DataFile API?
>>
>> Thoughts?
>>
>> Regards
>> JB
>>
>> On Tue, Feb 11, 2025 at 5:46 PM Péter Váry <peter.vary.apa...@gmail.com>
>> wrote:
>> >
>> > Hi Team,
>> >
>> > As mentioned earlier at our Community Sync, I am exploring the
>> possibility of defining a FileFormat API for accessing different file
>> formats. I have put together a proposal based on my findings.
>> >
>> > -------------------
>> > Iceberg currently supports 3 different file formats: Avro, Parquet,
>> ORC. With the introduction of the Iceberg V3 specification, many new
>> features are being added to Iceberg. Some of these features, such as new
>> column types and default values, require changes at the file format level.
>> These changes are added by individual developers, each focusing on
>> different file formats. As a result, not all of the features are available
>> for every supported file format.
>> > There are also emerging file formats, such as Vortex [1] or Lance [2],
>> which, either through specialization or by applying newer research results,
>> could provide better alternatives for certain use cases, such as random
>> access to data or storing ML models.
>> > -------------------
>> >
>> > Please check the detailed proposal [3] and the Google document [4], and
>> comment there or reply on the dev list if you have any suggestions.
>> >
>> > Thanks,
>> > Peter
>> >
>> > [1] - https://github.com/spiraldb/vortex
>> > [2] - https://lancedb.github.io/lance/
>> > [3] - https://github.com/apache/iceberg/issues/12225
>> > [4] -
>> https://docs.google.com/document/d/1sF_d4tFxJsZWsZFCyCL9ZE7YuI7-P3VrzMLIrrTIxds
>> >
>>
>
