Hi Peter

Sorry for the late reply on this.

I did a pass on the proposal; it's very interesting and well written.
I like the DataFile API, and it's definitely worth discussing all together.

Maybe we can schedule a dedicated meeting to discuss the DataFile API?

Thoughts?

Regards
JB

On Tue, Feb 11, 2025 at 5:46 PM Péter Váry <peter.vary.apa...@gmail.com> wrote:
>
> Hi Team,
>
> As mentioned earlier in our Community Sync, I am exploring the possibility of
> defining a FileFormat API for accessing different file formats. I have put
> together a proposal based on my findings.
>
> -------------------
> Iceberg currently supports 3 different file formats: Avro, Parquet, ORC. With
> the introduction of the Iceberg V3 specification, many new features are added
> to Iceberg. Some of these features, like new column types and default values,
> require changes at the file format level. The changes are added by individual
> developers with different focus on the different file formats. As a result,
> not all of the features are available for every supported file format.
> Also, there are emerging file formats like Vortex [1] or Lance [2] which,
> either by specialization or by applying newer research results, could provide
> better alternatives for certain use cases like random access to data or
> storing ML models.
> -------------------
>
> Please check the detailed proposal [3] and the Google document [4], and
> comment there or reply on the dev list if you have any suggestions.
>
> Thanks,
> Peter
>
> [1] - https://github.com/spiraldb/vortex
> [2] - https://lancedb.github.io/lance/
> [3] - https://github.com/apache/iceberg/issues/12225
> [4] - https://docs.google.com/document/d/1sF_d4tFxJsZWsZFCyCL9ZE7YuI7-P3VrzMLIrrTIxds
>
