Hi everyone,

Thanks for all of the actionable, relevant feedback on the PR (https://github.com/apache/iceberg/pull/12298). I have updated the code to address most of it. Please check whether you agree with the general approach. If there is consensus on the general approach, I could split the PR into smaller pieces so that they are easier to review and merge step by step.

Thanks,
Peter
Jean-Baptiste Onofré <j...@nanthrax.net> wrote (Thu, Feb 20, 2025, 14:14):
> Hi Peter
>
> sorry for the late reply on this.
>
> I did a pass on the proposal, it's very interesting and well written.
> I like the DataFile API and it's definitely worth discussing all together.
>
> Maybe we can schedule a specific meeting to discuss the DataFile API?
>
> Thoughts?
>
> Regards
> JB
>
> On Tue, Feb 11, 2025 at 5:46 PM Péter Váry <peter.vary.apa...@gmail.com> wrote:
> >
> > Hi Team,
> >
> > As mentioned earlier at our Community Sync, I am exploring the possibility of defining a FileFormat API for accessing different file formats. I have put together a proposal based on my findings.
> >
> > -------------------
> > Iceberg currently supports 3 different file formats: Avro, Parquet, and ORC. With the introduction of the Iceberg V3 specification, many new features are being added to Iceberg. Some of these features, like new column types and default values, require changes at the file format level. The changes are added by individual developers, each focusing on different file formats. As a result, not all of the features are available for every supported file format.
> > Also, there are emerging file formats like Vortex [1] or Lance [2] which, either through specialization or by applying newer research results, could provide better alternatives for certain use cases, like random access to data or storing ML models.
> > -------------------
> >
> > Please check the detailed proposal [3] and the Google document [4], and comment there or reply on the dev list if you have any suggestions.
> >
> > Thanks,
> > Peter
> >
> > [1] - https://github.com/spiraldb/vortex
> > [2] - https://lancedb.github.io/lance/
> > [3] - https://github.com/apache/iceberg/issues/12225
> > [4] - https://docs.google.com/document/d/1sF_d4tFxJsZWsZFCyCL9ZE7YuI7-P3VrzMLIrrTIxds