Hi Team,

I have rebased the File Format API proposal (https://github.com/apache/iceberg/pull/12298) to include the new changes needed for the Variant types.
I would love to hear your feedback, especially from Dan and Ryan, as you were the most active during our discussions.
If I can help in any way to make the review easier, please let me know.

Thanks,
Peter
Péter Váry <peter.vary.apa...@gmail.com> wrote (on Fri, Feb 28, 2025, 17:50):

> Hi everyone,
> Thanks for all of the actionable, relevant feedback on the PR
> (https://github.com/apache/iceberg/pull/12298).
> I have updated the code to address most of it. Please check if you agree
> with the general approach.
> If there is consensus about the general approach, I could split the PR
> into smaller pieces so that we can review and merge them step by step.
> Thanks,
> Peter
>
> Jean-Baptiste Onofré <j...@nanthrax.net> wrote (on Thu, Feb 20, 2025, 14:14):
>
>> Hi Peter
>>
>> Sorry for the late reply on this.
>>
>> I did a pass on the proposal; it's very interesting and well written.
>> I like the DataFile API, and it is definitely worth discussing all together.
>>
>> Maybe we can schedule a specific meeting to discuss the DataFile API?
>>
>> Thoughts?
>>
>> Regards
>> JB
>>
>> On Tue, Feb 11, 2025 at 5:46 PM Péter Váry <peter.vary.apa...@gmail.com>
>> wrote:
>> >
>> > Hi Team,
>> >
>> > As mentioned earlier on our Community Sync, I am exploring the
>> > possibility of defining a FileFormat API for accessing different file
>> > formats. I have put together a proposal based on my findings.
>> >
>> > -------------------
>> > Iceberg currently supports 3 different file formats: Avro, Parquet, and
>> > ORC. With the introduction of the Iceberg V3 specification, many new
>> > features are being added to Iceberg. Some of these features, like new
>> > column types and default values, require changes at the file format
>> > level. The changes are added by individual developers, each with a
>> > different focus on the different file formats. As a result, not all of
>> > the features are available for every supported file format.
>> > There are also emerging file formats like Vortex [1] or Lance [2] which,
>> > either by specialization or by applying newer research results, could
>> > provide better alternatives for certain use cases, such as random access
>> > to data or storing ML models.
>> > -------------------
>> >
>> > Please check the detailed proposal [3] and the Google document [4], and
>> > comment there or reply on the dev list if you have any suggestions.
>> >
>> > Thanks,
>> > Peter
>> >
>> > [1] - https://github.com/spiraldb/vortex
>> > [2] - https://lancedb.github.io/lance/
>> > [3] - https://github.com/apache/iceberg/issues/12225
>> > [4] - https://docs.google.com/document/d/1sF_d4tFxJsZWsZFCyCL9ZE7YuI7-P3VrzMLIrrTIxds