Hi everyone,

I would like to encourage everybody who wants to take part in this discussion to share their thoughts either on the doc or on the PRs. I would like to finalize and merge the API in 1.10, so that we can merge the implementation early in 1.11. This would allow more thorough testing of the migration, and it would also allow us to remove the deprecated methods/classes in 2.0.
Pushing the vote/merge further out would force us to keep a sizable amount of deprecated code even after 2.0. I think removing the old, convoluted file access code paths would be a nice feature for 2.0.

Thanks,
Peter

On Fri, May 16, 2025 at 10:51, Péter Váry <peter.vary.apa...@gmail.com> wrote:

> Thanks Ryan for your support! I know when you have time, you will check
> this proposal as well.
>
> I understand that V3 is important. I’ve been following the votes and PRs
> and can see the great progress being made. Thanks to everyone contributing!
> Please feel free to reach out if I can help with reviews or PRs. I'm here
> and happy to support.
>
> It seems that those who reviewed the PRs and the document over the past
> four months have reached consensus. That said, I agree with you, that it
> would be great to involve more people.
> I intentionally didn’t set an end date for the vote, as I planned to bring
> it up during our next community sync. That way, we can reach a broader
> audience and encourage wider participation.
>
> Thanks again for taking the time to respond. I’m looking forward to your
> feedback and comments.
>
> Peter
>
> On Thu, May 15, 2025 at 23:17, Ryan Blue <rdb...@gmail.com> wrote:
>
>> I definitely support introducing an API for this purpose and I think that
>> the current work is the right direction. But I'm not sure that a vote is
>> the right next step. A vote should be used to confirm consensus on a design
>> and direction, and I thought the next steps were to build that consensus
>> around the current API prototype.
>>
>> I know that a few of us have been trying to tie up loose ends with v3 and
>> that has meant, for me at least, that I haven't had time to thoroughly
>> review the last set of changes and the file API proposed. Isn't the next
>> step to get all reviewers to spend time on this area rather than to try to
>> decide with a vote? I'll make sure I take the time in the next few weeks to
>> help with the reviews. I think that building consensus is the right next
>> step.
>>
>> Ryan
>>
>> On Thu, May 15, 2025 at 6:46 AM Péter Váry <peter.vary.apa...@gmail.com>
>> wrote:
>>
>>> Hi Team,
>>>
>>> We started the discussion of the File Format API proposal [1] a long
>>> time ago [2].
>>> Since then - during the review process - we moved from single
>>> formalization of the similar APIs to bigger changes.
>>> The lucky ones could see a presentation about the results during the
>>> Iceberg Summit [3]. The topic was discussed, and generally endorsed during
>>> "The Future of Apache Iceberg" panel discussion. [4]
>>>
>>> The new API still uses direct conversion from the Data File object model
>>> to the Engine object model, but refactors out many duplicated code parts
>>> from both the File Format and the engine specific codes.
>>> As a result we get:
>>> - Same performance as the current solution
>>> - Formalized API
>>> - Simplified code
>>>   - On engine level
>>>   - On File Format level
>>> - Improved testability
>>> - Ability to introduce/deprecate new File Formats without much
>>> disruption when the community decides so.
>>>
>>> Proposal document contains more details [5], also there is a PR where
>>> you can check the proposed API changes [6], and a bigger change showing how
>>> the new API would affect the current File Format and engine implementations
>>> [7].
>>>
>>> Please consider the proposal and vote.
>>>
>>> [ ] +1 Add these changes to Iceberg
>>> [ ] +0
>>> [ ] -1 I have questions and/or concerns
>>>
>>> Thanks,
>>> Peter
>>>
>>> [1] - Github issue - https://github.com/apache/iceberg/issues/12225
>>> [2] - Mail list thread -
>>> https://lists.apache.org/thread/ovyh52m2b6c1hrg4fhw3rx92bzr793n2
>>> [3] - Turbocharge Queries on Iceberg with Next-Gen File Formats -
>>> https://www.youtube.com/watch?v=p6ZKY8JViCA&list=PLkifVhhWtccxMcqWlXXFvjJybisFF7ESh&index=40
>>> [4] - The Future of Apache Iceberg™: A Community Member Panel Discussion
>>> -
>>> https://www.youtube.com/watch?v=BTTxeUXjqk8&list=PLkifVhhWtccxMcqWlXXFvjJybisFF7ESh&index=6
>>> [5] - Proposal document -
>>> https://docs.google.com/document/d/1sF_d4tFxJsZWsZFCyCL9ZE7YuI7-P3VrzMLIrrTIxds
>>> [6] - PR: Core, Data: File Format API interfaces -
>>> https://github.com/apache/iceberg/pull/12774
>>> [7] - PR: Core: Interface based DataFile reader and writer API -
>>> https://github.com/apache/iceberg/pull/12298