Re: FileRewrite API refactor

2025-02-18 Thread Péter Váry
Thanks for all the feedback! Created the PR for the API discussion: https://github.com/apache/iceberg/pull/12306 Thanks, Peter Steven Wu wrote (on Thu, Feb 13, 2025, 22:28): > looking at "RewriteDataFilesSparkAction" from your PR #11513, I am fine > that the RewriteExecutionContext i

Re: FileRewrite API refactor

2025-02-13 Thread Steven Wu
looking at "RewriteDataFilesSparkAction" from your PR #11513, I am fine that the RewriteExecutionContext is captured in the `Plan` object. My earlier point is that we need to pass those common metadata/context to the executor. We don't have to define a separate `PlanInfo` for that purpose if they a
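Here is one way to picture the point about passing common metadata/context to the executor without a separate `PlanInfo` class: the plan object holds the shared context once, and the executor receives that context together with each group. This is a minimal Java sketch. All names in it (`RewriteContext`, `FileGroup`, `RewritePlan`, `Executor`) are hypothetical, and none of them come from the Iceberg API or from PR #11513.

```java
import java.util.List;
import java.util.Map;

public class PlanContextSketch {
  // Hypothetical: metadata shared by every group in one rewrite run.
  record RewriteContext(long snapshotId, long targetFileSizeBytes, Map<String, String> options) {}

  // Hypothetical: a set of files planned to be compacted together.
  record FileGroup(int groupIndex, List<String> filePaths) {}

  // Hypothetical planner output: the context is stored once, not copied into every group.
  record RewritePlan(RewriteContext context, List<FileGroup> groups) {}

  // Hypothetical engine-side executor: gets the shared context alongside each group.
  interface Executor {
    String rewrite(RewriteContext context, FileGroup group);
  }

  // Hands the plan-level context to the executor for every group in the plan.
  static List<String> execute(RewritePlan plan, Executor executor) {
    return plan.groups().stream()
        .map(group -> executor.rewrite(plan.context(), group))
        .toList();
  }

  public static void main(String[] args) {
    RewriteContext ctx = new RewriteContext(42L, 512L * 1024 * 1024, Map.of());
    RewritePlan plan = new RewritePlan(ctx, List.of(
        new FileGroup(0, List.of("a.parquet", "b.parquet")),
        new FileGroup(1, List.of("c.parquet"))));
    // A toy executor that only reports what it would rewrite.
    Executor reporter = (c, g) ->
        "snapshot " + c.snapshotId() + ": group " + g.groupIndex() + ", files: " + g.filePaths().size();
    execute(plan, reporter).forEach(System.out::println);
  }
}
```

In this design, anything that is the same for every group travels in exactly one place. Only group-specific data lives in `FileGroup`.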

Re: FileRewrite API refactor

2025-02-07 Thread Péter Váry
Hi Steven, Thanks for checking this out! Created commits which contain only the API changes: 1. Everything is stored on the same level as in the current API: https://github.com/apache/iceberg/commit/8d612e074dcb8ee6d5ae354e329d0b78e3138c86 2. Simplified the API to push down everything

Re: FileRewrite API refactor

2025-02-06 Thread Steven Wu
To be honest, it is hard to visualize the interface/structure discussion in this format. More details (in a doc or PR) can be helpful. Regarding "data organization", I feel we probably can set one common metadata class for all action types. We don't need both RewriteFilePlanContext and RewritePosi

Re: FileRewrite API refactor

2025-02-05 Thread Péter Váry
Hi Steven, Thanks for chiming in! The decision points I have collected:
- Data organization
  1. Plan + Group
  2. Group only
- Parameter handling
  1. All strings
  2. Type params
- Engine-specific parameters for the executor
  1. Common set calculated by the planne
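The "Parameter handling" decision point has two options: pass every setting as a string, or use typed parameters. The Java sketch below contrasts them. The class and record names are hypothetical, and the option keys and default values are illustrative only. With all strings, a malformed value is only detected when it is parsed at execution time. With a typed parameter object, the values are parsed and validated once, up front.

```java
import java.util.Map;

public class ParamHandlingSketch {
  // Option 1 (hypothetical): parameters travel as strings and are parsed where they are used.
  static long targetSizeFromStrings(Map<String, String> options) {
    String raw = options.getOrDefault("target-file-size-bytes", "536870912"); // illustrative default
    return Long.parseLong(raw); // a malformed value only fails here, at execution time
  }

  // Option 2 (hypothetical): parameters are parsed and validated once into a typed object.
  record RewriteParams(long targetFileSizeBytes, int minInputFiles) {
    RewriteParams {
      if (targetFileSizeBytes <= 0) {
        throw new IllegalArgumentException("targetFileSizeBytes must be positive");
      }
      if (minInputFiles < 1) {
        throw new IllegalArgumentException("minInputFiles must be at least 1");
      }
    }

    // Converts the string map into typed values, using illustrative defaults.
    static RewriteParams fromOptions(Map<String, String> options) {
      return new RewriteParams(
          Long.parseLong(options.getOrDefault("target-file-size-bytes", "536870912")),
          Integer.parseInt(options.getOrDefault("min-input-files", "5")));
    }
  }

  public static void main(String[] args) {
    Map<String, String> options = Map.of("target-file-size-bytes", "134217728");
    System.out.println(targetSizeFromStrings(options));
    RewriteParams params = RewriteParams.fromOptions(options);
    System.out.println(params.targetFileSizeBytes() + " / " + params.minInputFiles());
  }
}
```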

Re: FileRewrite API refactor

2025-02-04 Thread Steven Wu
At a high level, it makes sense to separate out the planning and execution to promote reusing the planning code across engines. Just to add a 4th class to Russell's list: 1) RewriteGroup: A container that holds all the files that are meant to be compacted along with information about them 2) Rewriter:
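The separation between planning and execution can be sketched as an engine-agnostic planner that produces `RewriteGroup` containers, plus an engine-specific interface that consumes them. In this minimal Java sketch, `DataFileInfo`, `EngineRewriter`, and the `plan` method are hypothetical. The grouping strategy is deliberately simple: it greedily packs files smaller than a target size into groups with a byte cap. It is not Iceberg's actual planning logic.

```java
import java.util.ArrayList;
import java.util.List;

public class PlannerSketch {
  // Hypothetical: a data file reduced to what the planner needs.
  record DataFileInfo(String path, long sizeBytes) {}

  // Hypothetical container: files meant to be compacted together, plus their total size.
  record RewriteGroup(List<DataFileInfo> files, long totalBytes) {}

  // Engine-agnostic planning: greedily pack files below targetFileSize into groups of
  // at most maxGroupBytes. A single file larger than the cap still gets its own group.
  static List<RewriteGroup> plan(List<DataFileInfo> files, long targetFileSize, long maxGroupBytes) {
    List<RewriteGroup> groups = new ArrayList<>();
    List<DataFileInfo> current = new ArrayList<>();
    long currentBytes = 0;
    for (DataFileInfo file : files) {
      if (file.sizeBytes() >= targetFileSize) {
        continue; // already large enough, not a compaction candidate
      }
      if (!current.isEmpty() && currentBytes + file.sizeBytes() > maxGroupBytes) {
        groups.add(new RewriteGroup(List.copyOf(current), currentBytes));
        current.clear();
        currentBytes = 0;
      }
      current.add(file);
      currentBytes += file.sizeBytes();
    }
    if (!current.isEmpty()) {
      groups.add(new RewriteGroup(List.copyOf(current), currentBytes));
    }
    return groups;
  }

  // Engine-specific execution: Spark, Flink, etc. would each supply an implementation.
  interface EngineRewriter {
    void rewrite(RewriteGroup group);
  }

  public static void main(String[] args) {
    List<DataFileInfo> files = List.of(
        new DataFileInfo("a.parquet", 10), new DataFileInfo("b.parquet", 30),
        new DataFileInfo("c.parquet", 200), new DataFileInfo("d.parquet", 50));
    EngineRewriter logOnly = group ->
        System.out.println("rewriting " + group.files().size() + " file(s), " + group.totalBytes() + " bytes");
    plan(files, 100, 60).forEach(logOnly::rewrite);
  }
}
```

Only `EngineRewriter` would need a per-engine implementation. The `plan` step could then be shared as-is.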

Re: FileRewrite API refactor

2025-02-01 Thread Russell Spitzer
We probably still have to support it as long as we have V2 Table support, right? On Fri, Jan 31, 2025 at 9:13 AM Péter Váry wrote: > We could simplify the API a bit, if we omit DeleteFileRewrite. > Since Anton's work around the Puffin delete vectors, this will become > obsolete anyway, and focusi

Re: FileRewrite API refactor

2025-01-31 Thread Péter Váry
We could simplify the API a bit if we omit DeleteFileRewrite. Given Anton's work on the Puffin delete vectors, this will become obsolete anyway, and focusing on data file rewriting would allow us to remove some generics from the API. WDYT? Russell Spitzer wrote (on Jan 21, 2025,
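Dropping DeleteFileRewrite would let the API lose some generics. A minimal Java sketch shows why. If one abstraction must cover both data and delete file rewrites, it needs a type parameter for the file kind. If it only handles data files, the parameter disappears. `FileRewriter<F>` and `DataFileRewriter` are hypothetical interfaces, and the `DataFile`/`DeleteFile` records are simple stand-ins, not Iceberg's real `DataFile` and `DeleteFile` interfaces.

```java
import java.util.List;

public class GenericsSketch {
  // Hypothetical stand-ins for data and delete file descriptors.
  record DataFile(String path) {}
  record DeleteFile(String path) {}

  // Covering both data and delete file rewrites requires a type parameter for the file kind.
  interface FileRewriter<F> {
    List<F> rewrite(List<F> inputs);
  }

  // If only data files are rewritten, the type parameter can be dropped.
  interface DataFileRewriter {
    List<DataFile> rewrite(List<DataFile> inputs);
  }

  public static void main(String[] args) {
    // Toy implementations that "compact" all inputs into a single output file.
    FileRewriter<DeleteFile> deletes = inputs -> List.of(new DeleteFile("merged-deletes"));
    DataFileRewriter data = inputs -> List.of(new DataFile("merged-data"));
    System.out.println(deletes.rewrite(List.of(new DeleteFile("d1"), new DeleteFile("d2"))));
    System.out.println(data.rewrite(List.of(new DataFile("f1"), new DataFile("f2"))));
  }
}
```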

Re: FileRewrite API refactor

2025-01-21 Thread Russell Spitzer
To bump this back up, I think this is a pretty important change to the core library, so it's necessary that we get more folks involved in this discussion. I agree that the Rewrite Data Files needs to be broken up and realigned if we want to be able to reuse the code in Flink. I think I prefer t

FileRewrite API refactor

2025-01-14 Thread Péter Váry
Hi Team, There is ongoing work to bring Flink Table Maintenance to Iceberg [1]. We have already merged the main infrastructure and are currently working on implementing the data file rewrite [2]. During the implementation, we found that part of the compaction planning implemented for Spark compaction, c