Thanks for all the feedback!
Created the PR for the API discussion:
https://github.com/apache/iceberg/pull/12306
Thanks,
Peter
Steven Wu wrote (on Thu, Feb 13, 2025, 22:28):
looking at "RewriteDataFilesSparkAction" from your PR #11513, I am fine
that the RewriteExecutionContext is captured in the `Plan` object. My
earlier point is that we need to pass those common metadata/context to the
executor. We don't have to define a separate `PlanInfo` for that purpose if
they a
Hi Steven,
Thanks for checking this out!
Created commits which contain only the API changes:
1. Everything is stored on the same level as in the current API:
https://github.com/apache/iceberg/commit/8d612e074dcb8ee6d5ae354e329d0b78e3138c86
2. Simplified the API to push down everything
To be honest, it is hard to visualize the interface/structure discussion in
this format. More details (in a doc or PR) can be helpful.
Regarding "data organization", I feel we probably can set one common
metadata class for all action types. We don't need both
RewriteFilePlanContext and RewritePosi
Hi Steven,
Thanks for chiming in!
The decision points I have collected:
- Data organization
1. Plan + Group
2. Group only
- Parameter handling
1. All strings
2. Type params
- Engine specific parameters for the executor
1. Common set calculated by the planne
At a high level, it makes sense to separate out the planning and execution
to promote reusing the planning code across engines.
Just to add a 4th class to Russell's list
1) RewriteGroup: A Container that holds all the files that are meant to be
compacted along with information about them
2) Rewriter:
We probably still have to support it as long as we have V2 Table support
right?
On Fri, Jan 31, 2025 at 9:13 AM Péter Váry
wrote:
We could simplify the API a bit, if we omit DeleteFileRewrite.
Since Anton's work around the Puffin delete vectors, this will become
obsolete anyway, and focusing on data file rewriting would allow us to
remove some generics from the API.
WDYT?
Russell Spitzer wrote (on Jan 21, 2025,
To bump this back up, I think this is a pretty important change to the core
library so it's necessary that we get more folks involved in this
discussion. I
I agree that the Rewrite Data Files needs to be broken up and realigned if
we want to be able to reuse the code in Flink.
I think I prefer t
Hi Team,
There is ongoing work to bring Flink Table Maintenance to Iceberg [1]. We
already merged the main infrastructure and are currently working on
implementing the data file rewrite [2]. During the implementation we found
that part of the compaction planning implemented for Spark compaction,
c