RE: Iceberg table maintenance

wenjin Sat, 30 Mar 2024 02:52:38 -0700

Hi Peter,

I am interested in your proposal and think that having the Iceberg Flink
connector support running maintenance tasks is meaningful. If possible, could
you help me clarify a few points of confusion?


- When the Iceberg table is written by a single Flink job (use cases 1 and 2),
the maintenance tasks will be added to the post-commit topology. How do the
maintenance tasks execute: synchronously or asynchronously? Will the
maintenance tasks block the data processing of the Flink job?

- When the Iceberg table is written by multiple Flink jobs (use case 3), the
user needs to create a separate Flink job to run the maintenance tasks. In this
case, if the user does not create a separate job, but instead enables
maintenance tasks in the existing Flink jobs just as in use case 1, what would
the consequences be? Or is there an automatic mechanism to avoid this issue?

Thank you.

Best,
Wenjin

On 2024/03/28 17:59:49 Péter Váry wrote:
> Hi Team,
> 
> I am working on adding a possibility to the Flink Iceberg connector to run
> maintenance tasks on the Iceberg tables. This will fix the small files
> issues and in the long run help compacting the high number of positional
> and equality deletes created by Flink tasks writing CDC data to Iceberg
> tables without the need of Spark in the infrastructure.
> 
> I did some planning and prototyping, and I am currently trying out the
> solution on a larger scale.
> 
> I put together a document describing what my current solution looks like:
> https://docs.google.com/document/d/16g3vR18mVBy8jbFaLjf2JwAANuYOmIwr15yDDxovdnA/edit?usp=sharing
> 
> I would love to hear your thoughts and feedback on this to find a good
> final solution.
> 
> Thanks,
> Peter
>
