Re: Flink Table Maintenance - Tag based locking

2024-08-07 Thread Ryan Blue
If this is specific to solving the problem that there is no notification when a task finishes in Flink, then I think it makes sense to use a JDBC lock. I'd prefer that this not add the tag-based locking strategy because I think that has the potential to be misunderstood by people using the library
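
To make the "JDBC lock" idea concrete, here is a minimal sketch of a lock backed by a database uniqueness constraint. It is an illustration only, not the implementation discussed in the thread: the `DbLock` class, the `maintenance_lock` table, and the lock id are hypothetical names, and Python's built-in `sqlite3` stands in for a JDBC-accessible database. The only property it relies on is that inserting a duplicate primary key fails atomically.

```python
import sqlite3


class DbLock:
    """Hypothetical maintenance lock: one row per held lock, keyed by lock_id."""

    def __init__(self, conn: sqlite3.Connection, lock_id: str):
        self.conn = conn
        self.lock_id = lock_id
        # The primary key is what makes acquisition atomic.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS maintenance_lock (lock_id TEXT PRIMARY KEY)"
        )
        conn.commit()

    def try_lock(self) -> bool:
        """Return True if the lock was acquired, False if it is already held."""
        try:
            with self.conn:  # commits on success, rolls back on error
                self.conn.execute(
                    "INSERT INTO maintenance_lock (lock_id) VALUES (?)",
                    (self.lock_id,),
                )
            return True
        except sqlite3.IntegrityError:
            # A row with this lock_id already exists, so someone holds the lock.
            return False

    def is_held(self) -> bool:
        row = self.conn.execute(
            "SELECT 1 FROM maintenance_lock WHERE lock_id = ?", (self.lock_id,)
        ).fetchone()
        return row is not None

    def unlock(self) -> None:
        """Release the lock by deleting its row."""
        with self.conn:
            self.conn.execute(
                "DELETE FROM maintenance_lock WHERE lock_id = ?", (self.lock_id,)
            )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    lock = DbLock(conn, "db.table/maintenance")  # hypothetical lock id
    print(lock.try_lock())  # first attempt acquires the lock
    print(lock.try_lock())  # second attempt is rejected while it is held
    lock.unlock()
```

Because the lock lives outside the job, the component that schedules a task and the component that finishes it do not need to talk to each other directly, which is the missing-notification problem described above.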

Re: Flink Table Maintenance - Tag based locking

2024-08-07 Thread Péter Váry
Hi Anton, nice to hear from you! Thanks Ryan for your continued interest! You can find my answers below: > Am I right to say the proposal has the following high-level goals: > - Perform cheap maintenance actions periodically after commits using the same cluster (suitable for things like rewriting

Re: Flink Table Maintenance - Tag based locking

2024-08-06 Thread Ryan Blue
> If nobody else has a better idea, then I will add a default JDBC based locking implementation to the PR Do you mean an implementation of `LockManager` in core or something specific to this Flink application? On Tue, Aug 6, 2024 at 2:28 AM Péter Váry wrote: > > > We can make sure that the Task

Re: Flink Table Maintenance - Tag based locking

2024-08-06 Thread Anton Okolnychyi
Took a look at the doc as well as this thread. Am I right to say the proposal has the following high-level goals: - Perform cheap maintenance actions periodically after commits using the same cluster (suitable for things like rewriting manifests, compacting tiny data files). - Offer an ability to

Re: Flink Table Maintenance - Tag based locking

2024-08-06 Thread Péter Váry
> If all the maintenance tasks are created from a single Flink job, is it possible to simply skip a new maintenance task if there’s already a running task? The running maintenance tasks could be recorded in the JM? The operator scheduling the tasks doesn't know when the actual tasks are finished. We n

Re: Flink Table Maintenance - Tag based locking

2024-08-06 Thread Xianjin YE
> DataFile rewrite will create a new manifest file. This means if a DataFile > rewrite task is finished and committed, and there is a concurrent > ManifestFile rewrite then the ManifestFile rewrite will fail. I have played > around with serializing the Maintenance Tasks (resulted in a very ugly/

Re: Flink Table Maintenance - Tag based locking

2024-08-06 Thread Péter Váry
> > We can make sure that the Tasks can tolerate concurrent runs, but as mentioned in the doc, in most cases having concurrent runs is a waste of resources, because of the commit conflicts. > > Is the problem that users may configure multiple jobs that are all trying to run maintenance procedures?

Re: Flink Table Maintenance - Tag based locking

2024-08-05 Thread Manu Zhang
Hi Peter, We rely on Airflow to schedule and coordinate maintenance Spark jobs. I agree with Ryan that an Iceberg solution is not a good choice here. Thanks, Manu On Tue, Aug 6, 2024 at 1:07 AM Ryan Blue wrote: > > We can make sure that the Tasks can tolerate concurrent runs, but as > mentione

Re: Flink Table Maintenance - Tag based locking

2024-08-05 Thread Ryan Blue
> We can make sure that the Tasks can tolerate concurrent runs, but as mentioned in the doc, in most cases having concurrent runs is a waste of resources, because of the commit conflicts. Is the problem that users may configure multiple jobs that are all trying to run maintenance procedures? If s

Re: Flink Table Maintenance - Tag based locking

2024-08-04 Thread Péter Váry
Thanks everyone for your answers! I really appreciate it, especially since they came in over the weekend, on your own time. @Manu, during our initial discussion, you mentioned that you had similar issues with Spark compactions. You needed locking there. Is it still an issue? If Spark

Re: Flink Table Maintenance - Tag based locking

2024-08-04 Thread Manu Zhang
I'm not familiar with Flink, so I'm wondering how Flink resolves concurrency issues in common Flink use cases. For example, how does Flink prevent two jobs from writing to the same file? On the other hand, an Iceberg tag is ultimately an atomic change to a file. It's the same as using a file lock. I don'

Re: Flink Table Maintenance - Tag based locking

2024-08-04 Thread Steven Wu
I also don't feel it is the best fit to use tags to implement locks for passing control messages. This is the main sticking point for me from the design doc. However, we haven't been able to come up with a better solution yet. Maybe we need to go back to the drawing board again. I am also not sure

Re: Flink Table Maintenance - Tag based locking

2024-08-04 Thread Ryan Blue
Hi Péter, thanks for bringing this up. I don't think using a tag to "lock" a table is a good idea. The doc calls out that this is necessary "Since Flink doesn’t provide an out of the box solution for downstream operators sending feedback to upstream operators" so this feels like using Iceberg met

Flink Table Maintenance - Tag based locking

2024-07-31 Thread Péter Váry
Hi Team, During the discussion around the Flink Table Maintenance [1], [2], I highlighted that one of the main decision points is how we prevent Maintenance Tasks from running concurrently. At that time we did not find a better solution than providing an interface for locking,
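
As a rough illustration of what "an interface for locking" could look like, the sketch below defines a small lock interface and a trigger that skips a maintenance task when the lock is already held. All names (`MaintenanceLock`, `InMemoryLock`, `run_if_free`) are hypothetical and do not come from the proposal or from Iceberg. It also simplifies one thing: here the lock is released in the same call that ran the task, whereas in the Flink design under discussion the scheduling operator does not learn when the task finishes, so the release would have to happen elsewhere.

```python
import threading
from abc import ABC, abstractmethod
from typing import Callable


class MaintenanceLock(ABC):
    """Hypothetical pluggable lock; implementations could be in-memory, JDBC, etc."""

    @abstractmethod
    def try_lock(self) -> bool:
        """Try to acquire without blocking; return True on success."""

    @abstractmethod
    def unlock(self) -> None:
        """Release a previously acquired lock."""


class InMemoryLock(MaintenanceLock):
    """Single-process implementation, enough to demonstrate the interface."""

    def __init__(self) -> None:
        self._lock = threading.Lock()

    def try_lock(self) -> bool:
        return self._lock.acquire(blocking=False)

    def unlock(self) -> None:
        self._lock.release()


def run_if_free(lock: MaintenanceLock, task: Callable[[], None]) -> bool:
    """Run the task only if no other task holds the lock; otherwise skip it."""
    if not lock.try_lock():
        return False  # another maintenance task is running, so skip this one
    try:
        task()
    finally:
        lock.unlock()
    return True


if __name__ == "__main__":
    lock = InMemoryLock()
    print(run_if_free(lock, lambda: print("rewriting manifests")))
```

Skipping rather than waiting matches the concern raised in the thread that concurrent maintenance runs mostly waste resources on commit conflicts.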