Hi Robin,

This is a good question, but Flink can't do rolling upgrades.
I'll try to explain the cost of supporting rolling upgrades in Flink.

1. Tasks within a region are connected by shuffle connections, and to keep
the data processed by upstream and downstream tasks in the same region
consistent, these tasks need to fail over/restart at the same time. So for
jobs with keyBy/hash shuffles, the cost of a rolling upgrade would not be
much different from the cost of stopping and restarting the whole job.

2. To keep a rolling upgrade correct, many details would need to be
handled, such as the checkpoint coordinator, failover handling, and the
operator coordinator.

But, IMO, rolling upgrades of jobs are a worthwhile thing to do.

Best,
Weihua

On Wed, Jun 29, 2022 at 3:52 PM Robin Cassan <robin.cas...@contentsquare.com> wrote:

> Hi all! We are running a Flink cluster on Kubernetes and deploying a
> single job on it through "flink run <jar>". Whenever we want to modify the
> jar, we cancel the job and run the "flink run" command again with the new
> jar and the retained checkpoint URL from the first run.
> This works well, but it adds some unavoidable downtime for each update:
> - Downtime between the "cancel" and the "run"
> - Downtime during state restoration
>
> We aim to reduce this downtime to a minimum and are wondering: is there
> a way to submit a new version of a job without completely stopping the
> old one, having each TM pick up the new jar one at a time and waiting for
> its state to be restored before moving on to the next TM? This would limit
> the downtime to a single TM at a time.
>
> Otherwise, we could try to submit a second job and let it restore before
> canceling the old one, but this raises complex synchronisation issues,
> especially since the second job would be restoring from the first job's
> retained (incremental) checkpoint, which seems like a bad idea...
>
> What would be the best way to reduce downtime in this scenario, in your
> opinion?
>
> Thanks!
>
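
If it helps, here is a rough sketch of the stop/start cycle described
above, using standard Flink CLI commands; the savepoint directory, jar
name, and job id are placeholders rather than values from this thread:

    # Take a savepoint and then stop the running job
    flink stop --savepointPath s3://my-bucket/savepoints <job-id>

    # Resubmit the new jar (detached), restoring from that savepoint,
    # or from a retained checkpoint path as in the workflow above
    flink run -d -s s3://my-bucket/savepoints/savepoint-xxxx new-job.jar

Either way, the gap between stop and run plus the state-restore time
remains, which is exactly the downtime being discussed here.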