Hi,

Zero-downtime updates are currently not supported. What Flink supports right now is a savepoint-shutdown-restore cycle: you first take a savepoint (which is essentially a checkpoint with some metadata), then you cancel your job, then you do whatever you need to do (update machines, update Flink, update the job), and finally you restore the job from the savepoint.
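As a rough sketch of that cycle, the snippet below only builds the three CLI invocations in order; it does not run them. It assumes the Flink CLI lives at `bin/flink` inside the install directory, and the job ID, savepoint path, and jar name are placeholders you would fill in. The savepoint path in particular is something the `savepoint` command prints, so here it is simply passed in. `upgrade_plan` and the other helper names are made up for illustration.

```python
from typing import List

FLINK = "bin/flink"  # assumed location of the Flink CLI inside the install directory


def savepoint_cmd(job_id: str) -> List[str]:
    """Step 1: trigger a savepoint for the running job."""
    return [FLINK, "savepoint", job_id]


def cancel_cmd(job_id: str) -> List[str]:
    """Step 2: cancel the job once the savepoint has completed."""
    return [FLINK, "cancel", job_id]


def restore_cmd(savepoint_path: str, jar: str) -> List[str]:
    """Step 3: after the update, resubmit the job from the savepoint."""
    return [FLINK, "run", "-s", savepoint_path, jar]


def upgrade_plan(job_id: str, savepoint_path: str, jar: str) -> List[List[str]]:
    """Hypothetical helper: the commands of the cycle, in execution order."""
    return [
        savepoint_cmd(job_id),
        cancel_cmd(job_id),
        restore_cmd(savepoint_path, jar),
    ]
```

The update itself (machines, Flink, or the job jar) happens between the cancel and the run step, which is where the downtime comes from.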
A possible solution for zero-downtime updates would be to take a savepoint, start a second Flink job from that savepoint, and then shut down the first job. With this, your data sinks would need to handle being written to by two jobs at the same time, i.e. writes should probably be idempotent.

This is the link to the savepoint doc:
https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/savepoints.html

Does that help?

Cheers,
Aljoscha

On Fri, 16 Dec 2016 at 18:16 Andrew Hoblitzell <ahoblitz...@salesforce.com> wrote:

> Hi. Does Apache Flink currently have support for zero down time or the
> ability to do rolling upgrades?
>
> If so, what are concerns to watch for and what best practices might
> exist? Are there version management and data inconsistency issues to
> watch for?
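On the idempotent-writes point in the reply: here is a minimal, Flink-free sketch of why it matters when two jobs overlap. It assumes each result carries a deterministic key (here a made-up "user|window" string) and that the sink can overwrite by key. `AppendSink`, `UpsertSink`, and `run_job` are hypothetical names for illustration, not Flink APIs.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, int]  # (deterministic key, value)


class AppendSink:
    """Non-idempotent: every write adds a new row."""

    def __init__(self) -> None:
        self.rows: List[Record] = []

    def write(self, key: str, value: int) -> None:
        self.rows.append((key, value))


class UpsertSink:
    """Idempotent: writing the same key and value again changes nothing."""

    def __init__(self) -> None:
        self.rows: Dict[str, int] = {}

    def write(self, key: str, value: int) -> None:
        self.rows[key] = value


def run_job(results: List[Record], sink) -> None:
    """Stand-in for a job emitting its results to a sink."""
    for key, value in results:
        sink.write(key, value)


# Both jobs start from the same state, so during the overlap they emit the same results.
results: List[Record] = [("user-1|window-0", 3), ("user-2|window-0", 5)]

append_sink, upsert_sink = AppendSink(), UpsertSink()
for sink in (append_sink, upsert_sink):
    run_job(results, sink)  # old job, still running
    run_job(results, sink)  # new job, started from the savepoint
```

With the append-style sink every result shows up twice, while the keyed upsert sink ends up with one row per key no matter how long the two jobs overlap.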