Hi!

I think in many cases it is more convenient to have a savepoint-and-stop
operation for upgrading the cluster/job, but it should not be
required. If the output of your job needs to be exactly-once and you don't
have an external deduplication mechanism, then even the current
fault-tolerance mechanism is not good enough to serve you under normal
operations.
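To make the "external deduplication mechanism" point concrete: after a failure, Flink restores from the last completed checkpoint and re-processes the records that arrived after it, so a plain non-transactional sink can receive some records twice. Below is a minimal sketch of deduplication in front of an append-only output. It assumes every record carries a unique ID assigned upstream. `Record`, `record_id` and `DedupSink` are hypothetical names for illustration, not a Flink API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    record_id: str  # hypothetical unique ID assigned upstream
    payload: str


@dataclass
class DedupSink:
    """Append-only output guarded by the set of already-written record IDs.

    In a real deployment the seen-ID set would live in the external system
    (e.g. a unique key constraint), not in process memory.
    """
    output: list = field(default_factory=list)
    seen_ids: set = field(default_factory=set)

    def write(self, record: Record) -> bool:
        # Drop records that were already written, e.g. replayed after a restore.
        if record.record_id in self.seen_ids:
            return False
        self.seen_ids.add(record.record_id)
        self.output.append(record.payload)
        return True


# Records 1-3 are written, then a restore causes 2 and 3 to be replayed.
sink = DedupSink()
first_run = [Record("1", "a"), Record("2", "b"), Record("3", "c")]
replay = [Record("2", "b"), Record("3", "c"), Record("4", "d")]
for rec in first_run + replay:
    sink.write(rec)
print(sink.output)  # ['a', 'b', 'c', 'd']
```

The replayed records 2 and 3 are silently dropped, so the output matches what a failure-free run would have produced.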

Cheers,
Gyula


On Fri, 23 Dec 2016 at 19:54, Aljoscha Krettek <aljos...@apache.org> wrote:

> Hi Greg,
> yes, certainly. There are more requirements to this than the quick sketch I
> gave above, and that seems to be one of them.
>
> Cheers,
> Aljoscha
>
> On Thu, 22 Dec 2016 at 17:54 Greg Hogan <c...@greghogan.com> wrote:
>
> Aljoscha,
>
> For the second possible solution, is there also a requirement that the
> data sinks handle out-of-order writes? If the new job outpaces the old job,
> which is then terminated, the final write from the old job could overwrite
> "newer" writes from the new job.
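One way a sink could tolerate the overlap described here is to attach a monotonically increasing version to each write (for example an event timestamp or sequence number carried in the data) and reject any write that is not newer than what is already stored. Below is a minimal sketch, assuming such a per-key version exists in the records. `VersionedStore` and `upsert` are hypothetical names, not a Flink or database API.

```python
from typing import Dict, Tuple


class VersionedStore:
    """Key-value store that only accepts a write if its version is newer."""

    def __init__(self) -> None:
        # key -> (version, value)
        self._data: Dict[str, Tuple[int, str]] = {}

    def upsert(self, key: str, version: int, value: str) -> bool:
        current = self._data.get(key)
        # Reject stale or duplicate writes, e.g. a late write from the old job
        # after the new job has already stored a newer version of this key.
        if current is not None and current[0] >= version:
            return False
        self._data[key] = (version, value)
        return True

    def get(self, key: str) -> str:
        return self._data[key][1]


store = VersionedStore()
store.upsert("user-42", 7, "from new job")  # new job is ahead
store.upsert("user-42", 5, "from old job")  # old job's final, older write
print(store.get("user-42"))  # from new job
```

In an external database, the version check and the write would have to happen atomically (as a single conditional update), otherwise the two jobs could still race between the check and the write.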
>
> Greg
>
> On Tue, Dec 20, 2016 at 12:27 PM, Aljoscha Krettek <aljos...@apache.org>
> wrote:
>
> Hi,
> zero-downtime updates are currently not supported. What is supported in
> Flink right now is a savepoint-shutdown-restore cycle. With this, you first
> draw a savepoint (which is essentially a checkpoint with some metadata),
> then you cancel your job, then you do whatever you need to do (update
> machines, update Flink, update the job) and restore from the savepoint.
>
> A possible solution for zero-downtime updates would be to take a savepoint,
> then start a second Flink job from that savepoint, then shut down the first
> job. With this, your data sinks would need to be able to handle being
> written to by two jobs at the same time, i.e. writes should probably be
> idempotent.
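To illustrate what "idempotent" means here: if both jobs emit the same result for the same key, a sink that overwrites by key ends up in the same state however many times, and by whichever job, a write is applied, whereas an append-only sink accumulates duplicates. Below is a minimal sketch with plain Python containers standing in for the external system. The function names are hypothetical, and it assumes the job is deterministic, so the new job (started from a savepoint of the old one) emits the same values as the old job during the overlap.

```python
from typing import Dict, List, Tuple

Result = Tuple[str, int]  # (key, value) emitted by a job


def append_sink(target: List[Result], results: List[Result]) -> None:
    # Not idempotent: every write adds a row, so overlapping jobs duplicate data.
    target.extend(results)


def upsert_sink(target: Dict[str, int], results: List[Result]) -> None:
    # Idempotent: writing the same (key, value) again leaves the state unchanged.
    for key, value in results:
        target[key] = value


# During the overlap both jobs emit the same results for keys "a" and "b".
old_job = [("a", 1), ("b", 2)]
new_job = [("a", 1), ("b", 2), ("c", 3)]

rows: List[Result] = []
table: Dict[str, int] = {}
for job_output in (old_job, new_job):
    append_sink(rows, job_output)
    upsert_sink(table, job_output)

print(len(rows))  # 5
print(table)      # {'a': 1, 'b': 2, 'c': 3}
```

This only holds if both jobs produce identical values for a key. If their outputs can diverge, the order in which writes arrive matters as well.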
>
> This is the link to the savepoint doc:
> https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/savepoints.html
>
> Does that help?
>
> Cheers,
> Aljoscha
>
> On Fri, 16 Dec 2016 at 18:16 Andrew Hoblitzell <ahoblitz...@salesforce.com>
> wrote:
>
> Hi. Does Apache Flink currently have support for zero-downtime or rolling
> upgrades?
>
> If so, what are the concerns to watch for, and what best practices might
> exist? Are there version-management and data-inconsistency issues to
> watch for?
>
>
>
