> *From:* Scott Kidder
> *Sent:* Friday, 13 July 2018 01:13
> *To:* Sofer, Tovi [ICG-IT]
> *Cc:* user@flink.apache.org
> *Subject:* Re: high availability with automated disaster recovery using
> zookeeper
>
>
>
> I've used a multi-datacenter Consul cluster to
Thank you Scott,
Looks like a very elegant solution.
How did you manage high availability in a single data center?
Thanks,
Tovi
From: Scott Kidder
Sent: Friday, 13 July 2018 01:13
To: Sofer, Tovi [ICG-IT]
Cc: user@flink.apache.org
Subject: Re: high availability with automated disaster recovery
I've used a multi-datacenter Consul cluster to coordinate
service-discovery. When a service starts up in the primary DC, it registers
itself in Consul with a key that has a TTL that must be periodically
renewed. If the service shuts down or terminates abruptly, the key expires
and is removed f
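
For readers following along, the register-and-renew pattern described here can be sketched roughly as follows. This is an in-memory illustration only, not Consul's API: the `TTLRegistry` class, its method names, and the `service/primary` key are all hypothetical.

```python
import time


class TTLRegistry:
    """In-memory stand-in for a store whose keys expire (hypothetical, not Consul)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._expiry = {}  # key -> absolute expiry time

    def register(self, key, ttl):
        """Create or renew a key; it stays alive for ttl seconds from now."""
        self._expiry[key] = self._clock() + ttl

    renew = register  # renewing is re-registering with a fresh TTL

    def is_alive(self, key):
        """Return True while the key has not expired; drop it once it has."""
        deadline = self._expiry.get(key)
        if deadline is None:
            return False
        if self._clock() >= deadline:
            del self._expiry[key]
            return False
        return True


# Usage with a fake clock so the expiry can be observed without sleeping.
now = [0.0]
registry = TTLRegistry(clock=lambda: now[0])
registry.register("service/primary", ttl=10)
now[0] = 5.0
registry.renew("service/primary", ttl=10)  # heartbeat from the running service
now[0] = 30.0                               # service died, no more renewals
print(registry.is_alive("service/primary"))
```

A standby in another DC could poll a check like `is_alive` and start only once the primary's key is gone; how that is wired up in practice depends on the deployment.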
accurate, since it seems to contradict the image in link
> below
>
> https://mesosphere.com/blog/apache-flink-on-dcos-and-apache-mesos ]
>
>
>
> *From:* Sofer, Tovi [ICG-IT]
> *Sent:* Tuesday, 10 July 2018 20:04
> *To:* 'Till Rohrmann' ; user
> *Cc:* Gardi, Hila [
; user
Cc: Gardi, Hila [ICG-IT]
Subject: RE: high availability with automated disaster recovery using zookeeper
Hi Till, group,
Thank you for your response.
After reading further online about Mesos: can't Mesos meet the requirement of
running the job manager on the primary server?
By using: “c
-disaster-recovery/
)
Is this supported by a Flink cluster on Mesos?
Thanks again
Tovi
From: Till Rohrmann
Sent: Tuesday, 10 July 2018 10:11
To: Sofer, Tovi [ICG-IT]
Cc: user
Subject: Re: high availability with automated disaster recovery using zookeeper
Hi Tovi,
that is an interesting use case you are describing here. I think, however,
it depends mainly on the capabilities of ZooKeeper to produce the intended
behavior. Flink itself relies on ZooKeeper for leader election in HA mode
but does not expose any means to influence the leader election pr
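
To illustrate why election order rather than location decides the leader, here is a small in-memory sketch. It assumes the standard ZooKeeper election recipe, in which each contender gets a sequence number and the lowest one leads; the `ElectionSim` class and the contender names are hypothetical, and this is not Flink's or ZooKeeper's API.

```python
import itertools


class ElectionSim:
    """In-memory sketch of lowest-sequence-number-wins election (hypothetical)."""

    def __init__(self):
        self._seq = itertools.count()
        self._contenders = {}  # sequence number -> contender name

    def join(self, name):
        """Register a contender and return its sequence number."""
        seq = next(self._seq)
        self._contenders[seq] = name
        return seq

    def leave(self, seq):
        """Remove a contender, as when its session ends."""
        self._contenders.pop(seq, None)

    def leader(self):
        """The contender with the lowest sequence number leads, if any."""
        if not self._contenders:
            return None
        return self._contenders[min(self._contenders)]


# Usage: the DC2 job manager happens to join first, so it leads.
election = ElectionSim()
dc2 = election.join("jobmanager-dc2")
dc1 = election.join("jobmanager-dc1")
print(election.leader())
election.leave(dc2)
print(election.leader())
```

In this sketch nothing lets a contender say "prefer DC1"; whichever job manager registers first wins until it leaves.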
Hi all,
We are examining how to achieve high availability for Flink and also support
automatic recovery in a disaster scenario, when an entire DC goes down.
We have DC1, where we usually want the work to be done, and DC2, which is more
remote and should take over the work only when DC1 is down.
We e