Hi,

Your assumptions are mostly correct.

1. This is correct, but you can also run a non-YARN setup with only one 
JobManager if you have a system that makes sure to restart/keep alive this 
JobManager. This could be some supervisor, Kubernetes, or Mesos. You should 
probably also factor in the distributed filesystem (or something similar) 
that you need for state snapshots.

2. You can run Flink without HA, but then a failure will bring the complete 
cluster down, meaning any state checkpoints/snapshots will be lost. You can get 
around this by enabling externalised checkpoints [1]. With these, you can 
restore from a checkpoint even after the cluster has failed.

3. In order to recover from failures you always need state snapshots. HA only 
makes the JobManager resilient to failures. That being said, restarting the 
cluster after a failure and recovering from an externalised checkpoint should 
probably take a couple of minutes if you don't have too many nodes.
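
To make point 2 a bit more concrete, here is a rough sketch of the 
configuration side, assuming the Flink 1.4 setup described in [1]. The HDFS 
path is only a placeholder I made up; use whatever durable storage your cluster 
has.

```yaml
# flink-conf.yaml (sketch; the path below is a placeholder, not a recommendation)
# Target directory for the metadata of externalised checkpoints.
state.checkpoints.dir: hdfs:///flink/ext-checkpoints
```

As far as I know, the job itself also has to opt in by calling 
enableExternalizedCheckpoints(...) on its CheckpointConfig with one of the 
ExternalizedCheckpointCleanup modes (for example RETAIN_ON_CANCELLATION). After 
a cluster failure you can then resubmit the job with 
"bin/flink run -s <checkpointMetaDataPath> ...", the same way you would resume 
from a savepoint.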

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/state/checkpoints.html#externalized-checkpoints

Best,
Aljoscha

> On 17. Oct 2017, at 11:53, Srinath Perera <hemap...@gmail.com> wrote:
> 
> Hi All,
> 
> I am trying to write an article comparing minimal HA(Highly available)
> deployments of different streaming processing systems.
> 
> Basically, the question is if an organization has a limited workload, such
> as 10k events per second, which might grow in the future, what is the
> minimal setup they can use to run a highly available Stream Processor?
> 
> Could someone help answer following questions?
> 
>   1. How many nodes does a minimal Apache Flink HA setup need? As I understood
>   from [2], it is ZooKeeper nodes + 2 JobManagers without YARN, and 1
>   JobManager with YARN, + worker nodes. Is this correct?
>   2. As per [1], ZooKeeper needs a minimum of 3 nodes to provide HA. Is there a
>   way to run Apache Flink without HA?
>   3. If someone runs Apache Flink without HA, but uses state snapshots, how
>   fast can it recover after a failure? (ballpark figure)
> 
> Thanks
> Srinath
> 
> 
>   1. https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html#Deploying_ZooKeeper
>   2. https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/jobmanager_high_availability.html#standalone-cluster-high-availability
> 
> 
> -- 
> ============================
> Srinath Perera, Ph.D.
>   http://people.apache.org/~hemapani/
>   http://srinathsview.blogspot.com/
