Hi Steve,

I think your question is specific to the HDFS HA setup.
Flink HA only addresses failover for the JobManager and job metadata state.
The storage layer for savepoints/checkpoints and its failover are the 
responsibility of the HDFS deployment.
Flink uses HDFS as an external system, accessed via a location URL.
I am not an expert on HDFS HA deployment; you could have a look at the Hadoop 
docs [1].
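
For what it's worth, here is a minimal sketch of how this usually looks when 
HDFS itself is deployed in HA mode (the nameservice name "mycluster", the 
namenode hosts, and the paths below are placeholders, and exact keys may 
differ across Flink/Hadoop versions): the checkpoint/savepoint URLs point at 
the logical nameservice rather than a single namenode, and the HDFS client 
resolves the active namenode from the Hadoop config on Flink's classpath.

  # flink-conf.yaml: point at the logical HDFS nameservice,
  # not at an individual namenode host:port
  state.backend: filesystem
  state.checkpoints.dir: hdfs://mycluster/flink/checkpoints
  state.savepoints.dir: hdfs://mycluster/flink/savepoints
  high-availability.storageDir: hdfs://mycluster/flink/ha

  <!-- hdfs-site.xml (picked up via HADOOP_CONF_DIR): lets the client
       find the active namenode and fail over automatically -->
  <property><name>dfs.nameservices</name><value>mycluster</value></property>
  <property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>
  <property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>namenode-0:8020</value></property>
  <property><name>dfs.namenode.rpc-address.mycluster.nn2</name><value>namenode-1:8020</value></property>
  <property><name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>

If that is correct, a load balancer in front of the namenodes should not be 
needed; the failover logic lives in the HDFS client. Again, please 
double-check this against the Hadoop docs [1].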

Best,
Andrey

[1] 
https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html

> On 21 Dec 2018, at 21:48, Steven Nelson <snel...@sourceallies.com> wrote:
> 
> First off, I am new to using HDFS to store things, so expect stupid questions.
> 
> I am working on hardening our Flink cluster for production usage. This 
> includes setting up an HA Flink cluster, saving checkpoints and savepoints to 
> a central location, etc. I have a functioning HDFS setup inside an HA 
> Kubernetes cluster. We have successfully stored checkpoint data in the HDFS 
> directory.
> 
> When we specify the location for the HDFS savepoint/checkpoint/HA save 
> locations, we specify a single namenode in the URL. My question is: how do 
> we implement failover in the event that the namenode fails? We looked at putting 
> the namenodes behind a load balancer, but the backup nodes attempt to 
> respond to writes (and fail). I figure I am missing something simple.
> 
> -Steve
