First off, I am new to using HDFS to store things, so expect stupid
questions.

I am working on hardening our Flink cluster for production usage. This
includes setting up an HA Flink cluster, saving checkpoints and savepoints
to a central location, etc. I have a functioning HDFS setup inside an HA
Kubernetes cluster, and we have successfully stored checkpoint data in the
HDFS directory.

When we specify the locations for the HDFS savepoints, checkpoints, and HA
storage, we point the URL at a single namenode. My question is: how do we
implement failover in the event that namenode fails? We looked at putting
the namenodes behind a load balancer, except the standby namenodes attempt
to respond to writes (and fail). I figure I am missing something simple.
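
For context, our Flink config currently points every HDFS location at
that one namenode, roughly like this (the hostname, port, and paths are
illustrative placeholders, not our actual values):

    state.checkpoints.dir: hdfs://namenode-0:8020/flink/checkpoints
    state.savepoints.dir: hdfs://namenode-0:8020/flink/savepoints
    high-availability.storageDir: hdfs://namenode-0:8020/flink/ha

So everything works until namenode-0 goes away.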

-Steve
