Well, I have a fully functioning HDFS HA setup via a Helm chart. My question is more about how to specify the HDFS namenode so that, if a namenode fails, Flink automatically communicates with the new active namenode. Swapnil mentioned configuring a nameservice for the HDFS namenodes, and I was looking for clarification on that.

-Steve
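P.S. For reference, here is roughly what I understand a nameservice-based client configuration to look like, based on the Hadoop HA docs. This is just a sketch: the nameservice name "mycluster", the namenode hostnames/ports, and the "/flink/checkpoints" path are placeholders, not values from our actual setup.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import java.net.URI;

    public class HdfsHaClientSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Logical nameservice that stands in for whichever namenode is active.
            conf.set("fs.defaultFS", "hdfs://mycluster");
            conf.set("dfs.nameservices", "mycluster");
            conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
            conf.set("dfs.namenode.rpc-address.mycluster.nn1",
                     "namenode-0.hdfs-namenode:8020");
            conf.set("dfs.namenode.rpc-address.mycluster.nn2",
                     "namenode-1.hdfs-namenode:8020");

            // Client-side proxy provider that retries against the active namenode,
            // so failover is handled by the HDFS client rather than a load balancer.
            conf.set("dfs.client.failover.proxy.provider.mycluster",
                     "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

            // The URL names the nameservice, never a single host, so it keeps
            // working after a namenode failover.
            FileSystem fs = FileSystem.get(URI.create("hdfs://mycluster"), conf);
            System.out.println(fs.exists(new Path("/flink/checkpoints")));
        }
    }

If that is the right direction, I assume the same keys would normally live in hdfs-site.xml/core-site.xml in whatever HADOOP_CONF_DIR Flink picks up, and the checkpoint/savepoint URLs would then use hdfs://mycluster/... instead of naming a specific namenode host.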
On Mon, Dec 24, 2018 at 8:20 AM Andrey Zagrebin <and...@data-artisans.com> wrote:
> Hi Steve,
>
> I think your question is specific to the HDFS HA setup.
> Flink HA addresses failover only for the job manager and job meta state.
> The storage layer for savepoints/checkpoints and its failover are the
> responsibility of the HDFS deployment.
> Flink uses HDFS as an external system, available over a location URL.
> I am not an expert on HDFS HA deployment. You could have a look at the
> Hadoop docs [1].
>
> Best,
> Andrey
>
> [1] https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
>
> > On 21 Dec 2018, at 21:48, Steven Nelson <snel...@sourceallies.com> wrote:
> >
> > First off, I am new to using HDFS to store things, so expect stupid questions.
> >
> > I am working on hardening our Flink cluster for production usage. This includes setting up an HA Flink cluster, saving checkpoints and savepoints to a central location, etc. I have a functioning HDFS setup inside an HA Kubernetes cluster. We have successfully stored checkpoint data in the HDFS directory.
> >
> > When we specify the location for the HDFS savepoint/checkpoint/HA save locations, we specify a single namenode in the URL. My question is: how do we implement failover in the event that the namenode fails? We looked at putting the namenodes behind a load balancer, but the backup nodes attempt to respond to writes (and fail). I figure I am missing something simple.
> >
> > -Steve