One more thing I had to do: in an HA setup, the TaskManagers are not able to resolve the JobManager's random port through the jobmanager.rpc.port <https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#jobmanager-rpc-port> setting. Set high-availability.jobmanager.port to a predefined port in your flink-conf.yaml and expose that port via the job-manager-deployment and job-manager-service resources as well; a sketch follows.
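For reference, this is roughly what that looks like. A minimal sketch only: the port (34560) is arbitrary and the resource/label names (flink-jobmanager, app: flink, component: jobmanager) are assumptions from my setup, so substitute your own:

# flink-conf.yaml: pin the HA RPC port instead of letting Flink pick a random one
high-availability.jobmanager.port: 34560

# job-manager-deployment: expose the same port on the JobManager container
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-jobmanager
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flink
      component: jobmanager
  template:
    metadata:
      labels:
        app: flink
        component: jobmanager
    spec:
      containers:
        - name: jobmanager
          image: flink:1.7.0-hadoop27-scala_2.11
          args: ["jobmanager"]
          ports:
            - name: rpc
              containerPort: 6123
            - name: blob
              containerPort: 6124
            - name: ui
              containerPort: 8081
            - name: ha
              containerPort: 34560  # must match high-availability.jobmanager.port
---
# job-manager-service: the TMs resolve the JobManager through this service
apiVersion: v1
kind: Service
metadata:
  name: flink-jobmanager
spec:
  selector:
    app: flink
    component: jobmanager
  ports:
    - name: rpc
      port: 6123
    - name: blob
      port: 6124
    - name: ui
      port: 8081
    - name: ha
      port: 34560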
On Mon, Feb 11, 2019 at 4:09 AM Konstantin Knauf <konstan...@ververica.com> wrote:

> Hi Steve,
>
> HDFS can be used as checkpoint storage and plays a crucial role in Flink's
> fault tolerance mechanism. HDFS alone will not suffice for a Flink HA
> setup, though; you also need ZooKeeper for JobManager HA.
>
> Flink configuration:
>
> high-availability: zookeeper
> high-availability.zookeeper.quorum: <host>:<port>
> high-availability.cluster-id: /cluster_one  # important: customize per cluster
> high-availability.storageDir: hdfs:///flink/recovery
>
> Regarding the steps to set up checkpointing to HDFS: you don't need to set
> "fs.default-scheme". As "state.backend" you use either "rocksdb" or
> "filesystem"; both use HDFS for checkpoints. You can also set
> "env.hadoop.conf.dir" in the Flink configuration as opposed to the
> environment variable; the environment variable will then be set by the
> Flink scripts.
>
> Of course, in a basic Kubernetes setup the TaskManager and JobManager pods
> should both be controlled by a Deployment, so that they are brought up
> again after a fault.
>
> Hope this helps, and please let me know if you run into any issues.
>
> Best,
>
> Konstantin
>
> On Wed, Feb 6, 2019 at 7:47 PM Steven Nelson <snel...@sourceallies.com> wrote:
>
>> I am working on a POC High Availability installation of Flink on top of
>> Kubernetes with HDFS as a data storage location. I am not finding much
>> documentation on doing this, or I am finding the documentation in parts
>> and may not be putting it together correctly. I think it falls between
>> being an HDFS thing and a Flink thing.
>>
>> I am deploying to Kubernetes using the flink:1.7.0-hadoop27-scala_2.11
>> container off of Docker Hub.
>>
>> I think these are the things I need to do:
>> 1) Set up an hdfs-site.xml file per
>> https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#Deployment
>> 2) Set the HADOOP_CONF_DIR environment variable to the location of that
>> file per
>> https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#hdfs
>> 3) Create a flink-conf.yaml file that looks something like:
>> fs.default-scheme: hdfs://
>> state.backend: rocksdb
>> state.savepoints.dir: hdfs://flink/savepoints
>> state.checkpoints.dir: hdfs://flink/checkpoints
>> 4) Dance a little jig when it works.
>>
>> Has anyone set this up? If so, am I missing anything?
>>
>> -Steve
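For anyone finding this thread later, here is a sketch of how the pieces above combine into one flink-conf.yaml. The quorum hosts, cluster id, port, and paths below are placeholders from my environment, not required values:

# ZooKeeper-based JobManager HA (per Konstantin's reply)
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-0:2181,zk-1:2181,zk-2:2181
high-availability.cluster-id: /cluster_one      # customize per cluster
high-availability.storageDir: hdfs:///flink/recovery
high-availability.jobmanager.port: 34560        # fixed port, see note at top

# Checkpoints and savepoints on HDFS ("filesystem" works here too)
state.backend: rocksdb
state.checkpoints.dir: hdfs:///flink/checkpoints
state.savepoints.dir: hdfs:///flink/savepoints

# Point Flink at the Hadoop configuration from inside the pod
env.hadoop.conf.dir: /etc/hadoop/conf

The hdfs:/// URIs (three slashes) pick up the NameNode or nameservice from hdfs-site.xml, which is why fs.default-scheme is not needed.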
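For steps 1 and 2 of the original list, one way (not the only one) to get hdfs-site.xml into the pods on Kubernetes and point HADOOP_CONF_DIR at it is a ConfigMap mounted as a volume. A sketch with hypothetical names (hadoop-conf, /etc/hadoop/conf); the actual hdfs-site.xml contents come from the HDFSHighAvailabilityWithQJM guide linked above:

apiVersion: v1
kind: ConfigMap
metadata:
  name: hadoop-conf
data:
  hdfs-site.xml: |
    <?xml version="1.0"?>
    <configuration>
      <!-- HA NameNode settings per the QJM deployment guide -->
    </configuration>

# Then, in both the JobManager and TaskManager pod specs:
    spec:
      containers:
        - name: jobmanager
          image: flink:1.7.0-hadoop27-scala_2.11
          env:
            - name: HADOOP_CONF_DIR   # or set env.hadoop.conf.dir in flink-conf.yaml
              value: /etc/hadoop/conf
          volumeMounts:
            - name: hadoop-conf
              mountPath: /etc/hadoop/conf
      volumes:
        - name: hadoop-conf
          configMap:
            name: hadoop-conf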