Re: Understanding RocksDBStateBackend in Flink on Yarn on AWS EMR

2024-03-26 Thread Yang Wang
Usually, you should use the HDFS nameservice instead of the NameNode hostname:port, so that the path keeps working across a NameNode (NN) failover. You can find the supported nameservice in hdfs-site.xml under the key *dfs.nameservices*. Best, Yang On Fri, Mar 22, 2024 at 8:33 PM Sachin Mittal wrote: > So, when we create an EMR
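As a rough illustration of the lookup Yang describes, the sketch below reads *dfs.nameservices* out of the text of an hdfs-site.xml file and builds an hdfs:// URI from it. The function names, the sample XML, the nameservice "mycluster", and the path "/flink/checkpoints" are all hypothetical and only serve the example. The key may be absent on clusters that do not run HDFS in HA mode, in which case the list comes back empty.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def read_nameservices(hdfs_site_xml: str) -> list[str]:
    """Return the values of dfs.nameservices, or an empty list if the key is absent."""
    root = ET.fromstring(hdfs_site_xml)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name == "dfs.nameservices":
            value = prop.findtext("value") or ""
            # The value is a comma-separated list of nameservice IDs.
            return [ns.strip() for ns in value.split(",") if ns.strip()]
    return []


def checkpoint_uri(nameservice: str, path: str) -> str:
    """Build an HDFS URI that names the nameservice rather than a NameNode host:port."""
    return f"hdfs://{nameservice}/{path.lstrip('/')}"


# Hypothetical hdfs-site.xml content, for illustration only.
SAMPLE = """<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for ns in read_nameservices(SAMPLE):
        print(checkpoint_uri(ns, "/flink/checkpoints"))
```

To run this against a real cluster, pass in the contents of that cluster's hdfs-site.xml in place of SAMPLE; where the file lives depends on the installation.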

Re: Understanding RocksDBStateBackend in Flink on Yarn on AWS EMR

2024-03-22 Thread Sachin Mittal
So, when we create an EMR cluster, the NN service runs on the primary node of the cluster. At the time of creating the cluster, how can we specify the name of this NN in the format hdfs://*namenode-host*:8020/? Is there a standard name by which we can identify the NN server? Thanks Sachin On Fr

Re: Understanding RocksDBStateBackend in Flink on Yarn on AWS EMR

2024-03-21 Thread Asimansu Bera
Hello Sachin, Typically, cloud VMs are ephemeral: if the EMR cluster goes down, or VMs have to be shut down for security updates or because of faults, new VMs are added to the cluster. As a result, any data stored in the local file system, such as file:///tmp, would be lost. To

Understanding RocksDBStateBackend in Flink on Yarn on AWS EMR

2024-03-21 Thread Sachin Mittal
Hi, We are using AWS EMR, where we can submit our Flink jobs to a long-running Flink cluster on YARN. We want to configure RocksDBStateBackend as our state backend to store our checkpoints, so we have configured the following properties in our flink-conf.yaml - state.backend.type: rocksdb - s