Flink+YARN HDFS replication factor

Piper Piper Tue, 28 Jan 2020 22:07:23 -0800

Hello,

When running Flink on YARN (with HDFS) as a long-running Flink session-mode
cluster, with a Flink client submitting jobs, HDFS may have a replication
factor greater than 1 (for example, 3).


So, I would like to know when and how any of a Flink job's data (such as
event data or batch data) or code (such as the job JAR) is saved to HDFS and
replicated across the nodes of the YARN cluster.

For example, in streaming applications, would all the event data stay only
in memory (RAM) until it reaches the DAG's sink, and only then need to be
saved to HDFS?

Thank you,

Piper
