Hello,

When running Flink on YARN (with HDFS) as a long-running Flink session-mode
cluster, with a Flink client submitting jobs, HDFS may have a replication
factor greater than 1 (for example, 3).

I would like to know when and how any of the data (such as event data or
batch data) or code (such as the job JAR) in a Flink job is saved to HDFS
and replicated across the nodes of the YARN cluster.

For example, in a streaming application, does all the event data stay only
in memory (RAM) until it reaches the DAG's sink, and only then have to be
saved to HDFS?

Thank you,

Piper
