The same applies to Flink. Transient data will only be stored on local
disks.
Cheers,
Till
On Thu, Jan 30, 2020 at 9:10 PM Piper Piper wrote:
Please disregard my previous email. I found the answer online.
I thought writing data to local disk automatically meant the data would be
persisted to HDFS. However, Spark writes data (in between shuffles) to
local disk only.
Thanks
On Thu, Jan 30, 2020, 2:00 PM Piper Piper wrote:
Hi Till,
Thank you for the information!
In the case of wide transformations, Spark stores input data on disk between
shuffles. So I was wondering whether Flink does that as well (even for windows
of streaming data), and whether that "storing to disk" is persisted to
HDFS and honors the replication
Hi Piper,
In general, Flink does not store transient data such as event data on HDFS.
Event data (data sent between the TaskManagers for processing) is kept
only in memory and, if it grows too large, is spilled to local disk by some
operators.
What Flink stores on HDFS (given it is configured t
Hello,
When running Flink on YARN (with HDFS) as a long-running session-mode
cluster, with a Flink client submitting jobs, HDFS may have a
replication factor greater than 1 (for example, 3).
So, I would like to know when and how any of the data (like event-data or
batch-data) or code (