From: Chesnay Schepler
Sent: 13 March 2018 12:40:02
To: user@flink.apache.org
Subject: Re: HDFS data locality and distribution
Hello,
You said that "data is distributed very badly across slots"; do you mean
that only a small number of subtasks are reading from HDFS, or that the
keyed data is processed by only a few subtasks?
Flink does prioritize data locality over data distribution when reading
the files, but the fu
Relevant versions: Beam 2.1, Flink 1.3.
From: Reinier Kip
Sent: 12 March 2018 13:45:47
To: user@flink.apache.org
Subject: HDFS data locality and distribution
Hey all,
I'm trying to batch-process 30-ish files from HDFS, but I see that data is
distributed very