Hi,
@Emmanuel:
"Is the Flink behavior mentioned native or is this something happening when
running Flink on YARN?"
The input split assignment behavior Stephan described is implemented in
Flink itself, so it works in a standalone Flink cluster and in a YARN setup.
In a setup where each machine running a
Hi Stephan,
The case is this: I have lots of images stored on a cluster, and I want to
create a system in which I send a message (to a message queue: let's say
Apache Kafka) and the message is accepted within the cluster and processed.
The message contains the ID of one of the images (or even its
aining data locality with list of paths (strings) as input
From: se...@apache.org
To: user@flink.apache.org
Hi Guy,
This sounds like a use case that should work with Flink.
When it comes to input handling, Flink differs a bit from Spark. Flink
creates a set of input tasks and a set of input splits. The splits are then
assigned to the tasks on the fly. Each task may work on multiple input
splits, which ar
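
To make the lazy assignment described above concrete, here is a minimal toy
sketch in Python. It is not Flink's code: the SplitAssigner class, the run
helper, and the host names are hypothetical, and the "prefer a split stored on
the requesting host" rule is an assumption made for illustration. Tasks pull
one split at a time, so a single task can end up processing several splits.

```python
class SplitAssigner:
    """Toy model of lazy split assignment (hypothetical, not Flink's implementation)."""

    def __init__(self, splits):
        # splits: list of (split_id, set of hosts that hold the split's data)
        self.unassigned = list(splits)

    def next_split(self, task_host):
        """Return the next split id for a task on task_host, or None when none are left."""
        if not self.unassigned:
            return None
        # Assumed policy: prefer a split whose data lives on the requesting host.
        for i, (split_id, hosts) in enumerate(self.unassigned):
            if task_host in hosts:
                return self.unassigned.pop(i)[0]
        # No local split left: hand out any remaining one (a remote read).
        return self.unassigned.pop(0)[0]


def run(splits, task_hosts):
    """Let each task request splits in turn until the assigner runs dry."""
    assigner = SplitAssigner(splits)
    assignment = {host: [] for host in task_hosts}
    active = list(task_hosts)
    while active:
        for host in list(active):
            split = assigner.next_split(host)
            if split is None:
                active.remove(host)
            else:
                assignment[host].append(split)
    return assignment


splits = [("s1", {"a"}), ("s2", {"b"}), ("s3", {"a"}), ("s4", {"c"})]
result = run(splits, ["a", "b"])
# result == {"a": ["s1", "s3"], "b": ["s2", "s4"]}
```

In this run the task on host "a" receives both of its local splits, while the
task on host "b" takes one local split and then the leftover remote one.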
Hi Guy,
I don't have an answer about Flink, but here are a couple of comments on your
use case that I hope might help:
- You should view HDFS as a giant RAID across nodes: the NameNode maintains the
file table, but the data is distributed and replicated across nodes by block.
There is no 'data locality' guarantee