Hi Pritam,

Flink does not deploy tasks to specific nodes based on where the source data is located. Instead, when a task asks for work, it is given local input splits (data stored on the same node) first; once no local split is left, it takes a remaining split and reads it remotely. So if your parallelism is high enough for the tasks to be spread across all the data nodes, most of the data can be processed locally.
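To make the local-first behaviour concrete, here is a small toy simulation in Python. It is only a sketch of the idea, not Flink's actual code: the names `Split`, `assign_next` and `simulate`, the host names, and the assumption of one unreplicated split per data node are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    """Hypothetical stand-in for an input split (e.g. one HDFS block)."""
    split_id: int
    hosts: tuple  # hosts that store a copy of this split


def assign_next(unassigned, task_host):
    """Hand out the next split for a task running on task_host.

    Prefers a split stored on the same host; otherwise falls back to any
    remaining split, which would have to be read remotely.
    Returns (split, is_local), or (None, False) when nothing is left.
    """
    for split in unassigned:
        if task_host in split.hosts:
            unassigned.remove(split)
            return split, True
    if unassigned:
        return unassigned.pop(0), False
    return None, False


def simulate(splits, task_hosts):
    """Tasks take turns requesting splits until none are left."""
    unassigned = list(splits)
    local = remote = 0
    while unassigned:
        for host in task_hosts:
            split, is_local = assign_next(unassigned, host)
            if split is None:
                break
            if is_local:
                local += 1
            else:
                remote += 1
    return local, remote


# Assumption: 16 splits, one per data node, no replication.
splits = [Split(i, (f"node{i}",)) for i in range(16)]

# One task on every data node: every split can be read locally.
all_nodes = simulate(splits, [f"node{i}" for i in range(16)])

# Only 4 tasks, on node0..node3: most splits must be read remotely.
four_nodes = simulate(splits, [f"node{i}" for i in range(4)])

print(all_nodes, four_nodes)
```

Under these assumptions it prints `(16, 0) (4, 12)`, i.e. (local, remote) split counts: with a task on each of the 16 data nodes everything is read locally, while with tasks on only 4 of them just 4 splits are local.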
Thanks,
Zhu Zhu

Pritam Sadhukhan <sadhukhan.pri...@gmail.com> wrote on Fri, Oct 18, 2019 at 10:59 AM:

> Hi,
>
> I am trying to process data stored on HDFS using Flink batch jobs.
> Our data is split across 16 data nodes.
>
> I am curious to know how data will be pulled from the data nodes when the
> parallelism is set to the same number as the data splits on HDFS, i.e. 16.
>
> Will the Flink tasks be executed locally on the data node servers, or will
> they run on the Flink nodes, with the data pulled remotely?
>
> Any help will be appreciated.
>
> Regards,
> Pritam.