Hi Zhu Zhu,

Thanks for your detailed answer. Could you please help me understand how a Flink task processes the data that is local to its data node first? In particular, I would like to understand how Flink decides which processing is done on the data nodes.

Regards,
Pritam.

On Sat, 19 Oct 2019 at 08:16, Zhu Zhu <reed...@gmail.com> wrote:

> Hi Pritam,
>
> Flink does not deploy tasks to certain nodes according to source data
> locations.
> Instead, it will let a task process local input splits (data on the same
> node) first.
> So if your parallelism is large enough to distribute tasks across all the
> data nodes, most data can be processed locally.
>
> Thanks,
> Zhu Zhu
>
> On Fri, 18 Oct 2019 at 10:59, Pritam Sadhukhan <sadhukhan.pri...@gmail.com> wrote:
>
>> Hi,
>>
>> I am trying to process data stored on HDFS using Flink batch jobs.
>> Our data is split across 16 data nodes.
>>
>> I am curious to know how data will be pulled from the data nodes when the
>> parallelism is set to the same number as the data splits on HDFS, i.e. 16.
>>
>> Is the Flink task executed locally on the data node server, or does it
>> run on the Flink nodes, with the data pulled remotely?
>>
>> Any help will be appreciated.
>>
>> Regards,
>> Pritam.
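
For readers of this thread: the behaviour Zhu Zhu describes (a task is handed local input splits first, and a remote one only when no local split is left) can be sketched roughly as below. This is a toy model in plain Python, not Flink's code. The function name assign_splits, the host names, and the round-robin request order are all assumptions made for illustration. In Flink itself, subtasks request the next split as they finish the previous one, and the locality-aware choice is made by an input split assigner (LocatableInputSplitAssigner for file inputs, as far as I know).

```python
def assign_splits(splits, task_hosts):
    """Toy locality-aware split assignment (illustrative, not Flink code).

    splits:     list of (split_id, set_of_hosts_holding_a_replica)
    task_hosts: host name of each parallel subtask, indexed by subtask number
    Returns a list of (subtask_index, split_id, is_local) in assignment order.
    """
    if not task_hosts:
        raise ValueError("need at least one subtask")

    remaining = dict(splits)  # split_id -> hosts, keeps insertion order
    assignments = []

    # Assumption: subtasks ask for their next split in round-robin order.
    while remaining:
        for task, host in enumerate(task_hosts):
            if not remaining:
                break
            # Prefer a split that has a replica on the subtask's own host.
            local = next(
                (sid for sid, hosts in remaining.items() if host in hosts),
                None,
            )
            # Fall back to any remaining split, which is then read remotely.
            sid = local if local is not None else next(iter(remaining))
            del remaining[sid]
            assignments.append((task, sid, local is not None))
    return assignments


# Hypothetical cluster: splits live on dn1..dn3, but only two subtasks
# run, on dn1 and dn2, so the split on dn3 has to be read remotely.
example_splits = [
    ("s0", {"dn1"}),
    ("s1", {"dn2"}),
    ("s2", {"dn3"}),
    ("s3", {"dn1"}),
]
example_result = assign_splits(example_splits, ["dn1", "dn2"])
for task, sid, is_local in example_result:
    print(task, sid, "local" if is_local else "remote")
```

In this model the tasks are not moved to the data. Each subtask simply picks local splits while any exist, which is why a parallelism that covers all data nodes lets most data be read locally.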