Re: Data processing with HDFS local or remote

Pritam Sadhukhan Sun, 20 Oct 2019 19:18:01 -0700

Hi Zhu Zhu,

Thanks for your detailed answer.
Can you please help me understand how a Flink task processes the data
locally on the data nodes first?
I want to understand how Flink determines which processing is done at the
data nodes.


Regards,
Pritam.

On Sat, 19 Oct 2019 at 08:16, Zhu Zhu <reed...@gmail.com> wrote:

> Hi Pritam,
>
> Flink does not deploy tasks to particular nodes based on where the source
> data is located.
> Instead, each task processes local input splits (data on the same node it
> runs on) first.
> So if your parallelism is large enough for tasks to be spread across all
> the data nodes, most data can be processed locally.
>
> Thanks,
> Zhu Zhu
>
> On Fri, 18 Oct 2019 at 10:59, Pritam Sadhukhan <sadhukhan.pri...@gmail.com> wrote:
>
>> Hi,
>>
>> I am trying to process data stored on HDFS using Flink batch jobs.
>> Our data is split across 16 data nodes.
>>
>> I am curious to know how data will be pulled from the data nodes when the
>> parallelism is set to the same number as the data splits on HDFS, i.e. 16.
>>
>> Is the Flink task executed locally on the data node server, or does it run
>> on the Flink nodes, with the data pulled remotely?
>>
>> Any help will be appreciated.
>>
>> Regards,
>> Pritam.
>>
>
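
For illustration, the "local input splits first" behaviour described in the
quoted reply can be sketched as a small simulation. This is not Flink code;
as far as I know the real logic for file inputs lives in Flink's
LocatableInputSplitAssigner and is more involved. Everything below is an
assumption made for the example: the function name assign_splits, the host
names node-0 .. node-15, one replica per split, and tasks requesting splits
in round-robin order.

```python
def assign_splits(splits, task_hosts):
    """Simulate local-first split assignment (hypothetical helper, not Flink API).

    splits:     list of (split_id, set of hosts holding that split)
    task_hosts: list of host names, one entry per parallel task
    Returns a list of (task_index, split_id, is_local) tuples.
    """
    remaining = dict(splits)  # split_id -> hosts, keeps insertion order
    assignment = []
    while remaining:
        # Assumption: tasks ask for their next split in round-robin order.
        for task_index, host in enumerate(task_hosts):
            if not remaining:
                break
            # Prefer a split stored on the task's own host.
            local = next(
                (sid for sid, hosts in remaining.items() if host in hosts),
                None,
            )
            # Otherwise fall back to any remaining split (a remote read).
            split_id = local if local is not None else next(iter(remaining))
            del remaining[split_id]
            assignment.append((task_index, split_id, local is not None))
    return assignment


def count_local(assignment):
    """Number of splits that were read on the host that stores them."""
    return sum(1 for _, _, is_local in assignment if is_local)


# Assumption: 16 splits, each stored on exactly one of 16 data nodes.
splits = [(f"split-{i}", {f"node-{i}"}) for i in range(16)]

# Parallelism 16, one task per data node: every split has a local reader.
all_nodes = [f"node-{i}" for i in range(16)]
print(count_local(assign_splits(splits, all_nodes)))   # 16

# Parallelism 4, tasks only on node-0 .. node-3: most reads become remote.
four_nodes = [f"node-{i}" for i in range(4)]
print(count_local(assign_splits(splits, four_nodes)))  # 4
```

In this toy setup the tasks are never moved to the data; the only thing
that changes is which split each task is handed next, which is why the
amount of local reading depends on tasks happening to run on the data nodes.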
