Re: Data locality and scheduler

2016-05-02 Thread Fabian Hueske
> specific job, acquired lazily, leveraging locality info gleaned from the
> data sources/sinks. A hybrid solution could be used to counteract job
> latency.
>
> -Eron
>
> > From: fhue...@gmail.com
> > Date: Thu, 28 Apr 2016 11:59:28 +0200
> > Subject:

RE: Data locality and scheduler

2016-04-30 Thread Eron Wright
> Date: Thu, 28 Apr 2016 11:59:28 +0200
> Subject: Re: Data locality and scheduler
> To: dev@flink.apache.org
>
> Hi,
>
> yes, that can cause network traffic.
> AFAIK, there are no plans to work on behavior.
>
> Best, Fabian
>
> 2016-04-26 18:17 GMT+02:00 CPC :

Re: Data locality and scheduler

2016-04-28 Thread Fabian Hueske
Hi,

yes, that can cause network traffic.
AFAIK, there are no plans to change this behaviour.

Best, Fabian

2016-04-26 18:17 GMT+02:00 CPC :
> Hi
>
> But isn't this behaviour going to cause a lot of network activity? Is there
> any roadmap or plan to change this behaviour?
>
> On Apr 26, 2016 7:06 PM, "Fabian
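To see why the traffic can be substantial, here is a back-of-the-envelope model of the scenario from this thread. The numbers are assumptions for illustration (40 nodes, one equally-sized split per node, job parallelism 4, each task on a distinct node); `remoteFraction` is a hypothetical helper, not a Flink API:

```java
// Rough best-case model: with fewer tasks than splits, each task can read
// at most the split(s) stored on its own node locally; every other split
// must be shipped over the network to one of the running tasks.
public class RemoteReadEstimate {

    /** Fraction of equally-sized splits that must be read remotely,
     *  assuming the tasks are scheduled to distinct nodes (best case). */
    public static double remoteFraction(int splits, int parallelism) {
        int localReads = Math.min(splits, parallelism);
        return (splits - localReads) / (double) splits;
    }

    public static void main(String[] args) {
        // 40 splits, parallelism 4: only 4 splits read locally,
        // so 36/40 = 90% of the data crosses the network.
        System.out.println(remoteFraction(40, 4));
    }
}
```

Even in this optimistic model, roughly 90% of the file is read over the network, which is what the question above is getting at; a task-per-split scheduler could in principle read every split locally.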

Re: Data locality and scheduler

2016-04-26 Thread CPC
Hi

But isn't this behaviour going to cause a lot of network activity? Is there
any roadmap or plan to change this behaviour?

On Apr 26, 2016 7:06 PM, "Fabian Hueske" wrote:
> Hi,
>
> Flink starts four tasks and then lazily assigns input splits to these tasks
> with locality preference. So each task ma

Re: Data locality and scheduler

2016-04-26 Thread Fabian Hueske
Hi,

Flink starts four tasks and then lazily assigns input splits to these tasks
with locality preference, so each task may consume more than one split. This
is different from Hadoop MapReduce or Spark, which schedule a new task for
each input split. In your case, the four tasks would be scheduled t
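The lazy, locality-preferring assignment described above can be sketched as follows. This is a minimal toy model, not Flink's actual `LocatableInputSplitAssigner`; the class and method names are invented for illustration. When a task on some host asks for work, it gets a split stored on that host if one remains, otherwise any remote split, until no splits are left:

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of lazy split assignment with locality preference.
// With fewer tasks than splits, each task loops calling nextSplit()
// and so consumes several splits, local ones first.
public class LazySplitAssigner {

    // split id -> host that stores the split's data (insertion-ordered)
    private final LinkedHashMap<String, String> pending = new LinkedHashMap<>();

    public LazySplitAssigner(Map<String, String> splitToHost) {
        pending.putAll(splitToHost);
    }

    /** Next split for a task running on 'host', or null when all are assigned. */
    public String nextSplit(String host) {
        // First pass: prefer a split whose data is local to the requester.
        for (Iterator<Map.Entry<String, String>> it = pending.entrySet().iterator();
                it.hasNext(); ) {
            Map.Entry<String, String> e = it.next();
            if (e.getValue().equals(host)) {
                it.remove();
                return e.getKey();
            }
        }
        // No local split left: fall back to any remaining (remote) split.
        Iterator<String> it = pending.keySet().iterator();
        if (it.hasNext()) {
            String split = it.next();
            it.remove();
            return split;
        }
        return null;
    }
}
```

For example, with splits s1 and s3 on host A and s2 on host B, a task on B first receives its local s2; a task on A then drains s1 and s3; any further request returns null. This also shows where the network traffic in this thread comes from: once a task's local splits are exhausted, it pulls remote ones.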

Data locality and scheduler

2016-04-26 Thread CPC
Hi,

I looked at some scheduler documentation but could not find an answer to my
question. Suppose I have a big file on a 40-node Hadoop cluster, and since it
is a big file, every node has at least one chunk of the file. If I write a
Flink job to filter the file, and if the job has par