Re: data localisation in spark

2015-06-03 Thread Sandy Ryza
Tasks are scheduled on executors based on data locality. Things work as you would expect in the example you brought up. Through dynamic allocation, the number of executors can change throughout the lifetime of an application. 10 executors (or 5 executors with 2 cores each) are not needed for a…
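
[Editor's note: for illustration, a minimal sketch of enabling the dynamic allocation Sandy refers to, assuming YARN mode with the external shuffle service; the property names are standard Spark configs, but the app name and the min/max executor counts are placeholders.]

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class DynamicAllocationSketch {
      public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("dynamic-allocation-sketch")
            // Let the executor count grow and shrink with the workload.
            .set("spark.dynamicAllocation.enabled", "true")
            .set("spark.dynamicAllocation.minExecutors", "2")
            .set("spark.dynamicAllocation.maxExecutors", "10")
            // Required so shuffle files outlive executors that are released.
            .set("spark.shuffle.service.enabled", "true");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        // ... run jobs; the executor count now varies over the application's lifetime.
        jsc.stop();
      }
    }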

Re: data localisation in spark

2015-06-02 Thread Shushant Arora
So in Spark, after acquiring executors from the ClusterManager, are tasks scheduled on executors based on data locality? I mean, if in an application there are 2 jobs and the output of job 1 is used as the input of another job, and in job 1 I called persist on some RDD, then while running job 2 will it use th…
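
[Editor's note: a minimal sketch of the two-job scenario being asked about; the input path and the filter logic are placeholders. Job 1 materialises and caches the RDD, and job 2's tasks are scheduled preferentially on the executors holding the cached partitions.]

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.storage.StorageLevel;

    public class PersistAcrossJobs {
      public static void main(String[] args) {
        JavaSparkContext jsc =
            new JavaSparkContext(new SparkConf().setAppName("persist-sketch"));
        JavaRDD<String> lines = jsc.textFile("hdfs:///input/data.txt");
        JavaRDD<String> cleaned = lines.filter(l -> !l.isEmpty())
                                       .persist(StorageLevel.MEMORY_ONLY());
        // Job 1: the count() action computes and caches 'cleaned'.
        long total = cleaned.count();
        // Job 2: reuses the cached partitions instead of recomputing them.
        long nonComment = cleaned.filter(l -> !l.startsWith("#")).count();
        System.out.println(total + " / " + nonComment);
        jsc.stop();
      }
    }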

Re: data localisation in spark

2015-06-02 Thread Sandy Ryza
It is not possible with JavaSparkContext either. The API mentioned below currently does not have any effect (we should document this). The primary difference between MR and Spark here is that MR runs each task in its own YARN container, while Spark runs multiple tasks within an executor, which ne…
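
[Editor's note: to make the container-model difference concrete, a sketch echoing the 10-executors-vs-5x2 example from this thread. The config names are standard YARN-mode settings; the numbers are illustrative only. Either shape yields 10 task slots, and Spark schedules tasks into those slots rather than giving each task its own YARN container as MR does.]

    import org.apache.spark.SparkConf;

    public class ExecutorShapes {
      public static void main(String[] args) {
        // 10 executors with 1 core each: 10 task slots, 10 containers.
        SparkConf tenByOne = new SparkConf()
            .set("spark.executor.instances", "10")
            .set("spark.executor.cores", "1");
        // 5 executors with 2 cores each: also 10 task slots, but 5 containers.
        SparkConf fiveByTwo = new SparkConf()
            .set("spark.executor.instances", "5")
            .set("spark.executor.cores", "2");
      }
    }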

Re: data localisation in spark

2015-06-02 Thread Shushant Arora
Is it possible with JavaSparkContext? JavaSparkContext jsc = new JavaSparkContext(conf); JavaRDD<String> lines = jsc.textFile(args[0]); If yes, is it the programmer's responsibility to first calculate split locations and then instantiate the Spark context with preferred locations? How is it achieved in MR2…
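
[Editor's note: for comparison, a sketch of how MR2 exposes split locations through its public InputFormat API. In practice the framework computes these itself and the ApplicationMaster uses them to request containers near the data; application code normally never does this by hand.]

    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class SplitLocations {
      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration());
        FileInputFormat.addInputPath(job, new Path(args[0]));
        List<InputSplit> splits = new TextInputFormat().getSplits(job);
        for (InputSplit split : splits) {
          // getLocations() lists the hosts holding each split's blocks.
          System.out.println(String.join(",", split.getLocations()));
        }
      }
    }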

Re: data localisation in spark

2015-05-31 Thread Sandy Ryza
Hi Shushant, Spark currently makes no effort to request executors based on data locality (although it does try to schedule tasks within executors based on data locality). We're working on adding this capability at SPARK-4352. -Sandy
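
[Editor's note: the task-level locality scheduling Sandy mentions is governed by the spark.locality.wait settings; a sketch, with illustrative values only (milliseconds in Spark 1.x).]

    import org.apache.spark.SparkConf;

    public class LocalityWaitSketch {
      public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            // How long the scheduler waits for a free slot at each locality
            // level (process-local, node-local, rack-local) before falling
            // back to the next level.
            .set("spark.locality.wait", "3000")
            .set("spark.locality.wait.node", "3000")
            .set("spark.locality.wait.rack", "3000");
      }
    }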