# of tasks = # of partitions, hence you can pass the desired number of partitions to the textFile API, which should result in a) a better spatial distribution of the RDD and b) each partition being operated on by a separate task.
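For example, a minimal sketch (assuming the same SparkContext variable mc used later in this thread, a hypothetical HDFS path, and the standard textFile/repartition API):

// ask textFile for a minimum number of partitions up front (hypothetical path)
val lines = mc.textFile("hdfs:///nano-batches/part-*", 16)

// or explicitly redistribute an RDD that was read with too few partitions
val spread = lines.repartition(16)

Note that repartition shuffles the data, so it only pays off when the downstream computation is heavy enough to amortize that cost.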
Argh, I looked and there really isn’t that much data yet. There will be
thousands but starting small.
I bet this is just a case of the total data size not requiring all workers; sorry, never mind.
On Apr 23, 2015, at 10:30 AM, Pat Ferrel wrote:
They are in HDFS so available on all workers
On Apr 23, 2015, at 10:29 AM, Pat Ferrel wrote:
Physically? Not sure, they were written using the nano-batch RDDs in a streaming job that is in a separate driver. The job is a Kafka consumer. Would that affect all derived RDDs? If so, is there something I can do to mix it up, or does Spark know best about execution speed here?
On Apr 23, 2015, … wrote:
Where are the file splits? Meaning, is it possible they were also (only) available on one node, and that was also your driver?
On Thu, Apr 23, 2015 at 1:21 PM, Pat Ferrel wrote:
Sure
var columns = mc.textFile(source).map { line => line.split(delimiter) }
Here “source” is a comma delimited list of files or directories. Both the
textFile and .map tasks happen only on the machine they were launched from.
Later other distributed operations happen, but I suspect that if I can get this initial read distributed across the workers the rest will follow.
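One quick way to check this (a sketch, reusing the columns RDD from above) is to print the partition count and each split's preferred locations; if every split prefers the same host, the data really is local to one node:

// how many partitions did the read actually produce?
println("partitions: " + columns.partitions.length)

// preferred (data-local) hosts for each split
columns.partitions.foreach { p =>
  println("partition " + p.index + " -> " + columns.preferredLocations(p).mkString(", "))
}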
Will you be able to paste code here?
On 23 April 2015 at 22:21, Pat Ferrel wrote:
> Using Spark streaming to create a large volume of small nano-batch input
> files, ~4k per file, thousands of 'part-x' files. When reading the
> nano-batch files and doing a distributed calculation my tasks run only on the machine the job was launched from.