Re: Memory & compute-intensive tasks

Daniel Siegmann Mon, 14 Jul 2014 14:21:12 -0700

Depending on how your C++ program is designed, maybe you can feed the data
from multiple partitions into the same process? Getting the results back
might be tricky, but that may be the only way to guarantee only one
invocation per node.
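A minimal sketch of a simple variant of this idea, assuming PySpark: merge the data so that each partition holds several of the original partitions' records (aiming for one partition per node), then start the external program once per partition, stream every record into it, and read the results back from its standard output. The names `feed_partition_to_process`, `run_once_per_partition`, and `STAND_IN_CMD` are hypothetical; the stand-in command is a small Python child process that upper-cases its input, standing in for the real C++ program and its real I/O format.

```python
import subprocess
import sys

# Hypothetical stand-in for the C++ program: reads one record per line on
# stdin and writes one result line per record (here it just upper-cases).
STAND_IN_CMD = [
    sys.executable, "-c",
    "import sys\nfor line in sys.stdin:\n    sys.stdout.write(line.upper())",
]


def feed_partition_to_process(records, cmd=STAND_IN_CMD):
    """Start `cmd` once for a whole partition and yield its output lines.

    All records of the partition go to the same process, so the program is
    invoked once per partition rather than once per record.
    """
    payload = "".join(f"{record}\n" for record in records)
    result = subprocess.run(cmd, input=payload, capture_output=True,
                            text=True, check=True)
    yield from result.stdout.splitlines()


def run_once_per_partition(rdd, num_nodes):
    """Hypothetical wiring: shuffle into num_nodes partitions, then run the
    external program once per merged partition."""
    return (rdd.coalesce(num_nodes, shuffle=True)
               .mapPartitions(feed_partition_to_process))
```

Note that `subprocess.run` buffers the whole partition's input and output in memory, which matters for a memory-bound job like this one. Spark's built-in `rdd.pipe(command)` does the same line-oriented round trip (one process per partition, records on stdin, results from stdout) while streaming the records instead.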



On Mon, Jul 14, 2014 at 5:12 PM, Matei Zaharia <[email protected]>
wrote:

> I think coalesce with shuffle=true will force it to have one task per
> node. Without that, it might be that due to data locality it decides to
> launch multiple ones on the same node even though the total # of tasks is
> equal to the # of nodes.
>
> If this is the *only* thing you run on the cluster, you could also
> configure the Workers to report only one core by manually launching the
> org.apache.spark.deploy.worker.Worker process with the --cores 1 flag (see
> http://spark.apache.org/docs/latest/spark-standalone.html).
>
> Matei
>
> On Jul 14, 2014, at 1:59 PM, Daniel Siegmann <[email protected]>
> wrote:
>
> I don't have a solution for you (sorry), but do note that
> rdd.coalesce(numNodes) keeps data on the same nodes where it was. If you
> set shuffle=true then it should repartition and redistribute the data.
> But it uses the hash partitioner according to the ScalaDoc - I don't know
> of any way to supply a custom partitioner.
>
>
> On Mon, Jul 14, 2014 at 4:09 PM, Ravi Pandya <[email protected]> wrote:
>
>> I'm trying to run a job that includes an invocation of a memory &
>> compute-intensive multithreaded C++ program, and so I'd like to run one
>> task per physical node. Using rdd.coalesce(# nodes) seems to just allocate
>> one task per core, and so runs out of memory on the node. Is there any way
>> to give the scheduler a hint that the task uses lots of memory and cores so
>> it spreads it out more evenly?
>>
>> Thanks,
>>
>> Ravi Pandya
>> Microsoft Research
>>
>
>
>
> --
> Daniel Siegmann, Software Developer
> Velos
> Accelerating Machine Learning
>
> 440 NINTH AVENUE, 11TH FLOOR, NEW YORK, NY 10001
> E: [email protected] W: www.velos.io
>
>
>

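To check whether Matei's suggestion (coalesce with shuffle=true) actually yields one task per node on a given cluster, each partition can report which host processed it. A minimal PySpark sketch; the helper names `host_of_partition`, `placement_report`, and `hosts_with_multiple_partitions` are hypothetical, and a single run only shows one scheduling outcome, not a guarantee for later runs.

```python
import socket
from collections import Counter


def host_of_partition(index, records):
    """Report (partition index, host that ran it, number of records)."""
    yield (index, socket.gethostname(), sum(1 for _ in records))


def placement_report(rdd, num_nodes):
    """Shuffle into num_nodes partitions and collect where each one ran."""
    return (rdd.coalesce(num_nodes, shuffle=True)
               .mapPartitionsWithIndex(host_of_partition)
               .collect())


def hosts_with_multiple_partitions(report):
    """Hosts that ran more than one partition, i.e. the case to avoid."""
    counts = Counter(host for _, host, _ in report)
    return sorted(host for host, n in counts.items() if n > 1)
```

For example, `hosts_with_multiple_partitions(placement_report(sc.parallelize(data), num_nodes))` returning an empty list means every partition ran on a distinct host in that run.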
