Hi,

using pyspark 1.1.0 on YARN 2.5.0. All operations run nicely in parallel - I 
can see multiple python processes spawned on each NodeManager - but for some 
reason, when running cartesian, there is only a single python process running 
on each node. The task indicates thousands of partitions, so I don't 
understand why it is not running with higher parallelism. The performance is 
obviously poor, although other operations run fast.

any idea how to improve this?

thank you,
Antony.

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]