Has anyone figured this out yet? I have gone looking for this exact
problem (Spark 1.6.1), and I cannot get my partitions to be distributed
evenly across executors no matter what I try. It has been mentioned
several other times in the user group as well as the dev group (as
mentioned by Mike H
-spark-user-list.1001560.n3.nabble.com/spark-1-6-RDD-Partitions-not-distributed-evenly-to-executors-tp26911.html
Hello All (and Devs in particular),
Thank you again for your further responses. Please find a detailed
email below which identifies the cause (I believe) of the partition
imbalance problem, which occurs in Spark 1.5, 1.6, and a 2.0-SNAPSHOT.
This is followed by follow-up questions for the dev comm
I have a similar experience.
Using 32 machines, I can see that the number of tasks (partitions) assigned to
executors (machines) is not even. Moreover, the distribution changes every
stage (iteration).
I wonder why Spark needs to move partitions around anyway; should not the
scheduler reduce network
can you try:
spark.shuffle.reduceLocality.enabled=false
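
For anyone wanting to try the suggestion above, one way to apply it is as a line in conf/spark-defaults.conf. This is only a sketch under a few assumptions: the property name is taken verbatim from the message above, and as far as I know it is an internal setting that does not appear on the public configuration page, so its effect may vary between Spark versions.

```
# conf/spark-defaults.conf
# Assumption: disables the scheduler's preference for placing reduce
# tasks near their largest map outputs, as suggested in this thread.
spark.shuffle.reduceLocality.enabled   false
```

The same key and value can also be passed per job through the `--conf` option of spark-submit, which avoids changing the cluster-wide defaults while testing.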
On Mon, Apr 4, 2016 at 8:17 PM, Mike Hynes <91m...@gmail.com> wrote:
> Dear all,
>
> Thank you for your responses.
>
> Michael Slavitch:
> > Just to be sure: Has spark-env.sh and spark-defaults.conf been
> correctly propagated to all nodes?
Dear all,
Thank you for your responses.
Michael Slavitch:
> Just to be sure: Has spark-env.sh and spark-defaults.conf been correctly
> propagated to all nodes? Are they identical?
Yes; these files are stored on a shared memory directory accessible to
all nodes.
Koert Kuipers:
> we ran into si
> the modifications do not touch the scheduler
If the changes can be ported over to 1.6.1, do you mind reproducing the
issue there?
I ask because master branch changes very fast. It would be good to narrow
the scope where the behavior you observed started showing.
On Mon, Apr 4, 2016 at 6:12
Just to be sure: Has spark-env.sh and spark-defaults.conf been correctly
propagated to all nodes? Are they identical?
> On Apr 4, 2016, at 9:12 AM, Mike Hynes <91m...@gmail.com> wrote:
>
> [ CC'ing dev list since nearly identical questions have occurred in
> user list recently w/o resolution;
[ CC'ing dev list since nearly identical questions have occurred in
user list recently w/o resolution;
cf.:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-work-distribution-among-execs-tt26502.html
http://apache-spark-user-list.1001560.n3.nabble.com/Partitions-are-get-placed-on-the-sing