Hello All (and Devs in particular),
Thank you again for your further responses. Please find below a detailed
email that identifies what I believe is the cause of the partition
imbalance problem, which occurs in Spark 1.5, 1.6, and a 2.0-SNAPSHOT.
This is followed by follow-up questions for the dev comm
I have a similar experience.
Using 32 machines, I can see that the number of tasks (partitions) assigned to
executors (machines) is not even. Moreover, the distribution changes every
stage (iteration).
I wonder why Spark needs to move partitions around anyway; should not the
scheduler reduce network
can you try:
spark.shuffle.reduceLocality.enabled=false
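
For anyone wanting to try this, here is a minimal sketch of how the setting might be applied. It assumes the property goes in conf/spark-defaults.conf on the machine that submits the job; the same key=value pair can instead be passed to spark-submit with --conf. As far as I know this property is not listed in the public configuration docs, so its exact effect may depend on the Spark version.

```
# conf/spark-defaults.conf (assumed location; adjust to your install)
# Turn off the reduce-side locality preference, as suggested above.
spark.shuffle.reduceLocality.enabled   false
```

The file is read at submit time, so the application needs to be resubmitted for the change to take effect.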
On Mon, Apr 4, 2016 at 8:17 PM, Mike Hynes <91m...@gmail.com> wrote:
> Dear all,
>
> Thank you for your responses.
>
> Michael Slavitch:
> > Just to be sure: Have spark-env.sh and spark-defaults.conf been
> correctly propagated to all nodes?
Dear all,
Thank you for your responses.
Michael Slavitch:
> Just to be sure: Have spark-env.sh and spark-defaults.conf been correctly
> propagated to all nodes? Are they identical?
Yes; these files are stored on a shared memory directory accessible to
all nodes.
Koert Kuipers:
> we ran into si
we ran into similar issues and it seems related to the new memory
management. can you try:
spark.memory.useLegacyMode = true
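
A minimal sketch of applying this, assuming the setting is placed in conf/spark-defaults.conf (passing it to spark-submit via --conf should work as well). The property belongs to the memory manager introduced in Spark 1.6, so it is not expected to have any effect on 1.5.

```
# conf/spark-defaults.conf (assumed location; adjust to your install)
# Fall back to the pre-1.6 memory management, as suggested above.
spark.memory.useLegacyMode   true
```

If the imbalance persists with this setting, that would suggest the memory manager is not the cause.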
On Mon, Apr 4, 2016 at 9:12 AM, Mike Hynes <91m...@gmail.com> wrote:
> [ CC'ing dev list since nearly identical questions have occurred in
> user list recently w/o resoluti
> the modifications do not touch the scheduler
If the changes can be ported over to 1.6.1, do you mind reproducing the
issue there?
I ask because the master branch changes very fast. It would be good to narrow
down the point at which the behavior you observed first appeared.
On Mon, Apr 4, 2016 at 6:12
Just to be sure: Have spark-env.sh and spark-defaults.conf been correctly
propagated to all nodes? Are they identical?
> On Apr 4, 2016, at 9:12 AM, Mike Hynes <91m...@gmail.com> wrote:
>
> [ CC'ing dev list since nearly identical questions have occurred in
> user list recently w/o resolution;
[ CC'ing dev list since nearly identical questions have occurred in
user list recently w/o resolution;
cf.:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-work-distribution-among-execs-tt26502.html
http://apache-spark-user-list.1001560.n3.nabble.com/Partitions-are-get-placed-on-the-sing