Hi,

My cluster (a standalone deployment) consisting of 3 worker nodes was in the middle of a computation when I added one more worker node. I can see that the new worker is registered with the master and that my job actually gets one more executor. I have configured the default parallelism as 12, so each of the three original nodes holds 4 RDD blocks. I expected that after adding one more node those partitions would be redistributed across 4 nodes instead of 3, but I don't see that happening. Even though all partitions are cached in the memory of those 3 nodes and most likely have their locations (or preferred locations) known, I'd still expect the partitions to be reassigned. Are my expectations incorrect?
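For context, here is roughly the kind of job I mean (a minimal sketch; the master URL, input path, and RDD names are just illustrative, not my actual code):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    // Standalone-mode job with default parallelism 12, caching an RDD so its
    // partitions sit in executor memory on the original 3 workers.
    val conf = new SparkConf()
      .setAppName("partition-rebalance-question")
      .setMaster("spark://master:7077")            // hypothetical master URL
      .set("spark.default.parallelism", "12")
    val sc = new SparkContext(conf)

    // Hypothetical input; the 12 partitions end up cached as 4 blocks per
    // worker on the original 3 nodes.
    val rdd = sc.textFile("hdfs:///data/input", minPartitions = 12)
      .persist(StorageLevel.MEMORY_ONLY)
    rdd.count()  // materialize the cache

    // After the 4th worker joins mid-computation, the cached partitions stay
    // on the original 3 nodes. An explicit repartition() would shuffle them,
    // but I expected Spark to spread the cached blocks out on its own.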
I also have HDFS running on the same 3 worker nodes, but when I added the extra node to the Spark cluster I didn't add another HDFS node to the Hadoop cluster. In my mind that shouldn't be the reason, but who knows?

Thanks.
--
Be well!
Jean Morozov