Hi again,
Ok, I do not know of any way to fix the problem other than deleting the
"bad" machine from the config and restarting, and you will need admin
privileges on the cluster for that :(
However, before we give up on speculative execution: I suspect the task
keeps being run on the same "faulty" machine because that is where the
data resides.
You could try to store / persist your RDD with MEMORY_ONLY_2 or
MEMORY_AND_DISK_2, as that will force the creation of a replica of the data
on another node. With two copies of each partition, the scheduler may choose
to execute the speculative task on the node holding the replica (I'm not
sure about this, as I am just not familiar enough with Spark's scheduler
priorities).
I'm not very hopeful, but it may be worth a try (if you have the disk/RAM
space to afford duplicating all the data, that is).
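To make the suggestion concrete, here is a minimal sketch. The input path and
the `count()` action are just placeholders for whatever your job actually
does; the only real point is the `_2` storage level, which tells Spark to keep
two replicas of each cached partition on different nodes:

```scala
import org.apache.spark.storage.StorageLevel

// Hypothetical input; substitute your real data source.
val rdd = sc.textFile("hdfs:///path/to/input")

// The "_2" storage levels replicate each partition to a second node,
// so a speculative copy of a task has somewhere else to run with
// local data.
rdd.persist(StorageLevel.MEMORY_AND_DISK_2)

// Any action will materialize the RDD (and its replicas).
rdd.count()
```

Note that the replication only takes effect once the RDD has actually been
computed and cached, so the very first pass over the data may still land on
the faulty node.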
If not, I am afraid I am out of ideas ;)
Regards and good luck,
Gylfi.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/how-to-black-list-nodes-on-the-cluster-tp23650p23704.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.