Hello everyone:

I'm running an experiment on a Spark cluster in which some of the machines are
heavily loaded by CPU-, memory- and network-intensive processes (let's call
them straggler machines).

Obviously, tasks running on these machines take longer to execute than tasks on
the other nodes of the cluster. However, I've noticed that tasks that fetch
shuffle data from these "straggler machines" are also delayed, with long
shuffle read phases.

Is there any way of knowing which machines a task is reading its shuffle
data from? Something like: node1 is reading its shuffle data from [node2,
node3, node4]?

Thanks in advance!



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Reading-Shuffle-Data-from-highly-loaded-nodes-tp26901.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.