Reading Shuffle Data from highly loaded nodes

alvarobrandon Mon, 09 May 2016 02:59:15 -0700

Hello everyone:

I'm running an experiment on a Spark cluster where some of the machines are
heavily loaded by CPU-, memory- and network-intensive processes (let's call
them straggler machines).


Obviously, tasks on these machines take longer to execute than on other
nodes of the cluster. However, I've noticed that tasks that fetch shuffle
data from these "straggler machines" are also delayed, with long shuffle
read phases.
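
For reference, here is a rough sketch of how the delay could be measured on the
reading side. It assumes event logging is enabled (spark.eventLog.enabled=true)
and that each line of the event log is a JSON object whose
SparkListenerTaskEnd events carry "Task Info" -> "Host" and "Task Metrics" ->
"Shuffle Read Metrics" with "Fetch Wait Time" and "Remote Bytes Read" fields.
The function name fetch_wait_by_host is only illustrative. Note that this
reports where the reading task ran, not which nodes served the blocks.

```python
import json
from collections import defaultdict


def fetch_wait_by_host(lines):
    """Sum shuffle fetch wait (ms) and remote bytes read per task host.

    `lines` is any iterable of event-log lines (one JSON object per line).
    Field names are assumed to match the Spark event log format.
    """
    totals = defaultdict(
        lambda: {"tasks": 0, "fetch_wait_ms": 0, "remote_bytes": 0}
    )
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("Event") != "SparkListenerTaskEnd":
            continue
        # Host on which the *reading* task ran.
        host = (event.get("Task Info") or {}).get("Host", "unknown")
        # Task Metrics can be absent (e.g. for failed tasks).
        shuffle = (event.get("Task Metrics") or {}).get("Shuffle Read Metrics")
        if not shuffle:
            continue
        entry = totals[host]
        entry["tasks"] += 1
        entry["fetch_wait_ms"] += shuffle.get("Fetch Wait Time", 0)
        entry["remote_bytes"] += shuffle.get("Remote Bytes Read", 0)
    return dict(totals)
```

Passing an open event-log file to fetch_wait_by_host gives one summary entry
per host, which makes it easy to see which readers spend the most time
waiting on remote blocks.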

Is there any way of knowing which machines a task is reading its shuffle
data from? Something like: node1 is reading its shuffle data from [node2, node3
and node4]?

Thanks in advance



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Reading-Shuffle-Data-from-highly-loaded-nodes-tp26901.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

