Dear Developers,

I'm investigating the data-flow communication pattern during the execution
of a Spark program defined by an RDD chain. Looking at it from the Task
point of view, I found that a ResultTask, when it retrieves the iterator of
its RDD for a given partition, effectively asks the BlockManager to get the
block from a local or remote location. My change there is to include the
actual location in BlockResult, so the task can tell where the data was
retrieved from. As far as I can tell, this is the only case in which a
ResultTask triggers a data flow.
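
To make the idea concrete, here is a minimal, self-contained sketch of
"a block result that remembers where it came from". It is only a toy model
of the local-first lookup described above, not Spark code and not my actual
patch: TracedBlockResult, ToyBlockManager, the executor id and the block
ids are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TracedBlockResult:
    """Hypothetical stand-in for a block result that remembers its origin."""
    data: bytes
    source: str  # "local", or the id of the remote executor that served the block


class ToyBlockManager:
    """Toy model of a local-first block lookup; not Spark's real BlockManager."""

    def __init__(self, local: Dict[str, bytes], remotes: Dict[str, Dict[str, bytes]]):
        self.local = local
        self.remotes = remotes  # executor id -> blocks held by that executor

    def get(self, block_id: str) -> Optional[TracedBlockResult]:
        # Try the local store first, then each remote executor in turn.
        if block_id in self.local:
            return TracedBlockResult(self.local[block_id], "local")
        for executor_id, blocks in self.remotes.items():
            if block_id in blocks:
                return TracedBlockResult(blocks[block_id], executor_id)
        return None  # block is not known anywhere


manager = ToyBlockManager(
    local={"rdd_0_0": b"a"},
    remotes={"executor-2": {"rdd_0_1": b"b"}},
)
print(manager.get("rdd_0_0").source)  # block found in the local store
print(manager.get("rdd_0_1").source)  # block served by a remote executor
```

With the source carried in the result, the calling task can log it next to
the partition it was computing.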

What about ShuffleMapTask? What happens there? I would like to log the
locations involved in the shuffle process, so I would be grateful for a few
hints about where remote communication is managed in the case of a
ShuffleMapTask.

Thanks!

Zoltán
