Dear Developers,

I'm trying to investigate the communication pattern of the data flow during the execution of a Spark program defined by an RDD chain. I'm looking at it from the task's point of view, and found that a ResultTask, when it retrieves the iterator of its RDD for a given partition, effectively asks the BlockManager to fetch the block from a local or remote location. There, I include the actual location in the BlockResult so that the task can tell where it retrieved the data from. As far as I can tell, this is the only case in which a ResultTask triggers a data flow.
What about ShuffleMapTask? What happens there? I'm trying to log the locations that are involved in the shuffle process. I would appreciate a few hints on where remote communication is managed in the case of ShuffleMapTask.

Thanks!
Zoltán