According to this paper <http://www.cs.berkeley.edu/~kubitron/courses/cs262a-F13/projects/reports/project16_report.pdf>, Spark's map tasks write their results to disk.
My actual question is about BroadcastHashJoin <https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoin.scala#L100>: its doExecute() method calls the mapPartitions <https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoin.scala#L109> method at line 109. At this step, Spark schedules a number of tasks to perform the hash join. Will the results of these tasks be written to each worker's disk?

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Map-Tasks-Disk-Spill-tp15217.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.