It depends on what the next operator is. If the next operator is just an
aggregation, then no: the hash join won't write anything to disk. It will
just stream the data through to the next operator. If the next operator is
a shuffle (exchange), then yes: the map side of the shuffle writes its
output to local disk.
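
To make the streaming case concrete, here is a toy sketch in plain Python.
It is not Spark's code, and every name in it (build_hash_table,
hash_join_partition, count_rows) is made up for illustration. It assumes
the small side of the join fits in memory as a hash table, and it models
the per-partition join as a lazy generator, roughly the shape of a
function passed to mapPartitions.

```python
from collections import defaultdict


def build_hash_table(small_rows):
    """Build side: index the small relation by join key.

    Hypothetical stand-in for the broadcast relation.
    """
    table = defaultdict(list)
    for key, value in small_rows:
        table[key].append(value)
    return table


def hash_join_partition(stream_rows, hashed):
    """Probe side: lazily yield joined rows for one partition.

    Rows are produced one at a time, only when the consumer asks for them,
    so the join output is never materialized as a whole.
    """
    for key, left in stream_rows:
        for right in hashed.get(key, ()):
            yield (key, left, right)


def count_rows(rows):
    """Downstream aggregation: pulls rows one by one, keeps only a counter."""
    total = 0
    for _ in rows:
        total += 1
    return total


small = [(1, "a"), (2, "b"), (2, "c")]
partition = iter([(1, "x"), (2, "y"), (3, "z")])

hashed = build_hash_table(small)
joined = hash_join_partition(partition, hashed)  # nothing is computed yet
result = count_rows(joined)  # the aggregation drives the join row by row
print(result)  # prints 3
```

In this sketch the only state held is the hash table and a counter. If the
consumer were a shuffle instead of count_rows, it would be the shuffle
writer, not the join, that puts rows on disk.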

On Sun, Nov 15, 2015 at 10:52 AM, gsvic <victora...@gmail.com> wrote:

> According to this paper
> <
> http://www.cs.berkeley.edu/~kubitron/courses/cs262a-F13/projects/reports/project16_report.pdf
> >
> Spark's map tasks write their results to disk.
>
> My actual question is: in the BroadcastHashJoin
> <
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoin.scala#L100
> >
> doExecute() method, at line 109, the mapPartitions
> <
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoin.scala#L109
> >
> method is called. At this step, Spark schedules a number of tasks to
> perform the hash join operation. Will the results of these tasks be
> written to each worker's disk?