Re: Are map tasks spilling data to disk?

Reynold Xin Sun, 15 Nov 2015 10:58:54 -0800

It depends on what the next operator is. If the next operator is just an
aggregation, then no, the hash join won't write anything to disk. It will
just stream the data through to the next operator. If the next operator is
a shuffle (exchange), then yes.
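A minimal, single-process Python sketch of the two cases above, assuming a toy row format of (key, value) tuples. The functions broadcast_hash_join, aggregate_count and shuffle_write are hypothetical illustrations, not Spark APIs. Roughly speaking, Spark pipelines operators within a stage by chaining partition iterators (mapPartitions takes an Iterator => Iterator function), and only the shuffle writer at an exchange boundary persists rows to local files for the next stage.

```python
import os
import pickle
import tempfile
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple[int, object]  # hypothetical toy row: (join key, payload)


def broadcast_hash_join(stream: Iterable[Row], build: Dict[int, object]) -> Iterator[Row]:
    """Probe an in-memory (broadcast) hash table one row at a time.

    This is a generator, so joined rows are produced lazily and nothing
    is materialized: the next operator pulls them as it needs them.
    """
    for key, value in stream:
        match = build.get(key)
        if match is not None:
            yield (key, (value, match))


def aggregate_count(rows: Iterable[Row]) -> Dict[int, int]:
    """Case 1: the next operator is a plain aggregation.

    It consumes the join output directly from the iterator; no files are written.
    """
    counts: Dict[int, int] = {}
    for key, _ in rows:
        counts[key] = counts.get(key, 0) + 1
    return counts


def shuffle_write(rows: Iterable[Row], num_partitions: int, out_dir: str) -> List[str]:
    """Case 2: the next operator is an exchange (shuffle).

    Rows are hash-partitioned by key and each partition is written to a file,
    standing in for the map output a downstream stage would fetch.
    """
    buckets: List[List[Row]] = [[] for _ in range(num_partitions)]
    for row in rows:
        buckets[hash(row[0]) % num_partitions].append(row)
    paths = []
    for i, bucket in enumerate(buckets):
        path = os.path.join(out_dir, f"shuffle_part_{i}.bin")
        with open(path, "wb") as f:
            pickle.dump(bucket, f)
        paths.append(path)
    return paths


if __name__ == "__main__":
    build_side = {1: "a", 2: "b"}  # small table, broadcast to every task
    probe_side = [(1, 10), (2, 20), (1, 30), (3, 40)]

    # Join followed by aggregation: one streaming pass, no disk I/O.
    print(aggregate_count(broadcast_hash_join(probe_side, build_side)))  # {1: 2, 2: 1}

    # Join followed by an exchange: join output is written out for the next stage.
    with tempfile.TemporaryDirectory() as d:
        print(len(shuffle_write(broadcast_hash_join(probe_side, build_side), 2, d)))  # 2
```

The generator is the key design choice here: the join itself never decides whether its output touches disk; that is determined entirely by whichever operator consumes the iterator.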


On Sun, Nov 15, 2015 at 10:52 AM, gsvic <[email protected]> wrote:

> According to this paper
> <http://www.cs.berkeley.edu/~kubitron/courses/cs262a-F13/projects/reports/project16_report.pdf>,
> Spark's map tasks write their results to disk.
>
> My actual question is this: in BroadcastHashJoin
> <https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoin.scala#L100>,
> the doExecute() method calls the mapPartitions
> <https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoin.scala#L109>
> method at line 109. At this step, Spark schedules a number of tasks to
> perform the hash join. Will the results of these tasks be written to each
> worker's disk?
