Re: Spark remote communication pattern

2015-04-09 Thread Reynold Xin
For torrent broadcast, data are read directly through the block manager: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala#L167 On Thu, Apr 9, 2015 at 7:27 AM, Zoltán Zvara wrote: > Thanks! I've found the fetcher! Is there any ot

Re: Spark remote communication pattern

2015-04-09 Thread Zoltán Zvara
Thanks! I've found the fetcher! Are there any other places or cases where blocks travel over the network? Zvara Zoltán

Re: Spark remote communication pattern

2015-04-09 Thread Reynold Xin
Take a look at the following two files: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/shuffle/hash/BlockStoreShuffleFetcher.scala and https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala On

Spark remote communication pattern

2015-04-09 Thread Zoltán Zvara
Dear Developers, I'm trying to investigate the communication pattern regarding data-flow during execution of a Spark program defined by an RDD chain. I'm investigating from the Task point of view, and found out that the task type ResultTask (as retrieving the iterator for its RDD for a given parti