Hi Guillermo,

The current broadcast algorithm in Spark approximates the one described in
Section 5 of this paper
<http://www.mosharaf.com/wp-content/uploads/orchestra-sigcomm11.pdf>.
It is expected to scale sub-linearly; i.e., O(log N), where N is the number
of machines in your cluster.
We evaluated it on up to 100 machines, and it does follow O(log N) scaling.
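
For intuition about where the O(log N) comes from, here is a toy model in
Python. It is only a sketch under simplifying assumptions, not Spark's actual
implementation: it assumes the data moves in synchronized rounds, that every
machine already holding the data forwards it to exactly one machine that lacks
it per round, and that the driver starts as the only holder. The function names
are made up for this illustration.

```python
def p2p_broadcast_rounds(num_machines: int) -> int:
    """Rounds needed to reach all machines in a toy doubling model.

    Assumption: each holder sends to one new machine per round, so the
    number of holders at most doubles every round.
    """
    if num_machines < 0:
        raise ValueError("num_machines must be non-negative")
    total = num_machines + 1   # workers plus the driver
    holders = 1                # only the driver has the data at the start
    rounds = 0
    while holders < total:
        holders = min(total, holders * 2)
        rounds += 1
    return rounds


def centralized_broadcast_rounds(num_machines: int) -> int:
    """Rounds needed if only the driver sends, one machine per round."""
    if num_machines < 0:
        raise ValueError("num_machines must be non-negative")
    return num_machines


if __name__ == "__main__":
    for n in (10, 100, 300):
        print(n, p2p_broadcast_rounds(n), centralized_broadcast_rounds(n))
```

In this model, going from 100 to 300 machines adds only two rounds to the
peer-to-peer case, while the centralized case grows linearly with N.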

Have you tried it on your 300-machine cluster? I'm curious to know what
happened.

-Mosharaf

On Mon, Feb 23, 2015 at 8:06 AM, Guillermo Ortiz <konstt2...@gmail.com>
wrote:

> I'm looking for information about how broadcast variables scale in Spark
> and which algorithm they use.
>
> I have found
> http://www.cs.berkeley.edu/~agearh/cs267.sp10/files/mosharaf-spark-bc-report-spring10.pdf
> I don't know if it describes the current version (1.2.1)
> because the file was created in 2010.
> I took a look at the documentation and API and read that there is a
> TorrentFactory for broadcast variables.
> Is that the one Spark uses right now? The article says that
> Spark uses another one (Centralized HDFS Broadcast).
>
> How does the current algorithm scale on a big cluster (about 300 nodes)?
> Is it linear? Are there options for choosing
> other algorithms?