Hi Guillermo,

The current broadcast algorithm in Spark approximates the one described in Section 5 of this paper: <http://www.mosharaf.com/wp-content/uploads/orchestra-sigcomm11.pdf>. It is expected to scale sub-linearly, i.e., as O(log N), where N is the number of machines in your cluster. We evaluated it on up to 100 machines, and it does follow O(log N) scaling.
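To make the O(log N) intuition concrete, here is a minimal sketch of an idealized model, not Spark's implementation. It assumes that transfers happen in synchronous "rounds" and that each node already holding the data can send one full copy to one other node per round, so the number of holders doubles every round. It compares this against a centralized scheme where only the source sends, one copy per round. The functions `tree_broadcast_rounds` and `centralized_broadcast_rounds` are hypothetical names for illustration. Spark's actual torrent-style broadcast splits the value into blocks that executors fetch from the driver or from other executors, so real timings also depend on block size and bandwidth.

```python
import math


def tree_broadcast_rounds(num_nodes: int) -> int:
    """Rounds to reach all nodes when every holder forwards one copy per round.

    Idealized, hypothetical model: the number of holders doubles each round.
    """
    if num_nodes < 1:
        raise ValueError("need at least one node")
    holders = 1  # only the source (e.g. the driver) holds the data at the start
    rounds = 0
    while holders < num_nodes:
        # Each current holder sends a copy to one node that does not have it yet.
        holders = min(2 * holders, num_nodes)
        rounds += 1
    return rounds


def centralized_broadcast_rounds(num_nodes: int) -> int:
    """Rounds to reach all nodes when only the source sends, one copy per round."""
    if num_nodes < 1:
        raise ValueError("need at least one node")
    return num_nodes - 1


if __name__ == "__main__":
    for n in (10, 100, 300):
        # columns: nodes, doubling rounds, centralized rounds, ceil(log2 n)
        print(n, tree_broadcast_rounds(n), centralized_broadcast_rounds(n),
              math.ceil(math.log2(n)))
```

Under this model, 100 machines need 7 rounds and 300 machines need 9, versus 99 and 299 rounds for the centralized scheme. Tripling the cluster adds only a couple of rounds, which is the sub-linear behavior described above.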
Have you tried it on your 300-machine cluster? I'm curious to know what happened.

-Mosharaf

On Mon, Feb 23, 2015 at 8:06 AM, Guillermo Ortiz <konstt2...@gmail.com> wrote:

> I'm looking into how broadcast variables scale in Spark and which
> algorithm it uses.
>
> I have found
> http://www.cs.berkeley.edu/~agearh/cs267.sp10/files/mosharaf-spark-bc-report-spring10.pdf
> but I don't know if it describes the current version (1.2.1),
> because the file was created in 2010.
> I took a look at the documentation and API and read that there is a
> TorrentBroadcastFactory for broadcast variables.
> Is that what Spark uses right now? The article says that
> Spark uses a different one (Centralized HDFS Broadcast).
>
> How does the current algorithm scale if I have a big cluster (about 300 nodes)?
> Is it linear? Are there other options to choose other algorithms?