Re: A couple questions about shared variables

2014-09-20 Thread Matei Zaharia
Hey Sandy, On September 20, 2014 at 8:50:54 AM, Sandy Ryza (sandy.r...@cloudera.com) wrote: Hey All, A couple questions came up about shared variables recently, and I wanted to confirm my understanding and update the doc to be a little more clear. *Broadcast variables* Now that tasks data i

Re: guava version conflicts

2014-09-20 Thread Marcelo Vanzin
Hmm, looks like the hack to maintain backwards compatibility in the Java API didn't work that well. I'll take a closer look at this when I get to work on Monday. On Fri, Sep 19, 2014 at 10:30 PM, Cody Koeninger wrote: > After the recent spark project changes to guava shading, I'm seeing issues >

Re: A Comparison of Platforms for Implementing and Running Very Large Scale Machine Learning Algorithms

2014-09-20 Thread Seraph
I’m also one of the authors of this paper and I am responsible for the Spark experiments in this paper. Thank you all for the discussion! (1) Ignacio Zendejas wrote > I should rephrase my question as it was poorly phrased: on average, how > much faster is Spark v. PySpark (I didn't really mean

A couple questions about shared variables

2014-09-20 Thread Sandy Ryza
Hey All, A couple questions came up about shared variables recently, and I wanted to confirm my understanding and update the doc to be a little more clear. *Broadcast variables* Now that tasks data is automatically broadcast, the only occasions where it makes sense to explicitly broadcast are: *

Re: Eliminate copy while sending data : any Akka experts here ?

2014-09-20 Thread Reynold Xin
BTW - a partial solution here: https://github.com/apache/spark/pull/2470 This doesn't address the 0 size block problem yet, but makes my large job on hundreds of terabytes of data much more reliable. On Fri, Jul 4, 2014 at 2:28 AM, Mridul Muralidharan wrote: > In our clusters, number of contai