Hey Sandy,
On September 20, 2014 at 8:50:54 AM, Sandy Ryza (sandy.r...@cloudera.com) wrote:
Hey All,
A couple questions came up about shared variables recently, and I wanted to
confirm my understanding and update the doc to be a little more clear.
*Broadcast variables*
Now that task data i
Hmm, looks like the hack to maintain backwards compatibility in the
Java API didn't work that well. I'll take a closer look at this when I
get to work on Monday.
On Fri, Sep 19, 2014 at 10:30 PM, Cody Koeninger wrote:
> After the recent Spark project changes to guava shading, I'm seeing issues
>
I’m also one of the authors of this paper and I am responsible for the Spark
experiments in this paper. Thank you guys for your discussion!
(1)
Ignacio Zendejas wrote
> I should rephrase my question as it was poorly phrased: on average, how
> much faster is Spark v. PySpark (I didn't really mean
Hey All,
A couple questions came up about shared variables recently, and I wanted to
confirm my understanding and update the doc to be a little more clear.
*Broadcast variables*
Now that task data is automatically broadcast, the only occasions where it
makes sense to explicitly broadcast are:
*
BTW - a partial solution here: https://github.com/apache/spark/pull/2470
This doesn't address the 0 size block problem yet, but makes my large job
on hundreds of terabytes of data much more reliable.
On Fri, Jul 4, 2014 at 2:28 AM, Mridul Muralidharan
wrote:
> In our clusters, number of contai