value.get(u.getLocationId());
>
> Object result = method(f,u1,u2,l); // method implementation not important,
> // but requires all 3 objects
>
> return result;
>
> });
>
> *From:* Marcin Tustin [mailto:mtus...@handybook.com]
> *Sent:* 28 April
From: Marcin Tustin [mailto:mtus...@handybook.com]
Sent: 28 April 2016 12:27
To: Deligiannis, Ioannis (UK)
Cc: dev@spark.apache.org
Subject: Re: RDD.broadcast
I don't know what your notation really means. I'm very much unclear on why you
can't use the filter method for 1. If you'
>
> *From:* Marcin Tustin [mailto:mtus...@handybook.com]
> *Sent:* 28 April 2016 12:08
> *To:* Deligiannis, Ioannis (UK)
> *Cc:* dev@spark.apache.org
> *Subject:* Re: RDD.broadcast
>
>
>
> Why would you ever need to do this? I'm genuinely curious. I view collects
> as being solely for interactive work.
I second knowing the use case for interest. I can imagine a case where
knowledge of the RDD key distribution would help local computations, for
relatively few keys, but would be interested to hear your motive.
Essentially, are you trying to achieve what would be an all-reduce type
operation in MPI
small (reference) RDD is quite common and much faster than using “join” method.
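
For illustration only, here is a plain-Java sketch of the lookup pattern described above, with no Spark dependency. A HashMap stands in for the broadcast copy of the small reference dataset, and a List stands in for the large RDD. The class name, the User type and the enrich method are hypothetical names made up for this sketch; they do not come from the thread or from the Spark API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BroadcastLookupSketch {

    // Hypothetical element type of the large dataset (illustration only).
    static final class User {
        final String name;
        final int locationId;

        User(String name, int locationId) {
            this.name = name;
            this.locationId = locationId;
        }
    }

    // Stand-in for a map() over the large RDD: each user is enriched by a
    // local lookup in the (broadcast) reference map instead of a join.
    // Users whose location id is missing from the map are skipped.
    static List<String> enrich(List<User> users, Map<Integer, String> lookup) {
        List<String> out = new ArrayList<>();
        for (User u : users) {
            String location = lookup.get(u.locationId);
            if (location != null) {
                out.add(u.name + "@" + location);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for "collect the small reference RDD to the driver":
        // the reference data ends up as an in-memory map keyed by id.
        Map<Integer, String> lookup = new HashMap<>();
        lookup.put(1, "London");
        lookup.put(2, "Paris");

        List<User> users = Arrays.asList(
                new User("ann", 1), new User("bob", 2), new User("cy", 9));

        System.out.println(enrich(users, lookup));
    }
}
```

In a real Spark job the map would be wrapped in a broadcast variable and read inside the transformation; the sketch only shows why the per-record work is a local hash lookup rather than a shuffle.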
From: Marcin Tustin [mailto:mtus...@handybook.com]
Sent: 28 April 2016 12:08
To: Deligiannis, Ioannis (UK)
Cc: dev@spark.apache.org
Subject: Re: RDD.broadcast
Why would you ever need to do this? I'm genuinely curious. I view collects
as being solely for interactive work.
Why would you ever need to do this? I'm genuinely curious. I view collects
as being solely for interactive work.
On Thursday, April 28, 2016, wrote:
> Hi,
>
>
>
> It is a common pattern to process an RDD, collect (typically a subset) to
> the driver and then broadcast back.
>
>
>
> Adding an RDD
Hi,
It is a common pattern to process an RDD, collect (typically a subset) to the
driver and then broadcast back.
Adding an RDD method that can do that using the torrent broadcast mechanics
would be much more efficient. In addition, it would not require the Driver to
also utilize its Heap hold