Re: Broadcast vs simple variable

Julien Naour Thu, 21 Aug 2014 00:41:39 -0700

My arrays are in fact Array[Array[Long]], roughly 17 x 150,000 (17 centers
with 150,000 modalities; I'm working on qualitative variables), so they are
pretty large. I'm working on making them smaller, since it's mostly a sparse
matrix.
Good to know nevertheless.
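
To put a number on "pretty large": 17 x 150,000 Longs at 8 bytes each is
about 20 MB of raw payload. Here is a minimal sketch of that arithmetic,
together with one possible sparse encoding. The object name CenterSize, the
helper toSparse and the example row are hypothetical, chosen only for
illustration; JVM array headers and boxing overhead are ignored.

```scala
// Rough size estimate for 17 centers x 150,000 modalities stored as Longs.
object CenterSize {
  val numCenters = 17
  val numModalities = 150000

  // Dense layout: every cell holds one 8-byte Long (object headers ignored).
  val denseBytes: Long = numCenters.toLong * numModalities * 8L

  // Sparse layout (one possible encoding): keep only the non-zero cells
  // of a row as index -> count pairs.
  def toSparse(row: Array[Long]): Map[Int, Long] =
    row.zipWithIndex.collect { case (v, i) if v != 0L => i -> v }.toMap

  def main(args: Array[String]): Unit = {
    println(s"dense payload: ${denseBytes / 1e6} MB")
    val row = Array(0L, 3L, 0L, 0L, 7L) // hypothetical example row
    println(toSparse(row))
  }
}
```

Whether a Map actually ends up smaller depends on how many cells are
non-zero, since each entry carries its own overhead.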


Thanks,

Julien Naour


2014-08-20 23:27 GMT+02:00 Patrick Wendell <[email protected]>:

> For large objects, it will be more efficient to broadcast them. If your
> array is small it won't really matter. How many centers do you have? Unless
> you are finding that you have very large tasks (and Spark will print a
> warning about this), it could be okay to just reference it directly.
>
>
> On Wed, Aug 20, 2014 at 1:18 AM, Julien Naour <[email protected]> wrote:
>
>> Hi,
>>
>> I have a question about broadcast. I'm working on a clustering algorithm
>> close to KMeans. It seems that KMeans broadcasts the cluster centers at each
>> step. For the moment I just use my centers as an Array that I reference
>> directly in my map at each step. Could it be more efficient to use a
>> broadcast instead of a simple variable?
>>
>> Cheers,
>>
>> Julien Naour
>>
>
>
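
For reference, the two options discussed in this thread can be sketched as
follows. This is a minimal illustration, not the code from the original
question: the object name BroadcastSketch, the local master setting, the
reduced array size and the toy points RDD are all assumptions. With a plain
variable the array is serialized as part of each task's closure; with
sc.broadcast it is shipped to each executor once and read through .value.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode with 2 threads, just to make the sketch runnable.
    val conf = new SparkConf().setAppName("broadcast-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Hypothetical, deliberately small stand-in for the 17 x 150,000 centers.
    val centers: Array[Array[Long]] = Array.fill(17, 1000)(1L)
    val points = sc.parallelize(0 until 1000)

    // Option 1: plain variable. `centers` is captured by the closure and
    // serialized along with every task.
    val viaClosure = points.map(i => centers(i % 17)(i)).reduce(_ + _)

    // Option 2: broadcast variable. The array is sent to each executor once
    // and tasks access it through `.value`.
    val centersBc = sc.broadcast(centers)
    val viaBroadcast = points.map(i => centersBc.value(i % 17)(i)).reduce(_ + _)

    // Both options compute the same result; only the shipping cost differs.
    println(viaClosure == viaBroadcast)
    sc.stop()
  }
}
```

In an iterative algorithm where the centers change at each step, a new
broadcast would be created per iteration, since a broadcast value is
read-only once created.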
