My arrays are in fact Array[Array[Long]], roughly 17 x 150,000 (17 centers with 150,000 modalities; I'm working on qualitative variables), so they are pretty large. I'm working on making them smaller, since it's mostly a sparse matrix. Good things to know nevertheless.
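To make the "smaller" part concrete, here is a rough sketch of the kind of compaction I have in mind, assuming most entries of each center are zero. The names `SparseRow` and `SparseCenters` are made up for illustration; they are not part of Spark, and this is not necessarily how the final code will look.

```scala
// Hypothetical sparse representation of one center: only the non-zero
// entries are stored, with their positions kept in increasing order.
final case class SparseRow(size: Int, indices: Array[Int], values: Array[Long]) {
  // Look up the value at position i; positions that are not stored are zero.
  def apply(i: Int): Long = {
    val pos = java.util.Arrays.binarySearch(indices, i)
    if (pos >= 0) values(pos) else 0L
  }
}

object SparseCenters {
  // Keep only the non-zero entries of a dense row.
  def rowToSparse(row: Array[Long]): SparseRow = {
    val nonZero = row.indices.filter(i => row(i) != 0L).toArray
    SparseRow(row.length, nonZero, nonZero.map(i => row(i)))
  }

  // Convert every center (one dense row each) to its sparse form.
  def centersToSparse(centers: Array[Array[Long]]): Array[SparseRow] =
    centers.map(rowToSparse)
}
```

The resulting `Array[SparseRow]` could then be wrapped with `sc.broadcast(...)` and read through `.value` inside the map, which is the route suggested for large objects.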
Thanks,
Julien Naour

2014-08-20 23:27 GMT+02:00 Patrick Wendell <[email protected]>:

> For large objects, it will be more efficient to broadcast it. If your
> array is small it won't really matter. How many centers do you have? Unless
> you are finding that you have very large tasks (and Spark will print a
> warning about this), it could be okay to just reference it directly.
>
>
> On Wed, Aug 20, 2014 at 1:18 AM, Julien Naour <[email protected]> wrote:
>
>> Hi,
>>
>> I have a question about broadcast. I'm working on a clustering algorithm
>> close to KMeans. It seems that KMeans broadcast clusters centers at each
>> step. For the moment I just use my centers as Array that I call directly in
>> my map at each step. Could it be more efficient to use broadcast instead of
>> simple variable?
>>
>> Cheers,
>>
>> Julien Naour
>>
>
>
