When the data size is huge, you're better off using the TorrentBroadcastFactory; a minimal sketch of turning it on is below.

Thanks
Best Regards
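Something like this, as a rough sketch (assuming Spark 1.0.x, where spark.broadcast.factory selects the broadcast implementation and the HTTP factory is the default; the app name and data below are placeholders, not from your setup):

from pyspark import SparkConf, SparkContext

# Switch from the default HTTP broadcast to the BitTorrent-style one,
# which splits a large broadcast value into blocks and spreads them
# peer-to-peer across executors instead of serving it all from the driver.
conf = (SparkConf()
        .setAppName("large-broadcast-demo")  # placeholder app name
        .set("spark.broadcast.factory",
             "org.apache.spark.broadcast.TorrentBroadcastFactory")
        .set("spark.broadcast.blockSize", "4096"))  # block size, in KB

sc = SparkContext(conf=conf)

big_value = list(range(1000 * 1000))  # stand-in for your large matrix
bv = sc.broadcast(big_value)

# Tasks read bv.value on the executors rather than having the data
# serialized into every task closure.
print(sc.parallelize(range(100)).map(lambda i: len(bv.value)).count())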
On Sun, Sep 14, 2014 at 2:54 PM, Chengi Liu <chengi.liu...@gmail.com> wrote:

> Specifically, the error I see when I try to operate on an rdd created by
> the sc.parallelize method:
> org.apache.spark.SparkException: Job aborted due to stage failure:
> Serialized task 12:12 was 12062263 bytes which exceeds spark.akka.frameSize
> (10485760 bytes). Consider using broadcast variables for large values.
>
> On Sun, Sep 14, 2014 at 2:20 AM, Chengi Liu <chengi.liu...@gmail.com> wrote:
>
>> Hi,
>>   I am trying to create an rdd out of a large matrix... sc.parallelize
>> suggests using broadcast. But when I do
>>
>> sc.broadcast(data)
>>
>> I get this error:
>>
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "/usr/common/usg/spark/1.0.2/python/pyspark/context.py", line 370, in broadcast
>>     pickled = pickleSer.dumps(value)
>>   File "/usr/common/usg/spark/1.0.2/python/pyspark/serializers.py", line 279, in dumps
>>     def dumps(self, obj): return cPickle.dumps(obj, 2)
>> SystemError: error return without exception set
>>
>> Help?
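For the frameSize error in the trace above, another option is to raise spark.akka.frameSize (specified in MB in Spark 1.x, default 10), though broadcasting remains the cleaner fix for really large values. Roughly, with an illustrative value not tuned for your cluster:

from pyspark import SparkConf, SparkContext

# spark.akka.frameSize is in MB (default 10); raising it lets larger
# serialized tasks through Akka's message pipe.
conf = (SparkConf()
        .setAppName("frame-size-demo")        # placeholder app name
        .set("spark.akka.frameSize", "128"))  # 128 MB, illustrative only
sc = SparkContext(conf=conf)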