See the docstring of FramedSerializer in serializers.py: "Serializer that writes objects as a stream of (length, data) pairs, where C{length} is a 32-bit integer and data is C{length} bytes."

Hence the limit on the size of a single object: a signed 32-bit length field can describe at most 2**31 - 1 bytes, i.e. about 2 GB, so any one serialized object larger than that cannot be framed.
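For illustration, here is a minimal sketch of that framing scheme (not PySpark's actual code; the names write_with_length and read_with_length are made up for this example). The length prefix is packed as a signed 32-bit big-endian integer, which is where the 2G cap comes from:

    import struct

    MAX_FRAME = 2**31 - 1  # largest value a signed 32-bit length field can hold (~2 GB)

    def write_with_length(data, stream):
        # Framed write: 4-byte big-endian signed length, then the payload.
        if len(data) > MAX_FRAME:
            # struct.pack("!i", ...) would overflow past this point anyway.
            raise ValueError("can not serialize object larger than 2G")
        stream.write(struct.pack("!i", len(data)))
        stream.write(data)

    def read_with_length(stream):
        # Framed read: recover the length, then consume exactly that many bytes.
        (length,) = struct.unpack("!i", stream.read(4))
        return stream.read(length)

The usual workaround is to avoid single records that large, e.g. by splitting the data into more, smaller partitions or breaking up one huge value into several records.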
On Thu, Oct 8, 2015 at 12:56 PM, XIANDI <zxd_ci...@hotmail.com> wrote:
>   File "/home/hadoop/spark/python/pyspark/worker.py", line 101, in main
>     process()
>   File "/home/hadoop/spark/python/pyspark/worker.py", line 96, in process
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File "/home/hadoop/spark/python/pyspark/serializers.py", line 126, in dump_stream
>     self._write_with_length(obj, stream)
>   File "/home/hadoop/spark/python/pyspark/serializers.py", line 140, in _write_with_length
>     raise ValueError("can not serialize object larger than 2G")
> ValueError: can not serialize object larger than 2G
>
> Does anyone know how this happens?
>
> Thanks!