To fix the problem, consider increasing the number of partitions for your job.
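
For example, something along these lines (a minimal PySpark sketch; the input
path, transformation, and partition count are placeholders, not your actual
job):

    # Spread the same data over more partitions so that each partition's
    # serialized payload stays well under the 2 GB framing limit.
    rdd = sc.textFile("hdfs:///path/to/input")        # placeholder input
    rdd = rdd.repartition(400)                        # raise this until partitions are small enough
    result = rdd.map(lambda line: line.upper())       # your real transformation goes here
    result.saveAsTextFile("hdfs:///path/to/output")   # placeholder output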

Showing a code snippet would help us understand your use case better.
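
For reference, the 2G ceiling comes from PySpark framing each serialized
object with a 32-bit length prefix (see the serializers.py comment quoted
below). Roughly, the write path behaves like this simplified sketch (not the
actual Spark source):

    import struct

    def write_with_length(obj_bytes, stream):
        # Each object is written as a 4-byte big-endian signed length,
        # followed by the payload itself.
        length = len(obj_bytes)
        if length > (1 << 31) - 1:   # max value of a signed 32-bit int (~2 GB)
            raise ValueError("can not serialize object larger than 2G")
        stream.write(struct.pack("!i", length))
        stream.write(obj_bytes)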

Cheers

On Thu, Oct 8, 2015 at 1:39 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> See the comment of FramedSerializer() in serializers.py :
>
>     Serializer that writes objects as a stream of (length, data) pairs,
>     where C{length} is a 32-bit integer and data is C{length} bytes.
>
> Hence the limit on the size of object.
>
> On Thu, Oct 8, 2015 at 12:56 PM, XIANDI <zxd_ci...@hotmail.com> wrote:
>
>>   File "/home/hadoop/spark/python/pyspark/worker.py", line 101, in main
>>     process()
>>   File "/home/hadoop/spark/python/pyspark/worker.py", line 96, in process
>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>   File "/home/hadoop/spark/python/pyspark/serializers.py", line 126, in
>> dump_stream
>>     self._write_with_length(obj, stream)
>>   File "/home/hadoop/spark/python/pyspark/serializers.py", line 140, in
>> _write_with_length
>>     raise ValueError("can not serialize object larger than 2G")
>> ValueError: can not serialize object larger than 2G
>>
>> Does anyone know how this happens?
>>
>> Thanks!
>>
>>
>>
>
