pyspark serializer can't handle functions?

madeleine Sun, 15 Jun 2014 16:50:22 -0700

It seems that the default serializer used by PySpark can't serialize a list
of functions.
I've seen some posts about fixing this by using dill instead of pickle for
serialization.
Does anyone know the status of that project, or whether there's another
easy workaround?
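
To illustrate the limitation with the standard library alone (no Spark
involved): as far as I understand it, pickle stores a function as a reference
to its module and name rather than as code, so anything it cannot look up by
name fails. This is only a sketch in Python 3, where the exception type and
message differ from the cPickle error further down; can_pickle is a
hypothetical helper name.

```python
import math
import pickle


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj, 2)  # protocol 2, as in the pyspark traceback
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A function that is importable by name is pickled by reference, not by value,
# so unpickling hands back the very same object.
restored = pickle.loads(pickle.dumps(math.sqrt, 2))
print(restored is math.sqrt)

# A lambda has no importable name, so the lookup fails.
print(can_pickle(lambda x: x + 1))

# Plain data such as a string is never a problem.
print(can_pickle("regs"))
```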


I've pasted a sample error message below. Here, regs is a function defined
in another file, myfile.py, which has been shipped to all workers via the
pyFiles argument to SparkContext:

    sc = SparkContext("local", "myapp", pyFiles=["myfile.py"])

  File "runfile.py", line 45, in __init__
    regsRDD = sc.parallelize([regs]*self.n)
  File "/Applications/spark-0.9.1-bin-hadoop2/python/pyspark/context.py", line 223, in parallelize
    serializer.dump_stream(c, tempFile)
  File "/Applications/spark-0.9.1-bin-hadoop2/python/pyspark/serializers.py", line 182, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/Applications/spark-0.9.1-bin-hadoop2/python/pyspark/serializers.py", line 118, in dump_stream
    self._write_with_length(obj, stream)
  File "/Applications/spark-0.9.1-bin-hadoop2/python/pyspark/serializers.py", line 128, in _write_with_length
    serialized = self.dumps(obj)
  File "/Applications/spark-0.9.1-bin-hadoop2/python/pyspark/serializers.py", line 270, in dumps
    def dumps(self, obj): return cPickle.dumps(obj, 2)
cPickle.PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
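
One workaround I can imagine, sketched here under assumptions and not tested
against Spark 0.9.1: put the function names into the RDD and not the function
objects, and look each name up again on the worker. The sketch below uses only
the standard pickle module as a stand-in for the data serializer. REGISTRY and
resolve are hypothetical names, and the body of regs is a placeholder, not
the real function from myfile.py.

```python
import pickle


def regs(x):
    """Stand-in for the regs function from myfile.py (body is hypothetical)."""
    return 2 * x


# Hypothetical registry. In the Spark case it would live in myfile.py so that
# every worker can import it.
REGISTRY = {"regs": regs}


def resolve(name):
    """Look a function up by name on the receiving side."""
    return REGISTRY[name]


n = 3
names = ["regs"] * n              # plain strings pickle without trouble
payload = pickle.dumps(names, 2)  # what would be written by the serializer

# On the receiving side: unpickle the names, resolve them, call the functions.
results = [resolve(name)(i) for i, name in enumerate(pickle.loads(payload))]
print(results)
```

In Spark terms this would presumably mean parallelizing ["regs"] * n and
calling resolve inside a map, but that part is an assumption on my side.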



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/pyspark-serializer-can-t-handle-functions-tp7650.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
