I am trying to perform some processing on an RDD and then cache and count it, but I am seeing a weird error:

File "/mnt/yarn/usercache/hadoop/appcache/application_1456909219314_0014/container_1456909219314_0014_01_000004/pyspark.zip/pyspark/serializers.py", line 550, in write_int
    stream.write(struct.pack("!i", value))
error: 'i' format requires -2147483648 <= number <= 2147483647
    at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
    at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:207)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)

Any solutions?

Thanks and Regards,
Suraj Sheth
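For context, the failing call in the traceback is `struct.pack("!i", value)`. The `!i` format packs a 4-byte, big-endian, signed integer, so any value outside -2147483648..2147483647 raises `struct.error`. One plausible (unconfirmed) reading is that `value` is a length prefix and a single serialized chunk exceeded 2 GiB. The minimal sketch below only reproduces the error with Python's standard `struct` module. `fits_in_int32` and `pack_length` are hypothetical helpers for illustration, not PySpark APIs.

```python
import pickle
import struct

# Bounds of a signed 32-bit integer, the range the "!i" format accepts.
INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)


def fits_in_int32(n: int) -> bool:
    """Hypothetical helper: True if n can be packed with struct.pack("!i", n)."""
    return INT32_MIN <= n <= INT32_MAX


def pack_length(n: int) -> bytes:
    """Hypothetical helper: pack n as a 4-byte big-endian signed int,
    mirroring the struct.pack("!i", value) call in the traceback."""
    return struct.pack("!i", n)


# A small pickled payload: its length packs into exactly 4 bytes.
payload = pickle.dumps(list(range(10)))
header = pack_length(len(payload))

# A length one byte past the int32 maximum cannot be represented,
# so struct.pack raises struct.error, the same error type as in the traceback.
try:
    pack_length(INT32_MAX + 1)
except struct.error as exc:
    print("struct.error:", exc)
```

The check in `fits_in_int32` is just the range stated in the error message itself; anything larger than roughly 2 GiB cannot be written as an `!i` value.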