Hi, I am trying to save an RDD to an S3 bucket using the RDD.saveAsSequenceFile(path, compressionCodecClass) function in PySpark. I need to save the RDD with GZIP compression. Can anyone tell me how to pass the gzip codec class as a parameter to the function?
I tried *RDD.saveAsSequenceFile('{0}{1}'.format(outputFolder, datePath), compressionCodecClass=gzip.GzipFile)* but it fails with *AttributeError: type object 'GzipFile' has no attribute '_get_object_id'*, which I think is because the JVM can't find a Scala mapping for Python's gzip class.

*If you can guide me to any method for writing the RDD as gzip (.gz) to disk, that would be very much appreciated.*

Many thanks,
SahanB
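For reference, the PySpark documentation describes compressionCodecClass as the fully qualified class name of a Hadoop codec, passed as a string, so a Python class such as gzip.GzipFile is not expected to work there. Below is a minimal sketch of the call under that assumption. The helper name save_gzipped_sequence_file and its arguments are hypothetical, and the RDD is assumed to be a key-value (pair) RDD, which saveAsSequenceFile requires.

```python
# Hadoop's gzip codec, referenced by its fully qualified JVM class name.
# PySpark expects this as a plain string, not as a Python class.
GZIP_CODEC = "org.apache.hadoop.io.compress.GzipCodec"


def save_gzipped_sequence_file(rdd, output_folder, date_path):
    """Save a pair RDD as a gzip-compressed SequenceFile.

    Hypothetical helper for illustration: `rdd` is any object exposing
    PySpark's RDD.saveAsSequenceFile(path, compressionCodecClass=...).
    Returns the output path that was written to.
    """
    path = "{0}{1}".format(output_folder, date_path)
    rdd.saveAsSequenceFile(path, compressionCodecClass=GZIP_CODEC)
    return path
```

With a live SparkContext this would be called as save_gzipped_sequence_file(myRdd, outputFolder, datePath). If plain .gz text output is wanted instead of a SequenceFile, PySpark versions whose saveAsTextFile accepts compressionCodecClass should take the same codec string.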