Hi there,
I recently started using Spark and am evaluating possible use cases for it in our company.

I'm trying to save an RDD as a compressed SequenceFile. I can save an uncompressed file by calling:

    counts.saveAsSequenceFile(output)

where counts is my RDD of (IntWritable, Text) pairs. However, I haven't managed to compress the output. I tried several variants, and each one fails with a compile error:

    counts.saveAsSequenceFile(output, classOf[org.apache.hadoop.io.compress.SnappyCodec])
    <console>:21: error: type mismatch;
     found   : Class[org.apache.hadoop.io.compress.SnappyCodec](classOf[org.apache.hadoop.io.compress.SnappyCodec])
     required: Option[Class[_ <: org.apache.hadoop.io.compress.CompressionCodec]]
           counts.saveAsSequenceFile(output, classOf[org.apache.hadoop.io.compress.SnappyCodec])

    counts.saveAsSequenceFile(output, classOf[org.apache.spark.io.SnappyCompressionCodec])
    <console>:21: error: type mismatch;
     found   : Class[org.apache.spark.io.SnappyCompressionCodec](classOf[org.apache.spark.io.SnappyCompressionCodec])
     required: Option[Class[_ <: org.apache.hadoop.io.compress.CompressionCodec]]
           counts.saveAsSequenceFile(output, classOf[org.apache.spark.io.SnappyCompressionCodec])

It doesn't work for Gzip either:

    counts.saveAsSequenceFile(output, classOf[org.apache.hadoop.io.compress.GzipCodec])
    <console>:21: error: type mismatch;
     found   : Class[org.apache.hadoop.io.compress.GzipCodec](classOf[org.apache.hadoop.io.compress.GzipCodec])
     required: Option[Class[_ <: org.apache.hadoop.io.compress.CompressionCodec]]
           counts.saveAsSequenceFile(output, classOf[org.apache.hadoop.io.compress.GzipCodec])

Could you please suggest a solution? Also, I couldn't find how to specify compression parameters (e.g. the compression type for Snappy). Could you share code snippets for writing and reading an RDD with compression?

Thank you in advance,
Konstantin Kudryavtsev
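
Note on the errors above: the "required:" line in each compiler message says the second argument is an Option[Class[_ <: CompressionCodec]], so a bare classOf[...] cannot type-check, while wrapping the class in Some(...) should. The following is only a minimal sketch under stated assumptions, not a confirmed fix for this setup. It assumes a local Spark installation on the classpath. The output path, the sample data, and the object name are hypothetical. It uses Hadoop's DefaultCodec because that codec does not depend on native libraries. GzipCodec or SnappyCodec can be passed the same way, but they may need the native Hadoop libraries to be available. The compression-type property is an assumption about the Hadoop version in use (older releases use "mapred.output.compression.type" instead).

```scala
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.io.compress.DefaultCodec
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // implicit conversions; only needed on older Spark versions

object CompressedSequenceFileSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("seqfile-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Assumption: Hadoop 2 property name for the SequenceFile compression type
    // (NONE, RECORD or BLOCK).
    sc.hadoopConfiguration.set(
      "mapreduce.output.fileoutputformat.compress.type", "BLOCK")

    // Hypothetical output directory; it must not exist yet.
    val output = "/tmp/counts-seq-compressed"

    // Stand-in for the `counts` RDD of (IntWritable, Text) from the question.
    val counts = sc.parallelize(Seq((1, "a"), (2, "b"), (3, "c")))
      .map { case (k, v) => (new IntWritable(k), new Text(v)) }

    // The codec parameter is an Option, so the class is wrapped in Some(...).
    counts.saveAsSequenceFile(output, Some(classOf[DefaultCodec]))

    // Reading back: the codec is recorded in the file header, so none is passed here.
    // Writables are reused by the reader, so copy them into plain values right away.
    val restored = sc.sequenceFile(output, classOf[IntWritable], classOf[Text])
      .map { case (k, v) => (k.get, v.toString) }

    restored.collect().foreach(println)
    sc.stop()
  }
}
```

In the spark-shell, where sc already exists, only the saveAsSequenceFile and sequenceFile calls are needed.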