Thanks for pointing that out.
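
For the archives, here is a minimal sketch of the working write/read round trip, based on Mark's suggestion to wrap the codec class in Some(...). It assumes counts is an RDD of (IntWritable, Text) as in my original message, output is the target path, and the Snappy native libraries are available on the workers:

  import org.apache.hadoop.io.{IntWritable, Text}
  import org.apache.hadoop.io.compress.SnappyCodec

  // The codec parameter is an Option[Class[_ <: CompressionCodec]],
  // so the class literal has to be wrapped in Some(...).
  counts.saveAsSequenceFile(output, Some(classOf[SnappyCodec]))

  // Reading back: the codec is recorded in the SequenceFile header,
  // so no codec argument is needed on the read side.
  val restored = sc.sequenceFile(output, classOf[IntWritable], classOf[Text])

Note that Hadoop reuses Writable objects between records, so if you cache or collect restored you may want to map the pairs to plain values (e.g. .map { case (k, v) => (k.get, v.toString) }) first.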
On Wed, Apr 2, 2014 at 6:11 PM, Mark Hamstra <m...@clearstorydata.com> wrote:

> First, you shouldn't be using spark.incubator.apache.org anymore, just
> spark.apache.org. Second, saveAsSequenceFile doesn't appear to exist in
> the Python API at this point.
>
>
> On Wed, Apr 2, 2014 at 3:00 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>
>> Is this a Scala-only
>> <http://spark.incubator.apache.org/docs/latest/api/pyspark/pyspark.rdd.RDD-class.html#saveAsTextFile>
>> feature?
>>
>>
>> On Wed, Apr 2, 2014 at 5:55 PM, Patrick Wendell <pwend...@gmail.com> wrote:
>>
>>> For textFile I believe we overload it and let you set a codec directly:
>>>
>>> https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/FileSuite.scala#L59
>>>
>>> For saveAsSequenceFile, yep, I think Mark is right: you need an Option.
>>>
>>>
>>> On Wed, Apr 2, 2014 at 12:36 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
>>>
>>>> http://www.scala-lang.org/api/2.10.3/index.html#scala.Option
>>>>
>>>> The signature is 'def saveAsSequenceFile(path: String, codec:
>>>> Option[Class[_ <: CompressionCodec]] = None)', but you are providing a
>>>> Class, not an Option[Class].
>>>>
>>>> Try counts.saveAsSequenceFile(output,
>>>> Some(classOf[org.apache.hadoop.io.compress.SnappyCodec]))
>>>>
>>>>
>>>> On Wed, Apr 2, 2014 at 12:18 PM, Kostiantyn Kudriavtsev <kudryavtsev.konstan...@gmail.com> wrote:
>>>>
>>>>> Hi there,
>>>>>
>>>>> I've started using Spark recently and am evaluating possible use cases
>>>>> in our company.
>>>>>
>>>>> I'm trying to save an RDD as a compressed SequenceFile. I'm able to
>>>>> save a non-compressed file by calling:
>>>>>
>>>>> counts.saveAsSequenceFile(output)
>>>>>
>>>>> where counts is my RDD of (IntWritable, Text). However, I didn't manage
>>>>> to compress the output. I tried several configurations and always got an
>>>>> exception:
>>>>>
>>>>> counts.saveAsSequenceFile(output, classOf[org.apache.hadoop.io.compress.SnappyCodec])
>>>>> <console>:21: error: type mismatch;
>>>>>  found   : Class[org.apache.hadoop.io.compress.SnappyCodec](classOf[org.apache.hadoop.io.compress.SnappyCodec])
>>>>>  required: Option[Class[_ <: org.apache.hadoop.io.compress.CompressionCodec]]
>>>>>        counts.saveAsSequenceFile(output, classOf[org.apache.hadoop.io.compress.SnappyCodec])
>>>>>
>>>>> counts.saveAsSequenceFile(output, classOf[org.apache.spark.io.SnappyCompressionCodec])
>>>>> <console>:21: error: type mismatch;
>>>>>  found   : Class[org.apache.spark.io.SnappyCompressionCodec](classOf[org.apache.spark.io.SnappyCompressionCodec])
>>>>>  required: Option[Class[_ <: org.apache.hadoop.io.compress.CompressionCodec]]
>>>>>        counts.saveAsSequenceFile(output, classOf[org.apache.spark.io.SnappyCompressionCodec])
>>>>>
>>>>> and it doesn't work even for Gzip:
>>>>>
>>>>> counts.saveAsSequenceFile(output, classOf[org.apache.hadoop.io.compress.GzipCodec])
>>>>> <console>:21: error: type mismatch;
>>>>>  found   : Class[org.apache.hadoop.io.compress.GzipCodec](classOf[org.apache.hadoop.io.compress.GzipCodec])
>>>>>  required: Option[Class[_ <: org.apache.hadoop.io.compress.CompressionCodec]]
>>>>>        counts.saveAsSequenceFile(output, classOf[org.apache.hadoop.io.compress.GzipCodec])
>>>>>
>>>>> Could you please suggest a solution? Also, I didn't find how it is
>>>>> possible to specify compression parameters (i.e. the compression type
>>>>> for Snappy).
>>>>> I also wondered if you could share code snippets for writing/reading
>>>>> an RDD with compression?
>>>>>
>>>>> Thank you in advance,
>>>>> Konstantin Kudryavtsev
>>>>>
>>>>
>>>
>>
>
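
Following up on Patrick's point about the text-file overload: on the write side, saveAsTextFile takes the codec class directly (no Option), and sc.textFile decompresses transparently when reading it back. A small sketch, assuming the same counts RDD and a hypothetical output path:

  import org.apache.hadoop.io.compress.GzipCodec

  // saveAsTextFile is overloaded to take the codec Class directly,
  // producing gzip-compressed part files.
  counts.saveAsTextFile("/tmp/counts-text", classOf[GzipCodec])

  // Compressed text is decompressed transparently based on the file extension.
  val lines = sc.textFile("/tmp/counts-text")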
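
On the remaining part of the question (the compression type, i.e. RECORD vs BLOCK for the SequenceFile): I haven't seen a Spark-level setting for this, but since saveAsSequenceFile goes through Hadoop's SequenceFile output format, the standard Hadoop property should apply. This is an assumption on my part (that the setting propagates from sc.hadoopConfiguration into the job configuration Spark builds), not something documented by Spark:

  import org.apache.hadoop.io.compress.SnappyCodec

  // Hadoop's SequenceFileOutputFormat reads the compression type
  // (NONE, RECORD or BLOCK) from this property; BLOCK usually compresses best.
  // Assumption: saveAsSequenceFile builds its JobConf from sc.hadoopConfiguration,
  // so this setting reaches the writer.
  sc.hadoopConfiguration.set("mapred.output.compression.type", "BLOCK")

  counts.saveAsSequenceFile(output, Some(classOf[SnappyCodec]))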