Re: Spark output compression on HDFS

2014-04-04 Thread Azuryy
There is no compression type for Snappy. > On April 4, 2014, at 23:06, Konstantin Kudryavtsev wrote: > > Can anybody suggest how to change compression level (Record, Block) for > Snappy? > if it is possible, of course > > thank you in advance > > Thank you, > Konstantin Kudr

Re: Spark output compression on HDFS

2014-04-04 Thread Konstantin Kudryavtsev
Can anybody suggest how to change the compression level (Record, Block) for Snappy, if it is possible, of course? Thank you in advance. Thank you, Konstantin Kudryavtsev On Thu, Apr 3, 2014 at 10:28 PM, Konstantin Kudryavtsev < kudryavtsev.konstan...@gmail.com> wrote: > Thanks all, it works fine now an

Re: Spark output compression on HDFS

2014-04-03 Thread Konstantin Kudryavtsev
Thanks all, it works fine now and I managed to compress the output. However, I am still stuck: how is it possible to set the compression type for Snappy? I mean setting record-level or block-level compression for the output. On Apr 3, 2014 1:15 AM, "Nicholas Chammas" wrote: > Thanks for pointing that out.

Re: Spark output compression on HDFS

2014-04-02 Thread Nicholas Chammas
Thanks for pointing that out. On Wed, Apr 2, 2014 at 6:11 PM, Mark Hamstra wrote: > First, you shouldn't be using spark.incubator.apache.org anymore, just > spark.apache.org. Second, saveAsSequenceFile doesn't appear to exist in > the Python API at this point. > > > On Wed, Apr 2, 2014 at 3:00

Re: Spark output compression on HDFS

2014-04-02 Thread Mark Hamstra
First, you shouldn't be using spark.incubator.apache.org anymore, just spark.apache.org. Second, saveAsSequenceFile doesn't appear to exist in the Python API at this point. On Wed, Apr 2, 2014 at 3:00 PM, Nicholas Chammas wrote: > Is this a > Scala-only

Re: Spark output compression on HDFS

2014-04-02 Thread Nicholas Chammas
Is this a Scala-only feature? On Wed, Apr 2, 2014 at 5:55 PM, Patrick Wendell wrote: > For textFile I believe we overload it and let you set a codec directly: > > > https://github.com/apache/spa

Re: Spark output compression on HDFS

2014-04-02 Thread Patrick Wendell
For saveAsTextFile I believe we overload it and let you set a codec directly: https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/FileSuite.scala#L59 For saveAsSequenceFile, yep, I think Mark is right: you need an Option. On Wed, Apr 2, 2014 at 12:36 PM, Mark Hamstra wrote

Re: Spark output compression on HDFS

2014-04-02 Thread Mark Hamstra
http://www.scala-lang.org/api/2.10.3/index.html#scala.Option The signature is 'def saveAsSequenceFile(path: String, codec: Option[Class[_ <: CompressionCodec]] = None)', but you are providing a Class, not an Option[Class]. Try counts.saveAsSequenceFile(output, Some(classOf[org.apache.hadoop.io.co
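
The Option point above can be illustrated without Spark on the classpath. The following is a minimal sketch, not Spark's implementation: `CompressionCodec`, `DemoCodec`, and `OptionCodecSketch` are hypothetical stand-ins, and only the parameter shape of `saveAsSequenceFile` is copied from the signature quoted in the message.

```scala
// Hypothetical stand-ins for Hadoop's CompressionCodec hierarchy, defined here
// only so the sketch compiles without Hadoop or Spark on the classpath.
trait CompressionCodec
class DemoCodec extends CompressionCodec

object OptionCodecSketch {
  // Same parameter shape as the signature quoted above; the body is illustrative
  // and only reports which branch was taken instead of writing a file.
  def saveAsSequenceFile(
      path: String,
      codec: Option[Class[_ <: CompressionCodec]] = None): String =
    codec match {
      case Some(_) => s"$path: compressed"
      case None    => s"$path: uncompressed"
    }

  // Passing a bare Class does not type-check against Option[Class[...]]:
  //   saveAsSequenceFile("out", classOf[DemoCodec])   // compile error
  // Wrapping the class in Some(...) satisfies the parameter type.
  val compressed: String = saveAsSequenceFile("out", Some(classOf[DemoCodec]))

  // Omitting the argument falls back to the default, None.
  val uncompressed: String = saveAsSequenceFile("out")
}
```

With a real codec class the call would presumably take the same form, with `Some(classOf[...])` wrapped around the codec as the message suggests.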

Re: Spark output compression on HDFS

2014-04-02 Thread Nicholas Chammas
I'm also interested in this. On Wed, Apr 2, 2014 at 3:18 PM, Kostiantyn Kudriavtsev < kudryavtsev.konstan...@gmail.com> wrote: > Hi there, > > > I've started using Spark recently and evaluating possible use cases in our > company. > > I'm trying to save RDD as compressed Sequence file. I'm able

Spark output compression on HDFS

2014-04-02 Thread Kostiantyn Kudriavtsev
Hi there, I've started using Spark recently and am evaluating possible use cases in our company. I'm trying to save an RDD as a compressed SequenceFile. I'm able to save a non-compressed file by calling: counts.saveAsSequenceFile(output) where counts is my RDD (IntWritable, Text). However, I didn't