Re: read compressed hdfs files using SparkContext.textFile?

2015-09-08 Thread shenyan zhen
Realized I was using spark-shell, so it assumed a local file. By submitting a Spark job, the same code worked fine. On Tue, Sep 8, 2015 at 3:13 PM, shenyan zhen wrote: > Hi, > > For hdfs files written with the code below: > > rdd.saveAsTextFile(getHdfsPat
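A minimal sketch of the distinction, assuming a running spark-shell with its usual sc; the namenode host, port, and path below are placeholders, and a fully qualified URI resolves against HDFS regardless of the shell's default filesystem:

    // An unqualified path may resolve against the default filesystem
    // (possibly the local FS in a shell session); a fully qualified
    // hdfs:// URI removes the ambiguity. Host, port, and path are placeholders.
    val lines = sc.textFile("hdfs://namenode:8020/lz/streaming/am/144173460/part-0.gz")
    lines.take(5).foreach(println)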

read compressed hdfs files using SparkContext.textFile?

2015-09-08 Thread shenyan zhen
Hi, For hdfs files written with the code below: rdd.saveAsTextFile(getHdfsPath(...), classOf[org.apache.hadoop.io.compress.GzipCodec]) I can see the hdfs files have been generated: 0 /lz/streaming/am/144173460/_SUCCESS 1.6 M /lz/streaming/am/144173460/part-0.gz 1.6 M /lz/streamin
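A minimal sketch of the write-then-read round trip being described, assuming an existing rdd: RDD[String] and sc; the HDFS path is a placeholder:

    import org.apache.hadoop.io.compress.GzipCodec

    // Write: each partition becomes one part-NNNNN.gz file under the target dir.
    rdd.saveAsTextFile("hdfs:///lz/streaming/am/144173460", classOf[GzipCodec])

    // Read: textFile selects a codec from the .gz extension and decompresses
    // transparently. Gzip is not splittable, so each .gz file becomes a
    // single partition.
    val lines = sc.textFile("hdfs:///lz/streaming/am/144173460/part-*.gz")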

Re: SparkContext initialization error- java.io.IOException: No space left on device

2015-09-06 Thread shenyan zhen
t; > On Sun, Sep 6, 2015 at 6:15 AM, Shixiong Zhu wrote: > >> The folder is in "/tmp" by default. Could you use "df -h" to check the >> free space of /tmp? >> >> Best Regards, >> Shixiong Zhu >> >> 2015-09-05 9:50 GMT+08:00 shenyan

SparkContext initialization error- java.io.IOException: No space left on device

2015-09-04 Thread shenyan zhen
Has anyone seen this error? Not sure which dir the program was trying to write to. I am running Spark 1.4.1, submitting a Spark job to YARN in yarn-client mode. 15/09/04 21:36:06 ERROR SparkContext: Error adding jar (java.io.IOException: No space left on device), was the --addJars option used? 15
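A sketch of one common fix for this error, assuming the driver's scratch space (which defaults to /tmp, per the reply above) is what filled up; the directory is a placeholder:

    import org.apache.spark.{SparkConf, SparkContext}

    // Move Spark's scratch space off the default /tmp to a volume with
    // free room; /data/spark-tmp is a placeholder. In yarn-client mode
    // this affects the driver side; executors under YARN use the node
    // manager's own local dirs (yarn.nodemanager.local-dirs).
    val conf = new SparkConf()
      .setAppName("example")
      .set("spark.local.dir", "/data/spark-tmp")
    val sc = new SparkContext(conf)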

Re: Fighting against performance: JDBC RDD badly distributed

2015-07-28 Thread shenyan zhen
s DataFrame? I am converting back to RDDs later on. > > 2. I lack proper criteria for deciding on a column for > distribution. My table has more than 400 columns. > > > > Saif > > > > *From:* shenyan zhen [mailto:shenya...@gmail.com] > *Sent
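A sketch of the DataFrame route raised here, using Spark 1.4's sqlContext.read.jdbc; the URL, table, column, and bounds are placeholders, and any roughly uniformly distributed numeric column works as the partition column:

    import java.util.Properties

    val props = new Properties()
    props.setProperty("user", "username")      // placeholder credentials
    props.setProperty("password", "password")

    // Spark issues numPartitions range queries over the partition column.
    val df = sqlContext.read.jdbc(
      "jdbc:oracle:thin:@//dbhost:1521/svc",   // placeholder URL
      "MY_TABLE",                              // placeholder table
      "ID",                                    // numeric, evenly spread column
      0L,                                      // lowerBound
      1000000L,                                // upperBound
      24,                                      // numPartitions
      props)

    val rows = df.rdd                          // back to RDD[Row], as noted above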

Re: Fighting against performance: JDBC RDD badly distributed

2015-07-28 Thread shenyan zhen
Hi Saif, Are you using JdbcRDD directly from Spark? If yes, then the poor distribution could be due to the bound key you used. See the JdbcRDD Scala doc at https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.JdbcRDD: "sql: the text of the query. The query must contain t
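A minimal JdbcRDD sketch illustrating that contract, with hypothetical URL, table, and key bounds:

    import java.sql.{DriverManager, ResultSet}
    import org.apache.spark.rdd.JdbcRDD

    // Spark substitutes each partition's key range into the two '?'
    // placeholders; skewed or sparse keys between lowerBound and
    // upperBound yield unevenly sized partitions.
    val rdd = new JdbcRDD(
      sc,
      () => DriverManager.getConnection("jdbc:mysql://dbhost/db", "user", "pass"),
      "SELECT * FROM my_table WHERE id >= ? AND id <= ?",
      lowerBound = 1L,
      upperBound = 1000000L,
      numPartitions = 24,
      mapRow = (rs: ResultSet) => rs.getString(1))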

Re: Meets class not found error in spark console with newly hive context

2015-07-02 Thread shenyan zhen
In case it helps: I got around it temporarily by saving and resetting the context class loader around creating HiveContext. On Jul 2, 2015 4:36 AM, "Terry Hole" wrote: > Found this is a bug in spark 1.4.0: SPARK-8368 > > > Thanks! > Terry > > On Thu,
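A sketch of that save-and-restore workaround, assuming the usual sc in the shell:

    import org.apache.spark.sql.hive.HiveContext

    // HiveContext creation can swap the thread's context class loader
    // (SPARK-8368 in Spark 1.4.0); snapshot it first and restore it after.
    val saved = Thread.currentThread().getContextClassLoader
    val hiveContext =
      try new HiveContext(sc)
      finally Thread.currentThread().setContextClassLoader(saved)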