Realized I was running this in spark-shell, where the schemeless path was being resolved against the local filesystem (note the file:/ prefix in the exception below). When I submitted the same code as a Spark job, it worked fine.
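For the archives: a workaround that should also let spark-shell read the gzipped output is to give sc.textFile a fully qualified hdfs:// URI, so the path cannot fall back to the local filesystem. A minimal sketch; namenode:8020 is a placeholder for the actual NameNode host and port:

    // Placeholder NameNode address; replace with the cluster's actual host:port.
    val t1 = sc.textFile("hdfs://namenode:8020/lz/streaming/am/1441734600000")
    // The .gz part files are decompressed transparently, based on the file extension.
    t1.take(1).head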
On Tue, Sep 8, 2015 at 3:13 PM, shenyan zhen <shenya...@gmail.com> wrote:
> Hi,
>
> For HDFS files written with the code below:
>
>   rdd.saveAsTextFile(getHdfsPath(...),
>     classOf[org.apache.hadoop.io.compress.GzipCodec])
>
> I can see the HDFS files being generated:
>
>   0      /lz/streaming/am/1441734600000/_SUCCESS
>   1.6 M  /lz/streaming/am/1441734600000/part-00000.gz
>   1.6 M  /lz/streaming/am/1441734600000/part-00001.gz
>   1.6 M  /lz/streaming/am/1441734600000/part-00002.gz
>   ...
>
> How do I read them using SparkContext?
>
> My naive attempt:
>
>   val t1 = sc.textFile("/lz/streaming/am/1441734600000")
>   t1.take(1).head
>
> did not work:
>
>   org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
>   file:/lz/streaming/am/1441734600000
>     at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
>     at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
>     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304)
>     at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>
> Thanks,
> Shenyan
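P.S. Another option, if you would rather keep writing bare paths like /lz/... in spark-shell, is to point the session's Hadoop configuration at the NameNode before creating the RDD. Again a sketch, with namenode:8020 as a placeholder:

    // Assumed NameNode address; with fs.defaultFS set, schemeless paths resolve to HDFS
    // for RDDs created afterwards in this session.
    sc.hadoopConfiguration.set("fs.defaultFS", "hdfs://namenode:8020")
    val t1 = sc.textFile("/lz/streaming/am/1441734600000")
    t1.take(1).head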