Hi,
For HDFS files written with the code below:
rdd.saveAsTextFile(getHdfsPath(...), classOf[org.apache.hadoop.io.compress.GzipCodec])
I can see that the HDFS files have been generated:
0 /lz/streaming/am/1441734600000/_SUCCESS
1.6 M /lz/streaming/am/1441734600000/part-00000.gz
1.6 M /lz/streaming/am/1441734600000/part-00001.gz
1.6 M /lz/streaming/am/1441734600000/part-00002.gz
...
How do I read them back using SparkContext?
My naive attempt:
val t1 = sc.textFile("/lz/streaming/am/1441734600000")
t1.take(1).head
did not work; it fails with:
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/lz/streaming/am/1441734600000
    at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:285)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
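The `file:` prefix in the exception suggests the unqualified path was resolved against the local filesystem rather than HDFS. Below is a minimal sketch, assuming Spark in local mode with a local temporary directory standing in for the HDFS output directory (the object name `GzipRoundTrip` and method `roundTrip` are hypothetical). It writes gzip part files with the same `saveAsTextFile(path, classOf[GzipCodec])` call and reads the directory back with `sc.textFile`, passing a fully qualified URI so the scheme does not depend on `fs.defaultFS`. `textFile` picks the gzip codec from the `.gz` extension and skips `_SUCCESS` as a hidden file. On a cluster the URI would presumably look like `hdfs://<namenode-host>:<port>/lz/streaming/am/1441734600000`, with host and port taken from the cluster's `fs.defaultFS`.

```scala
import java.nio.file.Files
import org.apache.hadoop.io.compress.GzipCodec
import org.apache.spark.{SparkConf, SparkContext}

object GzipRoundTrip {
  // Writes gzip-compressed part files, then reads the whole directory back.
  def roundTrip(sc: SparkContext): Array[String] = {
    // Assumption: a local temp dir stands in for the HDFS output directory.
    // toUri yields a fully qualified "file:///..." URI with an explicit scheme.
    val base = Files.createTempDirectory("gz-demo").toUri.toString
    val out = base + "batch"
    // Three partitions -> part-00000.gz .. part-00002.gz plus _SUCCESS.
    sc.parallelize(Seq("a", "b", "c"), 3).saveAsTextFile(out, classOf[GzipCodec])
    // textFile accepts the directory; files starting with "_" or "." are
    // filtered out, and .gz parts are decompressed based on their extension.
    sc.textFile(out).collect().sorted
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("gz-demo"))
    try println(roundTrip(sc).mkString(","))
    finally sc.stop()
  }
}
```

Run locally, this should print `a,b,c`; on the cluster the only change would be swapping the temp-dir URI for the `hdfs://` one.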
Thanks,
Shenyan