Re: How to get the HDFS path for each RDD

2015-09-24 Thread Anchit Choudhry
= sparkContext.wholeTextFiles("hdfs://a-hdfs-path")

More info: https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.SparkContext@wholeTextFiles(String,Int):RDD[(String,String)]

Let us know if this helps or you need more help.

Thanks,
Anchit Choudhry

On 24 September 2015 at 23:12, F
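As a minimal sketch of what wholeTextFiles gives you: it returns pairs of (file path, whole file content), so the path is available alongside every file's data. The plain-Python snippet below simulates that pair shape with an ordinary dict (no running cluster needed); the file paths, the `tag_with_path` helper, and the `source_path` field name are illustrative assumptions, not part of the Spark API.

```python
import json

# Stand-in for sparkContext.wholeTextFiles("hdfs://a-hdfs-path"):
# Spark would return an RDD of (path, fileContent) pairs; a dict
# plays that role here so the per-record logic can be shown locally.
files = {
    "hdfs://a-hdfs-path/data/test1/dt=20100101/part-0": '{"key1": "value1"}',
    "hdfs://a-hdfs-path/data/test2/dt=20100202/part-0": '{"key2": "value2"}',
}

def tag_with_path(pairs):
    """For each (path, content) pair, parse the JSON content and
    attach the source path to the resulting record."""
    out = []
    for path, content in pairs.items():
        record = json.loads(content)
        record["source_path"] = path  # hypothetical field name
        out.append(record)
    return out

records = tag_with_path(files)
```

In real Spark code the same logic would be a `map` over the RDD, e.g. `rdd.map(lambda pc: {**json.loads(pc[1]), "source_path": pc[0]})`.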

each line in my JSON data.

On Sep 25, 2015, at 11:25, Anchit Choudhry wrote:

> Hi Fengdong,
>
> Thanks for your question.
>
> Spark already has a function called wholeTextFiles within sparkContext
> which can help you with that:
>
> Pyt

such as I have two data sets:

data set A: /data/test1/dt=20100101
data set B: /data/test2/dt=20100202

all data has the same JSON format, such as:
{"key1" : "value1", "key2" : "value2"}

my output expected:
{"key1" : "value1", "key2" : "value2
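Since the only per-dataset difference here is the dt= partition value in the path, one way to build the expected output is to parse that value out of each file's path and fold it into every JSON record. The sketch below assumes the desired extra field is called "date" (the expected-output sample above is truncated, so that name is a guess), and uses plain Python rather than Spark:

```python
import json
import re

def add_partition_date(path, json_line):
    """Extract the dt=YYYYMMDD partition value from an HDFS-style path
    and merge it into the parsed JSON record. The "date" field name is
    a hypothetical choice for illustration."""
    record = json.loads(json_line)
    m = re.search(r"dt=(\d{8})", path)
    if m:
        record["date"] = m.group(1)
    return record

rec = add_partition_date("/data/test1/dt=20100101",
                         '{"key1": "value1", "key2": "value2"}')
# -> {'key1': 'value1', 'key2': 'value2', 'date': '20100101'}
```

With wholeTextFiles, the same function would be applied to each (path, content) pair, splitting the content into lines first if each file holds one JSON object per line.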