Yeah... apparently mapPartitionsWithInputSplit is tagged as @DeveloperApi.
Because of that, I'm not sure it's a good idea to rely on that function.
For this problem, I had to create a subclass of HadoopRDD and use
mapPartitions instead.
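For reference, a minimal sketch of that route (assuming a Spark 1.x application with spark-core and the Hadoop client on the classpath; the input path "data/folder" and the object name are placeholders):

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.{FileSplit, InputSplit, TextInputFormat}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.HadoopRDD

object InputFileNames {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("input-file-names").setMaster("local[2]"))

    // "data/folder" is a placeholder input directory.
    val rdd = sc.hadoopFile[LongWritable, Text, TextInputFormat]("data/folder")

    // HadoopRDD.mapPartitionsWithInputSplit (tagged @DeveloperApi) exposes the
    // InputSplit backing each partition; for file-based input it is a FileSplit,
    // so the originating file path can be read off it.
    val linesWithFile = rdd.asInstanceOf[HadoopRDD[LongWritable, Text]]
      .mapPartitionsWithInputSplit {
        (split: InputSplit, iter: Iterator[(LongWritable, Text)]) =>
          val fileName = split.asInstanceOf[FileSplit].getPath.toString
          iter.map { case (_, line) => (fileName, line.toString) }
      }

    linesWithFile.take(5).foreach(println)
    sc.stop()
  }
}
```

Since the API is @DeveloperApi, it may change between releases; pinning the Spark version is advisable if you depend on it.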
I just found a possible answer:
http://themodernlife.github.io/scala/spark/hadoop/hdfs/2014/09/28/spark-input-filename/
I will give it a try. It is a bit troublesome, but if it works, it will
give me what I want.
Sorry to bother everyone here.
Regards,
Shuai
On Sun, Dec 21, 2014 at 4:43 P
Hi All,
When I load a folder into an RDD, is there any way to find the input
file name for a particular partition, so I can track which file each
partition came from?
In Hadoop, I can find this information through code like:
FileSplit fileSplit = (FileSplit) context.getInputSplit();
String strFileName = fileSplit.getPath().getName();
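As a side note, when the inputs are many small files, a sketch of a simpler option that avoids any DeveloperApi is SparkContext.wholeTextFiles, which yields (filePath, fileContent) pairs directly (again assuming spark-core on the classpath; "data/folder" and the object name are placeholders):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WholeFiles {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("whole-files").setMaster("local[2]"))

    // wholeTextFiles reads each file as a single record: (path, entire content).
    // Good for many small files; unsuitable for files larger than memory.
    val byFile = sc.wholeTextFiles("data/folder")
    byFile.keys.collect().foreach(println)  // the input file paths

    sc.stop()
  }
}
```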