Hi,

I'm not sure this is the ideal way of doing it, but if you can create a PairRDDFunctions from that RDD, then you can use the following piece of code to access the filenames backing the RDD:
    PairRDDFunctions<K, V> ds = .....; // getting the name and path for the file

    // Walk the partitions and print the Hadoop input split (i.e. the
    // file path) backing each one.
    for (int i = 0; i < ds.values().getPartitions().length; i++) {
        UnionPartition upp = (UnionPartition) ds.values().getPartitions()[i];
        NewHadoopPartition npp = (NewHadoopPartition) upp.split();
        System.out.println("File " + npp.serializableHadoopSplit().value().toString());
    }

Thanks
Best Regards

On Tue, Aug 26, 2014 at 1:25 AM, yh18190 <yh18...@gmail.com> wrote:
> Hi Guys,
>
> I just want to know whether there is any way to determine which file is
> being handled by Spark from a group of input files inside a directory.
> Suppose I have 1000 files given as input; I want to determine which file
> is currently being handled by the Spark program, so that if any error
> creeps in at any point in time we can easily identify that particular
> file as the faulty one.
>
> Please let me know your thoughts.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Request-for-Help-tp12776.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
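An alternative that avoids casting to Spark's internal partition classes: SparkContext.wholeTextFiles(dir) (available since Spark 1.0) returns (filePath, fileContent) pairs, so every record is already traceable to the file it came from. As a rough sketch of that pairing idea, here is a plain-Java stand-in that needs no Spark at all (the class and file names below are made up for the demo):

```java
// Sketch of the idea behind Spark's wholeTextFiles: pair each file's
// path with its content so every record stays traceable to its source
// file. Plain Java, no Spark required.
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class WholeTextFilesSketch {

    // Return a list of (path, content) pairs, one per file in `dir`,
    // sorted by path for a deterministic order.
    static List<Map.Entry<String, String>> wholeTextFiles(Path dir) throws IOException {
        List<Path> paths = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            files.forEach(paths::add);
        }
        Collections.sort(paths);
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (Path p : paths) {
            String content = new String(Files.readAllBytes(p));
            pairs.add(new AbstractMap.SimpleEntry<>(p.toString(), content));
        }
        return pairs;
    }

    public static void main(String[] args) throws IOException {
        // Demo: two throwaway files; if one held bad data, its path
        // would identify the faulty file directly.
        Path dir = Files.createTempDirectory("demo");
        Files.write(dir.resolve("a.txt"), "good data".getBytes());
        Files.write(dir.resolve("b.txt"), "bad data".getBytes());
        for (Map.Entry<String, String> e : wholeTextFiles(dir)) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

With the real Spark API the same pattern applies: when a parse error occurs, log or filter on the key (the file path) to pinpoint the faulty input. One caveat: wholeTextFiles reads each file whole into a single record, so it suits many small files rather than a few huge ones.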