Hi

I'm not sure this is the right way to do it, but if you can create a
PairRDDFunctions wrapper around that RDD, then you can use the following
piece of code to access the file names backing the RDD.


    PairRDDFunctions<K, V> ds = .....;

    // Getting the name and path of the file backing each partition.
    for (int i = 0; i < ds.values().getPartitions().length; i++) {
        UnionPartition upp = (UnionPartition) ds.values().getPartitions()[i];
        NewHadoopPartition npp = (NewHadoopPartition) upp.split();
        System.out.println("File " + npp.serializableHadoopSplit().value().toString());
    }
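As an alternative to digging into partition internals, Spark also exposes
JavaSparkContext.wholeTextFiles(dir), which returns (path, content) pairs so
every record already knows which file it came from. The snippet below is a
plain-Java sketch of that same idea (it does not use Spark, so it runs
standalone); the class name FileTagging and the helper readWithPaths are
made up for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

public class FileTagging {
    // Read every regular file directly under dir and pair its path with
    // its content, mirroring the (path, text) pairs of wholeTextFiles().
    static Map<String, String> readWithPaths(Path dir) throws IOException {
        Map<String, String> result = new LinkedHashMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path f : files) {
                if (Files.isRegularFile(f)) {
                    result.put(f.toString(), new String(Files.readAllBytes(f)));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("spark-demo");
        Files.write(dir.resolve("a.txt"), "first".getBytes());
        Files.write(dir.resolve("b.txt"), "second".getBytes());
        // Each record carries the file it came from, so a faulty file
        // can be pinpointed as soon as an error shows up.
        for (Map.Entry<String, String> e : readWithPaths(dir).entrySet()) {
            System.out.println("File " + e.getKey());
        }
    }
}
```

Because the path travels with the data, any per-record failure can be
reported together with the file that produced it.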



Thanks
Best Regards


On Tue, Aug 26, 2014 at 1:25 AM, yh18190 <yh18...@gmail.com> wrote:

> Hi Guys,
>
> I just want to know whether there is any way to determine which file is
> currently being handled by Spark from a group of input files inside a
> directory. Suppose I have 1000 files given as input; I want to determine
> which file the Spark program is currently handling, so that if an error
> creeps in at any point we can easily identify that particular file as the
> faulty one.
>
> Please let me know your thoughts.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Request-for-Help-tp12776.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>
