Thanks. As my files are defined to be non-splittable, I eventually ended up
using mapPartitionsWithIndex(), taking the split ID as the index:
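
    def g(splitIndex, iterator):
        # Emit (partition index, first record of that partition).
        # next(iterator) works on both Python 2.6+ and Python 3.
        yield (splitIndex, next(iterator))

    myRDD.mapPartitionsWithIndex(g)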
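
For anyone who wants to see what this emits without a cluster, here is a minimal sketch in plain Python. The `partitions` list is made-up data standing in for one non-splittable file per partition, and the list comprehension is only a local stand-in for what mapPartitionsWithIndex() does. This variant of g also skips empty partitions, which is an addition of mine and not part of the function above.

```python
def g(split_index, iterator):
    # Pair the partition index with the first record of the partition.
    # The for/break form yields nothing when the partition is empty.
    for first in iterator:
        yield (split_index, first)
        break

# Hypothetical data: each inner list plays the role of one partition.
partitions = [["a1", "a2"], ["b1"], []]

# Local stand-in for myRDD.mapPartitionsWithIndex(g).collect()
result = [pair
          for index, part in enumerate(partitions)
          for pair in g(index, iter(part))]
print(result)
```

Each non-empty partition contributes exactly one (index, record) pair, so the index can serve as a per-file ID as long as each file maps to exactly one partition.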
zipWithIndex() will give you something like an index for each element
in the RDD. If your files are small, you can use
SparkContext.wholeTextFiles() to load an RDD where each element is
(filename, content). Maybe that's what you are looking for if you are
really looking to extract an ID from the fil