
Re: RDD Row Index

2014-08-21, TJ Klein

Thanks. As my files are defined to be non-splittable, I eventually ended up using mapPartitionsWithIndex(), taking the split ID as the index:

    def g(splitIndex, iterator):
        # A non-splittable file is never divided across partitions, so each
        # partition corresponds to one file; emit its split index together
        # with the partition's first record.
        yield (splitIndex, next(iterator))

    myRDD.mapPartitionsWithIndex(g)
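To make the semantics of that approach concrete without a cluster, here is a minimal pure-Python model of mapPartitionsWithIndex(). It is an illustration under stated assumptions, not PySpark: an "RDD" is assumed to be a plain list of partitions (one inner list per non-splittable file), and the helper names are hypothetical. It shows both the "first record per split" function from the post and a variant that tags every record with its split index.

```python
from typing import Callable, Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")

_EMPTY = object()  # sentinel so an empty partition is detected safely


def map_partitions_with_index(
    partitions: List[List[T]],
    f: Callable[[int, Iterator[T]], Iterable[U]],
) -> List[U]:
    """Hypothetical local stand-in for RDD.mapPartitionsWithIndex:
    call f once per partition with (partition index, iterator)."""
    out: List[U] = []
    for split_index, part in enumerate(partitions):
        out.extend(f(split_index, iter(part)))
    return out


def first_record_per_split(split_index: int,
                           iterator: Iterator[str]) -> Iterator[Tuple[int, str]]:
    # Mirrors g() above: emit (split index, first record).
    # next(it, default) avoids raising on an empty partition.
    first = next(iterator, _EMPTY)
    if first is not _EMPTY:
        yield (split_index, first)


def tag_every_record(split_index: int,
                     iterator: Iterator[str]) -> Iterator[Tuple[int, str]]:
    # Variant: keep all records, each tagged with its split index.
    for record in iterator:
        yield (split_index, record)


if __name__ == "__main__":
    # Assumed layout: one partition per file; the last file is empty.
    parts = [["a1", "a2"], ["b1"], []]
    print(map_partitions_with_index(parts, first_record_per_split))
    print(map_partitions_with_index(parts, tag_every_record))
```

Real Spark evaluates this lazily and in parallel; the model only reproduces which (index, record) pairs come out. The one-file-per-partition assumption also holds only as long as nothing repartitions or coalesces the RDD first.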

Re: RDD Row Index

2014-08-20, Sean Owen

zipWithIndex() pairs each element in the RDD with an index, ordered first by partition and then by position within each partition, which gives you something like a row number. If your files are small, you can use SparkContext.wholeTextFiles() to load an RDD where each element is (filename, content). Maybe that's what you are looking for if you are really looking to extract an ID from the fil
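The ordering rule for zipWithIndex() can be illustrated with a small pure-Python model. This is a sketch, not PySpark: it assumes an "RDD" is a list of partitions, and `zip_with_index` is a hypothetical helper. The two-pass structure (count each partition, then offset local positions by the sizes of all earlier partitions) is why Spark's version needs an extra job when the RDD has more than one partition.

```python
from itertools import accumulate
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def zip_with_index(partitions: List[List[T]]) -> List[Tuple[T, int]]:
    """Hypothetical local model of RDD.zipWithIndex(): returns
    (element, index) pairs, indexed in partition order and then
    in element order within each partition."""
    # Pass 1: size of every partition.
    counts = [len(p) for p in partitions]
    # Start offset of partition i = total size of partitions 0..i-1.
    starts = [0] + list(accumulate(counts))[:-1]
    # Pass 2: number elements locally, shifted by the partition's offset.
    out: List[Tuple[T, int]] = []
    for start, part in zip(starts, partitions):
        for i, elem in enumerate(part):
            out.append((elem, start + i))
    return out


if __name__ == "__main__":
    # Three assumed partitions, the middle one empty.
    print(zip_with_index([["x", "y"], [], ["z"]]))
```

Because indices depend on partition order, they are stable only as long as the partitioning and the order of elements within each partition do not change between runs.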