zipWithIndex() pairs each element of the RDD with its index, which is probably the closest thing to a row index. If your files are small, you can use SparkContext.wholeTextFiles() to load an RDD where each element is (filename, content). That may be what you want if you are really looking to extract an ID from the file name.
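Something along these lines might work as a rough sketch in PySpark. It assumes your PySpark version exposes zipWithIndex() on RDDs and that `sc` is an existing SparkContext. The function names and the "last run of digits in the file name" rule are made up for illustration, so adjust them to however your files are actually named.

```python
import re


def file_number(path):
    """Hypothetical helper: return the last run of digits in the file name."""
    name = path.rsplit("/", 1)[-1]      # drop the directory part of the path
    digits = re.findall(r"\d+", name)
    if not digits:
        raise ValueError("no number found in file name: %s" % name)
    return int(digits[-1])


def enumerate_files(sc, path):
    """Return an RDD of (index, content), numbered in file-name order."""
    files = sc.wholeTextFiles(path)     # RDD of (filename, content)
    return (files
            .sortByKey()                # make the numbering follow the file names
            .zipWithIndex()             # ((filename, content), index)
            .map(lambda pair: (pair[1], pair[0][1])))


def key_by_file_number(sc, path):
    """Return an RDD of (number parsed from the file name, content)."""
    return (sc.wholeTextFiles(path)
            .map(lambda kv: (file_number(kv[0]), kv[1])))
```

The first variant gives you a dense 0..N-1 enumeration, while the second keeps whatever number is already in the file name. Note that wholeTextFiles() reads each file as a single record, so it is only a good fit when the individual files are small.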
On Wed, Aug 20, 2014 at 8:35 AM, TJ Klein <tjkl...@gmail.com> wrote:
> Hi,
>
> I wonder if there is something like an (row) index to of the elements in the
> RDD. Specifically, my RDD is generated from a series of files, where the
> value corresponds the file contents. Ideally, I would like to have the keys
> to be an enumeration of the file number e.g. (0,<file contents 0>),(1,<file
> contents 1>).
> Any idea?
> Thanks,
> Tassilo
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/RDD-Row-Index-tp12457.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org