Re: RDD Row Index

Sean Owen Wed, 20 Aug 2014 02:44:51 -0700

zipWithIndex() will give you something like an index for each element
in the RDD. If your files are small, you can use
SparkContext.wholeTextFiles() to load an RDD where each element is a
(filename, content) pair. Maybe that's what you are looking for if you
really want to extract an ID from the file name.
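For illustration, here is a small pure-Python model of the two options above. It is a sketch, not Spark code: partitions are plain lists, and the paths and the "file-<n>.txt" naming scheme are hypothetical. zip_with_index mimics how RDD.zipWithIndex() numbers elements. Each partition's starting index is the number of elements in all earlier partitions, which is why Spark runs an extra job to count partition sizes when the RDD has more than one partition. id_from_filename shows the other route: parsing a number out of the file name in each (filename, content) pair.

```python
import re
from itertools import accumulate


def zip_with_index(partitions):
    """Model of RDD.zipWithIndex(): `partitions` is a list of lists.

    Returns the same partitioning with each element paired with its
    global index: (element, index), numbered 0..n-1 in partition order.
    """
    sizes = [len(part) for part in partitions]
    # Start offset of each partition = total size of all earlier partitions.
    starts = [0] + list(accumulate(sizes))[:-1]
    return [
        [(elem, start + i) for i, elem in enumerate(part)]
        for part, start in zip(partitions, starts)
    ]


def id_from_filename(pair):
    """Hypothetical helper: take the last run of digits in the file's base
    name as its ID, e.g. 'hdfs:///data/file-3.txt' -> 3."""
    path, content = pair
    name = path.rsplit("/", 1)[-1]
    digits = re.findall(r"\d+", name)
    if not digits:
        raise ValueError(f"no numeric ID in file name {name!r}")
    return int(digits[-1]), content


if __name__ == "__main__":
    # Hypothetical (filename, content) pairs spread over two partitions.
    files = [
        [("hdfs:///data/file-0.txt", "a"), ("hdfs:///data/file-1.txt", "b")],
        [("hdfs:///data/file-2.txt", "c")],
    ]
    indexed = zip_with_index(files)
    # Flatten and reshape to (index, content), the form asked for below.
    print([(i, content) for part in indexed for ((_, content), i) in part])
    # [(0, 'a'), (1, 'b'), (2, 'c')]
    print([id_from_filename(p) for part in files for p in part])
    # [(0, 'a'), (1, 'b'), (2, 'c')]
```

In PySpark the rough equivalent of the first approach would be sc.wholeTextFiles(path).zipWithIndex().map(lambda kv: (kv[1], kv[0][1])). Note that zipWithIndex numbers elements in partition order, which need not match file-name order; sorting by file name first (for example with sortByKey()) is one way to make the numbering predictable.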


On Wed, Aug 20, 2014 at 8:35 AM, TJ Klein <[email protected]> wrote:
> Hi,
>
> I wonder if there is something like a (row) index for the elements in the
> RDD. Specifically, my RDD is generated from a series of files, where the
> value corresponds to the file contents. Ideally, I would like the keys
> to be an enumeration of the file number, e.g. (0,<file contents 0>),(1,<file
> contents 1>).
> Any idea?
> Thanks,
>  Tassilo
>
>
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/RDD-Row-Index-tp12457.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
