You could try using zipWithIndex (links to the API docs below). For example,
in Python:

# assumes `sc` is an existing SparkContext (e.g. the one the pyspark shell creates)
items = ['a', 'b', 'c']
items2 = sc.parallelize(items)

print(items2.first())

# pair each element with a derived value
items3 = items2.map(lambda x: (x, x + "!"))

print(items3.first())

# zipWithIndex appends a 0-based index to each record
items4 = items3.zipWithIndex()

print(items4.first())

# swap the pair so the index comes first
items5 = items4.map(lambda x: (x[1], x[0]))
print(items5.first())


This will give you an output of (0, ('a', 'a!')), where 0 is the index.
You could also use a map to shift the indices up by a constant (e.g. if you
wanted to count from 1), as sketched below.
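
A minimal sketch of that shift, assuming items5 from above (items6 is just a
hypothetical name for the shifted RDD):

# add 1 to each index so counting starts from 1 instead of 0
items6 = items5.map(lambda x: (x[0] + 1, x[1]))
print(items6.first())   # (1, ('a', 'a!'))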

Links
http://spark.apache.org/docs/latest/api/python/index.html
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD


