Hi @rok, thanks I got it
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/help-plz-how-to-use-zipWithIndex-to-each-subset-of-a-RDD-tp24071p24080.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
--
zipWithIndex gives you global indices across the whole RDD, which is not what you want. After grouping, you'll want to use flatMap with a function that iterates through each group's iterable and emits a (String, Int, String) tuple for each element.
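A minimal sketch of that per-group indexing on plain Scala collections (the object and method names here, PerGroupIndex and indexGroups, are hypothetical; in Spark the same function body would be the argument to flatMap after groupByKey):

```scala
// Hypothetical helper: zip each group's values with a local, per-key index.
// The input shape mirrors what groupByKey produces: (key, iterable of values).
object PerGroupIndex {
  def indexGroups(groups: Seq[(String, Iterable[String])]): Seq[(String, Int, String)] =
    groups.flatMap { case (key, values) =>
      // zipWithIndex is 0-based, so add 1 to get indices 1, 2, 3, ...
      values.zipWithIndex.map { case (v, i) => (key, i + 1, v) }
    }

  def main(args: Array[String]): Unit = {
    val grouped = Seq("key-1" -> Iterable("a", "b"), "key-2" -> Iterable("a", "c"))
    indexGroups(grouped).foreach(println)
  }
}
```

Each key's values restart their indices at 1, which is exactly what a global RDD.zipWithIndex cannot give you.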
On Thu, Jul 30, 2015 at 4:13 AM, askformore [via Apache Spark User List] wrote:
This may be what you want:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setMaster("local").setAppName("test")
val sc = new SparkContext(conf)
val inputRdd = sc.parallelize(Array(("key_1", "a"), ("key_1", "b"),
  ("key_2", "c"), ("key_2", "d")))
// zip each group's values with a local (per-key) index, starting at 1
val result = inputRdd.groupByKey().flatMap(e => {
  val key = e._1
  val values = e._2
  values.zipWithIndex.map { case (value, index) => (key, index + 1, value) }
})
Is there a relationship between data and index? I.e., with a,b,c mapped to 1,2,3?
On 30 Jul 2015 12:13, "askformore" wrote:
> I have some data like this: RDD[(String, String)] = ((*key-1*, a), (
> *key-1*,b), (*key-2*,a), (*key-2*,c),(*key-3*,b),(*key-4*,d)) and I want
> to group the data by key, and for each group, zipWithIndex its values so
> that each value gets an index within its own subset.