Re: help plz! how to use zipWithIndex to each subset of a RDD

2015-07-30 Thread askformore

Hi @rok, thanks, I got it.

Re: help plz! how to use zipWithIndex to each subset of a RDD

2015-07-30 Thread rok

zipWithIndex gives you global indices (numbered across the whole RDD), which is not what you want. You'll want to use flatMap with a function that iterates through each group's Iterable and returns a (String, Int, String) tuple for each element.

On Thu, Jul 30, 2015 at 4:13 AM, askformore [via Apache Spark User List] < ml-node+s10
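A minimal sketch of the approach described above, assuming the desired tuple is (key, index, value) with a 0-based index that restarts for every key. The names `PerKeyIndex`, `indexWithinGroup` and `indexByKey` are hypothetical. So that it runs without a cluster, it is shown on a local Seq, where `groupBy` stands in for `groupByKey()`; `indexWithinGroup` is the function you would hand to `flatMap`.

```scala
object PerKeyIndex {
  // Hypothetical helper: pair each value of one group with its position
  // inside that group (0-based, restarting for every key).
  def indexWithinGroup(key: String, values: Iterable[String]): Seq[(String, Int, String)] =
    values.toSeq.zipWithIndex.map { case (value, i) => (key, i, value) }

  // Local stand-in for the RDD: groupBy + flatMap mimic groupByKey().flatMap(...).
  def indexByKey(data: Seq[(String, String)]): Seq[(String, Int, String)] =
    data
      .groupBy(_._1)   // Map[String, Seq[(String, String)]]
      .toSeq
      .sortBy(_._1)    // sort keys only to make the printed output readable
      .flatMap { case (key, pairs) => indexWithinGroup(key, pairs.map(_._2)) }

  def main(args: Array[String]): Unit = {
    val data = Seq(("key_1", "a"), ("key_1", "b"), ("key_2", "c"), ("key_2", "d"))

    // Global indices, as zipWithIndex over the whole collection gives: 0, 1, 2, 3.
    println(data.zipWithIndex)

    // Per-key indices: key_1 gets 0, 1 and key_2 gets 0, 1.
    indexByKey(data).foreach(println)
  }
}
```

On the RDD itself the helper would be used as `inputRdd.groupByKey().flatMap { case (k, vs) => PerKeyIndex.indexWithinGroup(k, vs) }`. Keep in mind that groupByKey does not guarantee the order of values within a group, so which value receives index 0 can vary between runs.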

Re: help plz! how to use zipWithIndex to each subset of a RDD

2015-07-29 Thread Jeff Zhang

This may be what you want:

    val conf = new SparkConf().setMaster("local").setAppName("test")
    val sc = new SparkContext(conf)
    val inputRdd = sc.parallelize(Array(("key_1", "a"), ("key_1", "b"), ("key_2", "c"), ("key_2", "d")))
    val result = inputRdd.groupByKey().flatMap(e => {
      val key = e._1
      val valu

Re: help plz! how to use zipWithIndex to each subset of a RDD

2015-07-29 Thread ayan guha

Is there a relationship between the data and the index, i.e., should a, b, c map to 1, 2, 3?

On 30 Jul 2015 12:13, "askformore" wrote:
> I have some data like this: RDD[(String, String)] = ((*key-1*, a), (*key-1*, b), (*key-2*, a), (*key-2*, c), (*key-3*, b), (*key-4*, d)) and I want
> to group the data by Key, and for
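If the answer to that question is yes, i.e., the index should follow the values themselves, one option is to sort each group before indexing and add 1 for a 1-based index. That reading of the asker's intent is an assumption, not something the thread confirms. The sketch below uses the sample data from the question; `RankedPerKeyIndex`, `rankWithinGroup` and `rankByKey` are hypothetical names, and local `groupBy` again stands in for `groupByKey()`.

```scala
object RankedPerKeyIndex {
  // Assumption: the index follows the sort order of the values and starts at 1
  // (the "a, b, c to 1, 2, 3" reading). Sorting also makes the result
  // independent of the order in which values arrive in a group.
  def rankWithinGroup(key: String, values: Iterable[String]): Seq[(String, Int, String)] =
    values.toSeq.sorted.zipWithIndex.map { case (value, i) => (key, i + 1, value) }

  // Local stand-in for groupByKey().flatMap(...) on the RDD.
  def rankByKey(data: Seq[(String, String)]): Seq[(String, Int, String)] =
    data
      .groupBy(_._1)
      .toSeq
      .sortBy(_._1)    // sort keys only for readable output
      .flatMap { case (key, pairs) => rankWithinGroup(key, pairs.map(_._2)) }

  def main(args: Array[String]): Unit = {
    // Sample data from the original question.
    val data = Seq(("key-1", "a"), ("key-1", "b"), ("key-2", "a"),
                   ("key-2", "c"), ("key-3", "b"), ("key-4", "d"))
    rankByKey(data).foreach(println)
  }
}
```

With Spark, the equivalent call would be `rdd.groupByKey().flatMap { case (k, vs) => RankedPerKeyIndex.rankWithinGroup(k, vs) }`. Sorting inside the function materializes each group as a Seq, which is reasonable when individual groups are small.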