subject:"Maintaining order of pair rdd"

Re: Maintaining order of pair rdd

2016-07-26 Thread Kuchekar

Hi Janardhan, you could do something like this: to maintain the insertion order by key, first partition by key (so that all records for a key are located in the same partition), and after that you can do something like this: RDD.mapValues(x => ArrayBuffer(x)).reduceByKey((x, y) => x+…
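A Spark-free sketch of the same idea, using plain Scala collections as a local stand-in for the RDD. It assumes the truncated reduce function concatenates the two buffers (`x ++ y`); the sample pairs are hypothetical. Concatenation is order-sensitive, so the merged buffer follows the order in which values are combined. In Spark that matches insertion order only if each key's records reach the reducer in that order, which a shuffle does not guarantee.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sample pairs, in insertion order.
val pairs = Seq(("ID2", 18159), ("ID1", 18159), ("ID2", 36318), ("ID1", 36318))

// Local analogue of RDD.mapValues(x => ArrayBuffer(x)).reduceByKey((x, y) => x ++ y):
// wrap each value in a one-element buffer, then concatenate buffers per key.
// groupMapReduce combines values in the order it encounters them.
val reduced: Map[String, ArrayBuffer[Int]] =
  pairs.groupMapReduce(_._1)(p => ArrayBuffer(p._2))(_ ++ _)
```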

Re: Maintaining order of pair rdd

2016-07-26 Thread janardhan shetty

Let me provide step-wise details:
1. I have an RDD = { (ID2,18159) - *element 1* (ID1,18159) - *element 2* (ID3,18159) - *element 3* (ID2,36318) - *element 4* (ID1,36318) - *element 5* (ID3,36318) (ID2,54477) (ID1,54477) (ID3,54477) }
2. RDD.groupByKey().mapValues(v => v.toArray) returns Array( (I…
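A common way to keep the original positions through `groupByKey` is to tag each record with its position before the shuffle (in Spark, `RDD.zipWithIndex()` does this) and sort each group by that index afterwards. Below is a minimal local sketch using Scala collections and the sample from step 1. It illustrates the technique and is not code from the thread.

```scala
// Sample from step 1, in its original order (a local stand-in for the RDD).
val rdd = Seq(
  ("ID2", 18159), ("ID1", 18159), ("ID3", 18159),
  ("ID2", 36318), ("ID1", 36318), ("ID3", 36318),
  ("ID2", 54477), ("ID1", 54477), ("ID3", 54477)
)

// 1. Tag each record with its position (Spark: rdd.zipWithIndex()).
val indexed: Seq[((String, Int), Int)] = rdd.zipWithIndex

// 2. Group by key, keeping (index, value) pairs
//    (Spark: map to (key, (idx, value)), then groupByKey()).
val grouped: Map[String, Seq[(Int, Int)]] =
  indexed
    .groupBy { case ((k, _), _) => k }
    .map { case (k, recs) => k -> recs.map { case ((_, v), i) => (i, v) } }

// 3. Sort each group by the original index, then drop the index
//    (Spark: mapValues(_.toSeq.sortBy(_._1).map(_._2))).
val ordered: Map[String, Seq[Int]] =
  grouped.map { case (k, ivs) => k -> ivs.sortBy(_._1).map(_._2) }
```

Locally, `groupBy` already keeps encounter order within each group, so step 3 only matters in the distributed case, where the shuffle can reorder records.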

Re: Maintaining order of pair rdd

2016-07-26 Thread Marco Mistroni

Apologies, Janardhan, I always get confused on this. OK, so you have a (key, val) RDD (val is irrelevant here); then you can do this:
val reduced = myRDD.reduceByKey((first, second) => first ++ second)
val sorted = reduced.sortBy(tpl => tpl._1)
hth
On Tue, Jul 26, 2016 at 3:31 AM, janardhan she…
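Note that `sortBy(tpl => tpl._1)` orders the reduced pairs by the key's value (lexicographically for strings), which is not necessarily the order in which keys first appeared. The local sketch below uses plain Scala collections and hypothetical data shaped like the thread's sample to show the difference.

```scala
// Hypothetical (key, values) records; keys first appear as ID2, ID1, ID3.
val input = Seq(
  ("ID2", Seq(18159)), ("ID1", Seq(18159)), ("ID3", Seq(18159)),
  ("ID2", Seq(36318)), ("ID1", Seq(36318)), ("ID3", Seq(36318))
)

// Local analogue of myRDD.reduceByKey((first, second) => first ++ second).
val reduced: Seq[(String, Seq[Int])] =
  input.groupMapReduce(_._1)(_._2)(_ ++ _).toSeq

// Local analogue of reduced.sortBy(tpl => tpl._1): sorts by key value.
val sorted = reduced.sortBy(tpl => tpl._1)

val keyOrder = sorted.map(_._1)           // alphabetical: ID1, ID2, ID3
val firstSeen = input.map(_._1).distinct  // first appearance: ID2, ID1, ID3
```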

Re: Maintaining order of pair rdd

2016-07-25 Thread janardhan shetty

groupBy is a shuffle operation, and the index is already lost in the process, if I am not wrong. I also don't see a *sortWith* operation on RDD. Any suggestions or help? On Mon, Jul 25, 2016 at 12:58 AM, Marco Mistroni wrote: > Hi > after you do a groupBy you should use a sortWith. > Basically, a group…
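Right: RDD has no `sortWith`. It offers `sortBy`, and pair RDDs also offer `sortByKey`. The position has to be attached to each record before the shuffle so that it survives it. The sketch below uses plain Scala collections and hypothetical names. It records the smallest index seen for each key and uses it to restore the first-appearance key order (ID2, ID1, ID3) after grouping.

```scala
// Hypothetical records in their original order.
val data = Seq(
  ("ID2", 18159), ("ID1", 18159), ("ID3", 18159),
  ("ID2", 36318), ("ID1", 36318), ("ID3", 36318)
)

// Attach positions before grouping (Spark: data.zipWithIndex()).
val withIdx: Seq[((String, Int), Int)] = data.zipWithIndex

// For each key: its first position, plus its values in position order
// (Spark analogue: map to (key, (idx, value)), then groupByKey()).
val perKey: Seq[(String, Int, Seq[Int])] =
  withIdx.groupBy(_._1._1).toSeq.map { case (k, recs) =>
    val byIdx = recs.sortBy(_._2)
    (k, byIdx.head._2, byIdx.map(_._1._2))
  }

// Order the groups by first position to restore the original key order.
val restored: Seq[(String, Seq[Int])] =
  perKey.sortBy(_._2).map { case (k, _, vs) => (k, vs) }
```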

Re: Maintaining order of pair rdd

2016-07-25 Thread Marco Mistroni

Hi, after you do a groupBy you should use a sortWith. Basically, a groupBy reduces your structure to (anyone correct me if I'm wrong) an RDD[(key, val)], which you can see as a tuple, so you could use sortWith (or sortBy, I cannot remember which one) (tpl => tpl._1). hth On Mon, Jul 25, 2016 at 1:21…
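On Scala collections both methods exist: `sortWith` takes a "comes before" comparison, while `sortBy` takes a function that extracts the sort key. On an RDD only `sortBy` (and `sortByKey` for pair RDDs) is available. Also, `groupByKey` yields pairs of the form (key, Iterable[value]) rather than (key, val). A small local comparison, with hypothetical grouped data:

```scala
// Hypothetical grouped result: (key, values) pairs, as groupByKey would produce.
val groups: Seq[(String, Iterable[Int])] =
  Seq(("ID3", Seq(18159)), ("ID1", Seq(18159)), ("ID2", Seq(18159)))

// sortWith: supply a Boolean "a comes before b" comparison.
val viaSortWith = groups.sortWith((a, b) => a._1.compareTo(b._1) < 0)

// sortBy: supply the key to sort on (the form RDD.sortBy also accepts).
val viaSortBy = groups.sortBy(tpl => tpl._1)
```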

Re: Maintaining order of pair rdd

2016-07-24 Thread janardhan shetty

Thanks, Marco. This solved the order problem. I had another question about the step that comes before this. As you can see below, ID2, ID1 and ID3 are in order, and I need to maintain this index order as well. But when we do the groupByKey operation (*rdd.distinct.groupByKey().mapValues(v => v.toArray)*), everything is *ju…
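`rdd.distinct` is itself implemented with a shuffle, so it can reorder records before `groupByKey` runs. One way to keep first-occurrence order is to remember each distinct record's smallest index and sort by it. The local sketch below uses plain Scala and hypothetical data containing one duplicate record. In Spark, the analogous steps would be `zipWithIndex`, `reduceByKey` with a minimum, and `sortBy`.

```scala
// Hypothetical input with a duplicated record (ID2,18159).
val recs = Seq(("ID2", 18159), ("ID1", 18159), ("ID2", 18159), ("ID3", 18159))

// Keep the smallest index for each distinct record
// (Spark analogue: zipWithIndex, then reduceByKey((a, b) => math.min(a, b))).
val firstIdx: Map[(String, Int), Int] =
  recs.zipWithIndex.groupMapReduce(_._1)(_._2)((a, b) => math.min(a, b))

// Sort the distinct records by their first occurrence.
val distinctInOrder: Seq[(String, Int)] =
  firstIdx.toSeq.sortBy(_._2).map(_._1)
```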

Re: Maintaining order of pair rdd

2016-07-24 Thread Marco Mistroni

Hello. Uhm, you have an array containing 3 tuples? If all the arrays have the same length, you can just zip all of them, creating a list of tuples; then you can scan the list 5 by 5...? So something like (Array(0)._2, Array(1)._2, Array(2)._2).zipped.toList. This will give you a list of tuples of 3 eleme…
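This suggestion zips the value arrays of the three tuples element by element. Since Scala 2.13, `.zipped` on a tuple of collections is deprecated in favour of `lazyZip`. The sketch uses a hypothetical array named `arr` with equal-length value arrays. It then scans the resulting triples in chunks of 5 with `grouped`, which is one reading of "scan the list 5 by 5".

```scala
// Hypothetical array of 3 (key, values) tuples with equal-length value arrays.
val arr: Array[(String, Array[Int])] = Array(
  ("ID1", Array(1, 2, 3, 4, 5, 6)),
  ("ID2", Array(10, 20, 30, 40, 50, 60)),
  ("ID3", Array(100, 200, 300, 400, 500, 600))
)

// Element-wise zip of the three value arrays into triples
// (Scala 2.13+ replacement for (a, b, c).zipped.toList).
val triples: List[(Int, Int, Int)] =
  arr(0)._2.lazyZip(arr(1)._2).lazyZip(arr(2)._2)
    .map((a, b, c) => (a, b, c))
    .toList

// Walk the triples 5 at a time; the last chunk may be shorter.
val chunks: List[List[(Int, Int, Int)]] = triples.grouped(5).toList
```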

Re: Maintaining order of pair rdd

2016-07-24 Thread janardhan shetty

Array( (ID1,Array(18159, 308703, 72636, 64544, 39244, 107937, 54477, 145272, 100079, 36318, 160992, 817, 89366, 150022, 19622, 44683, 58866, 162076, 45431, 100136)), (ID3,Array(100079, 19622, 18159, 212064, 107937, 44683, 150022, 39244, 100136, 58866, 72636, 145272, 817, 89366, 54477, 36318, 308703

Re: Maintaining order of pair rdd

2016-07-24 Thread Marco Mistroni

Apologies, I misinterpreted. Could you post two use cases? Kr On 24 Jul 2016 3:41 pm, "janardhan shetty" wrote: > Marco, > > Thanks for the response. It is indexed order and not ascending or > descending order. > On Jul 24, 2016 7:37 AM, "Marco Mistroni" wrote: > >> Use map values to transfor…

Re: Maintaining order of pair rdd

2016-07-24 Thread janardhan shetty

Marco, thanks for the response. It is indexed order, not ascending or descending order. On Jul 24, 2016 7:37 AM, "Marco Mistroni" wrote: > Use map values to transform to an rdd where values are sorted? > Hth > > On 24 Jul 2016 6:23 am, "janardhan shetty" wrote: > >> I have a key,value pair r…
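To make the distinction concrete: sorting each value array (the `mapValues` suggestion) puts the numbers in ascending order, whereas indexed order means leaving them in their original positions. The sketch below works on the example from the original question, using a plain Scala `Map` as a local stand-in. In Spark, `mapValues` on a pair RDD applies the function to each value in the same way.

```scala
// Example from the original question: key -> array of Ints in indexed order.
val rdd: Map[String, Array[Int]] =
  Map("id1" -> Array(5, 2, 3, 15), "id2" -> Array(9, 4, 2, 5))

// mapValues(_.sorted): ascending order; the original positions are lost.
val ascending: Map[String, List[Int]] =
  rdd.map { case (k, v) => k -> v.sorted.toList }

// Indexed order: keep each array as-is (only copied to a List here).
val indexedOrder: Map[String, List[Int]] =
  rdd.map { case (k, v) => k -> v.toList }
```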

Maintaining order of pair rdd

2016-07-23 Thread janardhan shetty

I have a key-value pair RDD where each value is an array of Ints. I need to maintain the order of the values in order to execute downstream modifications. How do we maintain the order of values? Ex: rdd = ((id1, [5,2,3,15]), (id2, [9,4,2,5])). Follow-up question: how do we compare between one element in rdd…
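When each value is already a single array, as in `(id1, [5,2,3,15])`, a transformation like `mapValues` receives the whole array, so its internal order is preserved. Ordering only becomes an issue when values are assembled from many records through a shuffle (for example with `groupByKey`). As an illustration, the sketch below applies an order-dependent step, the difference between consecutive elements, to each array. That step is a hypothetical example of a "downstream modification", and a `Seq` stands in for the RDD.

```scala
// Example from the question: each key maps to one array, in indexed order.
val rdd: Seq[(String, Array[Int])] =
  Seq(("id1", Array(5, 2, 3, 15)), ("id2", Array(9, 4, 2, 5)))

// Hypothetical order-dependent step: difference between consecutive elements.
// Pairing xs with xs.drop(1) walks the array in its stored order.
def consecutiveDiffs(xs: Array[Int]): List[Int] =
  xs.zip(xs.drop(1)).map { case (a, b) => b - a }.toList

// Local analogue of rdd.mapValues(consecutiveDiffs).
val diffs: Seq[(String, List[Int])] =
  rdd.map { case (k, v) => (k, consecutiveDiffs(v)) }
```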
