Hi Janardhan,
You could do something like this:
To maintain the insertion order per key, first partition by key (so
that all values for a key are located in the same partition); after that
you can do something like this.
RDD.mapValues( x => ArrayBuffer(x)).reduceByKey(x,y => x+
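A minimal sketch of the buffer-and-concatenate idea above, for illustration only. The object name `BufferConcat`, the helper `collectValues` and the `numPartitions` parameter are made up for this example and are not from the thread. As far as I know Spark does not document any guarantee about the order in which records arrive after a shuffle, so treat the resulting value order as best-effort unless you carry an explicit index with each value.

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.rdd.RDD
import scala.collection.mutable.ArrayBuffer
import scala.reflect.ClassTag

// Hypothetical helper: names and parameters are assumptions for this sketch.
object BufferConcat {
  def collectValues[K: ClassTag, V: ClassTag](
      rdd: RDD[(K, V)],
      numPartitions: Int): RDD[(K, ArrayBuffer[V])] =
    rdd
      // Co-locate all records of a key in one partition.
      .partitionBy(new HashPartitioner(numPartitions))
      // Wrap every value in a one-element buffer.
      .mapValues(v => ArrayBuffer(v))
      // Concatenate the buffers of each key.
      .reduceByKey((x, y) => x ++ y)
}
```

Usage would be along the lines of `BufferConcat.collectValues(myRDD, 4).collect()`, where `myRDD` is any `RDD[(K, V)]`.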
Let me provide step wise details:
1. I have an RDD = {
(ID2,18159) - *element 1*
(ID1,18159) - *element 2*
(ID3,18159) - *element 3*
(ID2,36318) - *element 4*
(ID1,36318) - *element 5*
(ID3,36318)
(ID2,54477)
(ID1,54477)
(ID3,54477)
}
2. RDD.groupByKey().mapValues(v => v.toArray)
Array(
(I
Apologies Janardhan, I always get confused on this.
OK, so you have a (key, val) RDD (val is irrelevant here);
then you can do this:
val reduced = myRDD.reduceByKey((first, second) => first ++ second)
val sorted = reduced.sortBy(tpl => tpl._1)
hth
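If the goal is the original (indexed) order rather than ascending or descending order, one option is to attach an index before grouping and sort on it afterwards. The sketch below is only an illustration under that assumption: the object name `OrderPreservingGroup` and the helper `groupInInputOrder` are hypothetical, while `zipWithIndex`, `groupByKey`, `sortByKey` and `values` are standard RDD operations.

```scala
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Hypothetical helper, not from the thread.
object OrderPreservingGroup {
  // Groups values per key, keeping values in input order and
  // emitting keys in order of their first appearance.
  def groupInInputOrder[K: ClassTag, V](rdd: RDD[(K, V)]): RDD[(K, Seq[V])] =
    rdd
      .zipWithIndex()                               // ((key, value), index)
      .map { case ((k, v), idx) => (k, (idx, v)) }  // carry the index with each value
      .groupByKey()                                 // order inside a group is not guaranteed
      .map { case (k, indexed) =>
        val ordered = indexed.toSeq.sortBy(_._1)    // restore input order from the index
        (ordered.head._1, (k, ordered.map(_._2)))   // re-key by the first index of the group
      }
      .sortByKey()                                  // keys in order of first appearance
      .values
}
```

The extra sort costs a second shuffle; if only the order of values inside each key matters, the last three steps can be replaced by a plain `mapValues` that sorts on the index.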
On Tue, Jul 26, 2016 at 3:31 AM, janardhan she
groupBy is a shuffle operation, so if I am not wrong the index is already
lost in this process, and I don't see a *sortWith* operation on RDD.
Any suggestions or help?
On Mon, Jul 25, 2016 at 12:58 AM, Marco Mistroni
wrote:
> Hi
> after you do a groupBy you should use a sortWith.
> Basically , a group
Hi,
After you do a groupBy you should use a sortWith.
Basically, a groupBy reduces your structure to (anyone correct me if I'm
wrong) an RDD[(key, val)], which you can see as a tuple, so you could use
sortWith (or sortBy, cannot remember which one) (tpl => tpl._1)
hth
On Mon, Jul 25, 2016 at 1:21
Thanks Marco. This solved the order problem. I have another question, which
comes before this one.
As you can see below, ID2, ID1 and ID3 are in order and I need to maintain
this index order as well. But when we do the groupByKey operation
(*rdd.distinct.groupByKey().mapValues(v => v.toArray)*)
everything is *ju
Hello
Uhm, you have an array containing 3 tuples?
If all the arrays have the same length, you can just zip all of them, creating
a list of tuples;
then you can scan the list 5 by 5...?
so something like
(Array(0)._2,Array(1)._2,Array(2)._2).zipped.toList
this will give you a list of tuples of 3 eleme
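A small plain-Scala sketch of the zipping idea above, assuming the collected result is an array of exactly three (key, values) pairs whose value arrays have the same length. The names `ZipThree` and `zipColumns` are made up for this example. It uses chained `zip` calls, which produce the same triples as `.zipped.toList`, and `grouped(5)` is one way to scan the list 5 by 5.

```scala
// Hypothetical helper, not from the thread.
object ZipThree {
  // Zips the value arrays of three (key, values) rows into a list of triples.
  // zip stops at the shortest array, so equal lengths are assumed.
  def zipColumns(rows: Array[(String, Array[Int])]): List[(Int, Int, Int)] = {
    require(rows.length == 3, "expected exactly three (key, values) rows")
    rows(0)._2
      .zip(rows(1)._2)
      .zip(rows(2)._2)
      .map { case ((a, b), c) => (a, b, c) }
      .toList
  }

  // Splits the triples into consecutive chunks of the given size.
  def scanInChunks(triples: List[(Int, Int, Int)], size: Int): List[List[(Int, Int, Int)]] =
    triples.grouped(size).toList
}
```

For example, `ZipThree.scanInChunks(ZipThree.zipColumns(rows), 5)` would yield chunks of five triples, with a shorter final chunk if the length is not a multiple of five.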
Array(
(ID1,Array(18159, 308703, 72636, 64544, 39244, 107937, 54477, 145272,
100079, 36318, 160992, 817, 89366, 150022, 19622, 44683, 58866, 162076,
45431, 100136)),
(ID3,Array(100079, 19622, 18159, 212064, 107937, 44683, 150022, 39244,
100136, 58866, 72636, 145272, 817, 89366, 54477, 36318, 308703
Apologies, I misinterpreted. Could you post two use cases?
Kr
On 24 Jul 2016 3:41 pm, "janardhan shetty" wrote:
> Marco,
>
> Thanks for the response. It is indexed order and not ascending or
> descending order.
> On Jul 24, 2016 7:37 AM, "Marco Mistroni" wrote:
>
>> Use map values to transfor
Marco,
Thanks for the response. It is indexed order and not ascending or
descending order.
On Jul 24, 2016 7:37 AM, "Marco Mistroni" wrote:
> Use mapValues to transform to an RDD where values are sorted?
> Hth
>
> On 24 Jul 2016 6:23 am, "janardhan shetty" wrote:
>
>> I have a key,value pair r
I have a key-value pair RDD where each value is an array of Ints. I need to
maintain the order of the values in order to execute downstream
modifications. How do we maintain the order of values?
Ex:
rdd = (id1,[5,2,3,15]),
(Id2,[9,4,2,5])
Follow-up question: how do we compare between one element in rdd