How about PartitionerAwareUnionRDD?
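
To make that concrete, here is a minimal sketch rather than a definitive recipe. It assumes a local master, made-up sample data for list1 and list2 (the original lists are elided), and a Spark version where sc.union picks PartitionerAwareUnionRDD when all inputs share a partitioner. The class itself is private[spark], so user code cannot construct it directly. On older releases you may also need `import org.apache.spark.SparkContext._` for groupByKey. The object name and the keyToPartition helper are hypothetical, added only for illustration.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object CoPartitionSketch {
  // Hypothetical helper: reports which partition index each key ended up in.
  def keyToPartition[V](rdd: RDD[(Char, V)]): Map[Char, Int] =
    rdd.mapPartitionsWithIndex((idx, it) => it.map { case (k, _) => (k, idx) })
      .collect()
      .toMap

  def main(args: Array[String]): Unit = {
    // Assumption: local mode with 3 threads stands in for the 3-machine cluster.
    val conf = new SparkConf().setAppName("co-partition-sketch").setMaster("local[3]")
    val sc = new SparkContext(conf)

    // Assumption: sample data, since the real lists are not shown in the thread.
    val list1 = List(('a', 1), ('b', 2), ('c', 3))
    val list2 = List(('a', 2), ('b', 7), ('c', 5))

    // Both RDDs use an equal partitioner, so a given key hashes to the same
    // partition index in each of them.
    val partitioner = new HashPartitioner(3)
    val rdd1 = sc.parallelize(list1).groupByKey(partitioner)
    val rdd2 = sc.parallelize(list2).groupByKey(partitioner)

    assert(rdd1.partitioner == rdd2.partitioner)
    assert(keyToPartition(rdd1) == keyToPartition(rdd2))

    // With a shared partitioner, sc.union can keep the partitioning
    // (via PartitionerAwareUnionRDD, on versions that do this).
    val unioned = sc.union(Seq(rdd1, rdd2))
    println(s"union partitioner: ${unioned.partitioner}")
    println(s"union partitions:  ${unioned.partitions.length}")

    sc.stop()
  }
}
```

Note that the partitioner only guarantees matching partition indices. Which machine hosts a given partition is up to the scheduler, so the union (or a cogroup/join on the two RDDs) is what actually brings the matching partitions together.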

Regards
Mayur

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>



On Thu, Mar 6, 2014 at 9:42 AM, Evan Chan <e...@ooyala.com> wrote:

> I would love to hear the answer to this as well.
>
> On Thu, Mar 6, 2014 at 4:09 AM, Manoj Awasthi <awasthi.ma...@gmail.com>
> wrote:
> > Hi All,
> >
> >
> > I have a three machine cluster. I have two RDDs each consisting of (K,V)
> > pairs. RDDs have just three keys 'a', 'b' and 'c'.
> >
> >     // list1 - List(('a',1), ('b',2), ....
> >     val rdd1 = sc.parallelize(list1).groupByKey(new HashPartitioner(3))
> >
> >     // list2 - List(('a',2), ('b',7), ....
> >     val rdd2 = sc.parallelize(list2).groupByKey(new HashPartitioner(3))
> >
> > By using a HashPartitioner with 3 partitions I can ensure that each of the
> > keys ('a', 'b' and 'c') in each RDD gets partitioned onto a different
> > machine in the cluster (based on the hashCode).
> >
> > The problem is that I cannot deterministically get the same allocation for
> > the second RDD (all 'a's from rdd2 going to the same machine that the 'a's
> > from the first RDD went to).
> >
> > Is there a way to achieve this?
> >
> > Manoj
>
>
>
> --
> Evan Chan
> Staff Engineer
> e...@ooyala.com  |
>
