Thanks everyone. As Nathan suggested, I ended up collecting the distinct
keys first and then assigning Ids to each key explicitly.
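In case it is useful to anyone else, the rough shape of it is below (key
names and values are illustrative, not my real data):

from pyspark import SparkContext

sc = SparkContext.getOrCreate()
rdd = sc.parallelize([('a', 5), ('d', 8), ('b', 6), ('c', 3)])

# Collect the distinct keys up front and assign each one an explicit id.
key_to_id = {k: i for i, k in enumerate(sorted(rdd.keys().distinct().collect()))}

# One partition per key; the partition function just looks up the
# pre-assigned id, so two different keys can never share a partition.
partitioned = rdd.partitionBy(len(key_to_id), lambda k: key_to_id[k])

print(partitioned.glom().collect())
# [[('a', 5)], [('b', 6)], [('c', 3)], [('d', 8)]]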
Regards
Sumit Chawla
On Fri, Jun 22, 2018 at 7:29 AM, Nathan Kronenfeld <nkronenfeld@uncharted.software> wrote:
> On Thu, Jun 21, 2018 at 4:51 PM, Chawla,Sumit wrote:
>
>> Hi
>>
>> I have been trying to do this simple operation: I want to land all
>> values with one key in the same partition, and not have any different
>> key in the same partition. Is this possible? b and c always end up
>> mixed together in the same partition.
It is not possible in general, because the cardinality of the partitioning
key is not known ahead of time, while the partition count is fixed. When
cardinality > partition count, the requirement simply cannot be met: with,
say, 26 distinct keys and 4 partitions, at least two keys must share a
partition.
Thanks,
Jungtaek Lim (HeartSaVioR)
On Fri, Jun 22, 2018 at 8:55 AM, Chawla,Sumit wrote:
Based on a read of the code, it looks like Spark's default partitioner
takes the hash of the key modulo the number of partitions. The keys b and c
end up mapping to the same partition. What's the best partitioning scheme
to deal with this?
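
The mapping happening under the hood is roughly the following (the
partition count of 2 is just for illustration; PySpark's actual default
uses pyspark.rdd.portable_hash rather than the builtin hash):

# Default behaviour, roughly: partition = hash(key) % numPartitions.
num_partitions = 2
for key in ['a', 'b', 'c', 'd']:
    print(key, '->', hash(key) % num_partitions)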
Regards
Sumit Chawla
On Thu, Jun 21, 2018 at 4:51 PM, Chawla,Sumit wrote:
Hi

I have been trying to do this simple operation: I want to land all
values with one key in the same partition, and not have any different
key in the same partition. Is this possible? b and c always end up
mixed together in the same partition.
rdd = sc.parallelize([('a', 5), ('d', 8), ('b', 6), ('c', 3)])  # b and c values here are placeholders
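
Roughly, the check that shows the problem looks like this (the partition
count of 4 is illustrative; any small number shows the same effect):

# Partition by key with the default hash partitioner, then use glom()
# to turn each partition into a list so we can see which keys landed
# together.
print(rdd.partitionBy(4).glom().collect())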