Could you explain why this would work?
Assaf.
From: Haviv, Daniel [mailto:dha...@amazon.com]
Sent: Sunday, January 29, 2017 7:09 PM
To: Mendelson, Assaf
Cc: user@spark.apache.org
Subject: Re: forcing dataframe groupby partitioning
If there's no built-in local groupBy, you could do something like this:
df.groupBy(C1, C2).agg(...).flatMap(x => x.groupBy(C1)).agg(...)
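The snippet above is pseudocode, but the idea behind it (a second, "local" aggregation that needs no second shuffle) can be sketched with plain Python dictionaries. The sample rows and column names below are illustrative only, not from the thread; the key point is that when the aggregate is associative (sum here), re-aggregating the (C1, C2) partials by C1 gives the same answer as grouping the raw rows by C1, so the second pass could safely run locally:

```python
from collections import defaultdict

# Hypothetical sample rows: (c1, c2, value). Names are illustrative only.
rows = [("a", 1, 10), ("a", 1, 5), ("a", 2, 7), ("b", 1, 3)]

# Stage 1: aggregate by the full key (C1, C2) -- this models the shuffle
# Spark performs for df.groupBy(C1, C2).agg(sum(...)).
stage1 = defaultdict(int)
for c1, c2, v in rows:
    stage1[(c1, c2)] += v

# Stage 2: re-aggregate by C1 alone. Because sum is associative, this
# equals aggregating the raw rows by C1 directly, which is why a "local"
# second groupBy (no second shuffle) would be safe for such aggregates.
stage2 = defaultdict(int)
for (c1, _c2), v in stage1.items():
    stage2[c1] += v

print(dict(stage2))  # {'a': 22, 'b': 3}
```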
Thank you.
Daniel
On 29 Jan 2017, at 18:33, Mendelson, Assaf
<assaf.mendel...@rsa.com> wrote:
Hi,
Consider the following example:
df.groupBy(C1, C2).agg(some agg).groupBy(C1).agg(some more agg)
By default, Spark would shuffle according to the combination of C1 and C2, and then shuffle again by C1 only.
This behavior makes sense when one uses C2 to salt C1 to mitigate skew.
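The salting pattern mentioned here can be sketched outside Spark with plain Python. The data, salt count, and variable names below are hypothetical; the sketch shows why the two-level grouping works: a hot key is first spread across (C1, salt) sub-keys so no single reducer receives all of it, and the partials are then combined by C1 alone:

```python
import random
from collections import defaultdict

random.seed(0)
# Hypothetical skewed data: the key "hot" dominates.
rows = [("hot", 1)] * 6 + [("cold", 1)] * 2

NUM_SALTS = 3  # illustrative salt fan-out

# Stage 1: group by (C1, salt) so the hot key spreads across up to
# NUM_SALTS reducers instead of landing on a single one.
salted = defaultdict(int)
for c1, v in rows:
    salted[(c1, random.randrange(NUM_SALTS))] += v

# Stage 2: drop the salt and combine the partial sums by C1 alone.
final = defaultdict(int)
for (c1, _salt), v in salted.items():
    final[c1] += v

print(dict(final))  # totals per key: hot -> 6, cold -> 2
```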