Re: Un-exploding / denormalizing Spark SQL help

2017-02-08 Thread Everett Anderson
On Wed, Feb 8, 2017 at 1:14 PM, ayan guha wrote:
> Would a SQL solution be acceptable?

I'm very curious to see how it'd be done in raw SQL if you're up for it! I think the two programmatic solutions so far are both viable, though. (By the way, thanks everyone for the great suggestions!)
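[For the raw-SQL curiosity, here is a hedged sketch of how the same conditional aggregation might be written against a temp view. This is not ayan's actual solution; it assumes the table is registered as `t`, that priorities run 1 to 3, and it reuses the `df` and `spark` sketched under the original question at the bottom of this page.]

// Register the example DataFrame as a temp view, then denormalize in SQL.
// Spark SQL's first(expr, true) ignores nulls, mirroring the DataFrame
// version suggested further down the thread.
df.createOrReplaceTempView("t")

val sqlDenormalized = spark.sql("""
  SELECT id, name, extra,
         first(CASE WHEN priority = 1 THEN data END, true) AS data1,
         first(CASE WHEN priority = 2 THEN data END, true) AS data2,
         first(CASE WHEN priority = 3 THEN data END, true) AS data3
  FROM t
  GROUP BY id, name, extra
""")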

Re: Un-exploding / denormalizing Spark SQL help

2017-02-08 Thread ayan guha
Would a SQL solution be acceptable?

On Thu, 9 Feb 2017 at 4:01 am, Xiaomeng Wan wrote:
> You could also try pivot.
>
> On 7 February 2017 at 16:13, Everett Anderson wrote:
>> On Tue, Feb 7, 2017 at 2:21 PM, Michael Armbrust wrote:
>>> I think the fastest way is likely to use a comb…

Re: Un-exploding / denormalizing Spark SQL help

2017-02-08 Thread Xiaomeng Wan
You could also try pivot.

On 7 February 2017 at 16:13, Everett Anderson wrote:
> On Tue, Feb 7, 2017 at 2:21 PM, Michael Armbrust wrote:
>> I think the fastest way is likely to use a combination of conditionals
>> (when / otherwise), first (ignoring nulls), while grouping by the id.
>> …
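[A minimal sketch of the pivot idea, not code from the thread; listing the pivot values up front assumes priorities run 1 to 3, and the renamed output columns data1..data3 are mine.]

import org.apache.spark.sql.functions.first

// Pivot priority into columns; passing the value list explicitly saves
// Spark an extra pass over the data to discover the distinct priorities.
val pivoted = df
  .groupBy("id", "name", "extra")
  .pivot("priority", Seq(1, 2, 3))
  .agg(first("data"))
  .withColumnRenamed("1", "data1")
  .withColumnRenamed("2", "data2")
  .withColumnRenamed("3", "data3")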

Re: Un-exploding / denormalizing Spark SQL help

2017-02-07 Thread Everett Anderson
On Tue, Feb 7, 2017 at 2:21 PM, Michael Armbrust wrote:
> I think the fastest way is likely to use a combination of conditionals
> (when / otherwise), first (ignoring nulls), while grouping by the id.
> This should get the answer with only a single shuffle.
>
> Here is an example …

Re: Un-exploding / denormalizing Spark SQL help

2017-02-07 Thread Michael Armbrust
I think the fastest way is likely to use a combination of conditionals (when / otherwise), first (ignoring nulls), while grouping by the id. This should get the answer with only a single shuffle. Here is an example
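[Michael's attached example is cut off in the archived preview. A minimal sketch of what a conditional-aggregation query along these lines might look like; the output column names data1..data3, the `df` variable, and the assumption that priorities run 1 to 3 are mine.]

import org.apache.spark.sql.functions.{col, first, when}

// One row per (id, name, extra): when() yields data only for the matching
// priority (null otherwise), and first(..., ignoreNulls = true) keeps the
// single non-null value per group. Only the groupBy shuffles the data.
val denormalized = df
  .groupBy("id", "name", "extra")
  .agg(
    first(when(col("priority") === 1, col("data")), ignoreNulls = true).as("data1"),
    first(when(col("priority") === 2, col("data")), ignoreNulls = true).as("data2"),
    first(when(col("priority") === 3, col("data")), ignoreNulls = true).as("data3")
  )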

Re: Un-exploding / denormalizing Spark SQL help

2017-02-07 Thread Jacek Laskowski
Hi Everett,

That's pretty much what I'd do. Can't think of a way to beat your solution. Why do you "feel vaguely uneasy about it"?

I'd also check out the execution plan (with explain) to see how it's gonna work at runtime. I may have seen groupBy + join be better than window (there were more exchanges …
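[A quick illustration of the explain suggestion, reusing the `denormalized` sketch under Michael's message above; this line is mine, not from the thread.]

// explain(true) prints the logical and physical plans; each Exchange
// operator in the physical plan is a shuffle, which is what Jacek suggests
// counting when comparing the groupBy + join and window variants.
denormalized.explain(true)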

Re: Un-exploding / denormalizing Spark SQL help

2017-02-07 Thread Everett Anderson
On Tue, Feb 7, 2017 at 12:50 PM, Jacek Laskowski wrote:
> Hi,
>
> Could groupBy and withColumn or UDAF work perhaps? I think window could
> help here too.

This seems to work, but I do feel vaguely uneasy about it. :)

// First add a 'rank' column which is priority order just in case priorities …
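[Everett's snippet is truncated right after that first comment. A sketch of a rank-then-join approach consistent with it, under the assumption that each rank is filtered into its own DataFrame and joined back on id; the variable and column names here are guesses, not the original code.]

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, dense_rank}

// First add a 'rank' column which is priority order, just in case
// priorities have gaps or don't start at 1.
val byPriority = Window.partitionBy("id").orderBy("priority")
val ranked = df.withColumn("rank", dense_rank().over(byPriority))

// Pull each rank out into its own narrow DataFrame...
val d1 = ranked.filter(col("rank") === 1)
  .select(col("id"), col("name"), col("extra"), col("data").as("data1"))
val d2 = ranked.filter(col("rank") === 2).select(col("id"), col("data").as("data2"))
val d3 = ranked.filter(col("rank") === 3).select(col("id"), col("data").as("data3"))

// ...then join them back together, one row per id.
val denormalizedViaJoin = d1
  .join(d2, Seq("id"), "left_outer")
  .join(d3, Seq("id"), "left_outer")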

Re: Un-exploding / denormalizing Spark SQL help

2017-02-07 Thread Jacek Laskowski
Hi,

Could groupBy and withColumn or UDAF work perhaps? I think window could help here too.

Jacek

On 7 Feb 2017 8:02 p.m., "Everett Anderson" wrote:
> Hi,
>
> I'm trying to un-explode or denormalize a table like
>
> +---+----+-----+------+--------+
> |id |name|extra|data  |priority|
> +---+…

Un-exploding / denormalizing Spark SQL help

2017-02-07 Thread Everett Anderson
Hi,

I'm trying to un-explode or denormalize a table like

+---+----+-----+------+--------+
|id |name|extra|data  |priority|
+---+----+-----+------+--------+
|1  |Fred|8    |value1|1       |
|1  |Fred|8    |value8|2       |
|1  |Fred|8    |value5|3       |
|2  |Amy |9    |value3|1       |
|2  |Amy …
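[The archived message is cut off after the first row for id 2. For readers who want to try the suggestions in the replies above, here is a minimal Scala sketch that rebuilds just the rows visible in the listing; the `spark` and `df` names are mine, not from the thread.]

import org.apache.spark.sql.SparkSession

// Rebuild only the rows visible in the truncated listing above; the rest
// of the table (including id 2's remaining rows) is lost in the archive.
val spark = SparkSession.builder()
  .appName("unexplode")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val df = Seq(
  (1, "Fred", 8, "value1", 1),
  (1, "Fred", 8, "value8", 2),
  (1, "Fred", 8, "value5", 3),
  (2, "Amy",  9, "value3", 1)
).toDF("id", "name", "extra", "data", "priority")

df.show()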