Re: Severely hit by "curse of last reducer"

rohan monga Thu, 17 Nov 2011 13:44:44 -0800

Hi Mark,
I have tried setting hive.optimize.skewjoin=true, but I get a
NullPointerException after the first stage of the query completes.
Why does that happen?


Thanks,
--
Rohan Monga



On Thu, Nov 17, 2011 at 1:37 PM, Mark Grover <mgro...@oanda.com> wrote:
> Ayon,
> I see. From what you explained, skew join seems like what you want. Have you 
> tried that already?
>
> Details on how skew join works are in this presentation. Jump to the 15-minute 
> mark if you just want to hear about skew joins.
> http://www.youtube.com/watch?v=OB4H3Yt5VWM
>
> I bet you could also find something in the mail list archives related to Skew 
> Join.
>
> In a nutshell (from the video),
> set hive.optimize.skewjoin=true
> set hive.skewjoin.key=<Threshold>
>
> should do the trick for you. Threshold, I believe, is the number of records 
> per key above which the key is treated as skewed and deferred till later.
>
> Good luck!
> Mark
>
> ----- Original Message -----
> From: "Ayon Sinha" <ayonsi...@yahoo.com>
> To: "Mark Grover" <mgro...@oanda.com>, user@hive.apache.org
> Sent: Wednesday, November 16, 2011 10:53:19 PM
> Subject: Re: Severely hit by "curse of last reducer"
>
>
>
> It is always only one reducer that is stuck. My table2 is small, but using a 
> map join makes my mappers run out of memory. My max reducers is 32 (also the 
> max reduce capacity). I tried setting the number of reducers to a higher 
> number (even 6000, which is approximately the number of date and name 
> combinations I have), only to end up with lots of reducers with no data.
> So I am quite sure it is some key in stage-1 that is doing this.
>
> -Ayon
> See My Photos on Flickr
> Also check out my Blog for answers to commonly asked questions.
>
>
>
>
> From: Mark Grover <mgro...@oanda.com>
> To: user@hive.apache.org; Ayon Sinha <ayonsi...@yahoo.com>
> Sent: Wednesday, November 16, 2011 6:54 PM
> Subject: Re: Severely hit by "curse of last reducer"
>
> Hi Ayon,
> Is it one particular reduce task that is slow or the entire reduce phase? How 
> many reduce tasks did you have, anyways?
>
> Looking into what the reducer key was might only make sense if a particular 
> reduce task was slow.
>
> If your table2 is small enough to fit in memory, you might want to try a map 
> join.
> More details at:
> http://www.facebook.com/note.php?note_id=470667928919
>
> Let me know what you find.
>
> Mark
>
> ----- Original Message -----
> From: "Ayon Sinha" < ayonsi...@yahoo.com >
> To: "Hive Mailinglist" < user@hive.apache.org >
> Sent: Wednesday, November 16, 2011 9:03:23 PM
> Subject: Severely hit by "curse of last reducer"
>
>
>
> Hi,
> Where do I find the log of what reducer key is causing the last reducer to go 
> on for hours? The reducer logs don't say much about the key it's processing. 
> Is there a way to enable a debug mode where it would log the key it's 
> processing?
>
>
> My query looks like:
>
>
> select partner_name, dates, sum(coins_granted) from table1 u join table2 p on 
> u.partner_id=p.id group by partner_name, dates
>
>
>
> The uncompressed size of table1 is about 30 GB.
>
> -Ayon
>
>
>
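
One way to approach the original question without reducer-side debug logging is to count rows per join key directly. The sketch below assumes the table and column names from the query quoted above (table1, partner_id). The alias row_count, the LIMIT of 20, and the threshold value 100000 are illustrative assumptions, not tuned values.

```sql
-- Count rows per join key in the large table. Keys whose counts are far
-- above the rest are the likely cause of the single slow reducer.
-- Rows with a NULL partner_id appear as their own group.
SELECT partner_id, COUNT(*) AS row_count
FROM table1
GROUP BY partner_id
ORDER BY row_count DESC
LIMIT 20;

-- Skew join settings mentioned in the thread. The threshold here is a
-- placeholder (rows per key); choose it based on the counts above.
SET hive.optimize.skewjoin=true;
SET hive.skewjoin.key=100000;
```

If one partner_id dominates the counts, that key is a reasonable candidate for the skew join threshold or for handling in a separate query.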
