Only one reducer is always stuck. My table2 is small, but using a map join
makes my mappers run out of memory. My max reducers is 32 (which is also the
max reduce capacity). I tried setting the number of reducers higher (even 6000,
which is approximately the number of date and name combinations I have), only
to end up with lots of reducers with no data. So I am quite sure it is some key
in stage-1 that is doing this.
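
One way to check that suspicion without reducer-side logging is to look at how the rows are distributed over the join key. This is only a sketch: it assumes partner_id in table1 is the stage-1 join key, as in the query quoted further down, and the limit of 20 is arbitrary.

```sql
-- Count rows per join key in the large table. A few very large counts
-- (or a large NULL bucket) would point to the key that lands on the
-- slow reducer.
SELECT partner_id, COUNT(*) AS row_count
FROM table1
GROUP BY partner_id
ORDER BY row_count DESC
LIMIT 20;
```

If one partner_id accounts for most of the rows, raising the reducer count would not help, because all rows with the same key still go to the same reducer.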
 
-Ayon
See My Photos on Flickr
Also check out my Blog for answers to commonly asked questions.



________________________________
From: Mark Grover <mgro...@oanda.com>
To: user@hive.apache.org; Ayon Sinha <ayonsi...@yahoo.com>
Sent: Wednesday, November 16, 2011 6:54 PM
Subject: Re: Severely hit by "curse of last reducer"

Hi Ayon,
Is it one particular reduce task that is slow or the entire reduce phase? How 
many reduce tasks did you have, anyways?

Looking into what the reducer key was might only make sense if a particular 
reduce task was slow.

If your table2 is small enough to fit in memory, you might want to try a map 
join.
More details at:
http://www.facebook.com/note.php?note_id=470667928919
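
For reference, a map join can be requested with a query hint. The sketch below applies it to the query from the original message and assumes table2 (alias p) is the table small enough to be loaded into memory on each mapper.

```sql
-- Ask Hive to load table2 (alias p) into memory and join on the map side,
-- so the join itself needs no reduce phase.
SELECT /*+ MAPJOIN(p) */ partner_name, dates, SUM(coins_granted)
FROM table1 u
JOIN table2 p ON u.partner_id = p.id
GROUP BY partner_name, dates;
```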

Let me know what you find.

Mark

----- Original Message -----
From: "Ayon Sinha" <ayonsi...@yahoo.com>
To: "Hive Mailinglist" <user@hive.apache.org>
Sent: Wednesday, November 16, 2011 9:03:23 PM
Subject: Severely hit by "curse of last reducer"



Hi, 
Where do I find the log of which reducer key is causing the last reducer to go
on for hours? The reducer logs don't say much about the key being processed. Is
there a way to enable a debug mode that would log the key it is processing?


My query looks like: 


select partner_name, dates, sum(coins_granted) from table1 u join table2 p on 
u.partner_id=p.id group by partner_name, dates 
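
If a single partner_id dominates table1, one possible rewrite is to aggregate table1 before the join, so the join sees far fewer rows per key. This is a sketch only. It assumes that dates and coins_granted are columns of table1 and that partner_name comes from table2, which the original query does not state. The alias partial_coins is made up for illustration.

```sql
-- Collapse table1 to one row per (partner_id, dates) first, then join the
-- much smaller result to table2 and finish the aggregation by name.
SELECT p.partner_name, u.dates, SUM(u.partial_coins) AS coins_granted
FROM (
  SELECT partner_id, dates, SUM(coins_granted) AS partial_coins
  FROM table1
  GROUP BY partner_id, dates
) u
JOIN table2 p ON u.partner_id = p.id
GROUP BY p.partner_name, u.dates;
```

The outer SUM is still needed because several ids in table2 could share the same partner_name.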



My uncompressed size of table1 is about 30GB. 

-Ayon 
