Hi Mark,

I have tried setting hive.optimize.skewjoin=true, but I get a NullPointerException after the first stage of the query completes. Why does that happen?
Thanks,
--
Rohan Monga

On Thu, Nov 17, 2011 at 1:37 PM, Mark Grover <mgro...@oanda.com> wrote:
> Ayon,
> I see. From what you explained, skew join seems like what you want. Have you
> tried that already?
>
> Details on how skew join works are in this presentation. Jump to the 15-minute
> mark if you just want to hear about skew joins.
> http://www.youtube.com/watch?v=OB4H3Yt5VWM
>
> I bet you could also find something in the mailing list archives related to
> skew join.
>
> In a nutshell (from the video),
> set hive.optimize.skewjoin=true
> set hive.skewjoin.key=<Threshold>
>
> should do the trick for you. Threshold, I believe, is the number of records
> with the same join key that you consider large enough to defer until later.
>
> Good luck!
> Mark
>
> ----- Original Message -----
> From: "Ayon Sinha" <ayonsi...@yahoo.com>
> To: "Mark Grover" <mgro...@oanda.com>, user@hive.apache.org
> Sent: Wednesday, November 16, 2011 10:53:19 PM
> Subject: Re: Severely hit by "curse of last reducer"
>
> Only one reducer is always stuck. My table2 is small, but using a map join
> makes my mappers run out of memory. My max reducers setting is 32 (also the
> max reduce capacity). I tried setting the number of reducers higher (even
> 6000, which is approximately the number of date and name combinations I
> have), only to end up with lots of reducers that had no data.
> So I am quite sure it is some key in stage-1 that is doing this.
>
> -Ayon
>
> From: Mark Grover <mgro...@oanda.com>
> To: user@hive.apache.org; Ayon Sinha <ayonsi...@yahoo.com>
> Sent: Wednesday, November 16, 2011 6:54 PM
> Subject: Re: Severely hit by "curse of last reducer"
>
> Hi Ayon,
> Is it one particular reduce task that is slow, or the entire reduce phase?
> How many reduce tasks did you have, anyway?
>
> Looking into what the reducer key was might only make sense if a particular
> reduce task was slow.
>
> If your table2 is small enough to fit in memory, you might want to try a map
> join.
> More details at:
> http://www.facebook.com/note.php?note_id=470667928919
>
> Let me know what you find.
>
> Mark
>
> ----- Original Message -----
> From: "Ayon Sinha" <ayonsi...@yahoo.com>
> To: "Hive Mailinglist" <user@hive.apache.org>
> Sent: Wednesday, November 16, 2011 9:03:23 PM
> Subject: Severely hit by "curse of last reducer"
>
> Hi,
> Where do I find the log of which reducer key is causing the last reducer to
> run for hours? The reducer logs don't say much about the key it's processing.
> Is there a way to enable a debug mode where it would log the key it's
> processing?
>
> My query looks like:
>
> select partner_name, dates, sum(coins_granted) from table1 u join table2 p on
> u.partner_id=p.id group by partner_name, dates
>
> The uncompressed size of table1 is about 30GB.
>
> -Ayon
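Ayon's question (which key keeps the last reducer busy) can often be answered without reducer logs by counting rows per join key in the large table. A minimal HiveQL sketch, assuming the skew sits on the join column partner_id of table1 as in the query quoted above; the alias cnt and the limit of 20 are arbitrary choices:

```sql
-- Count rows per join key in the large table; a key whose count dwarfs
-- the others is the likely cause of the single slow reducer.
select partner_id, count(*) as cnt
from table1
group by partner_id
order by cnt desc
limit 20;
```

A large count for a NULL partner_id is worth checking too, since all NULL join keys are shuffled to the same reducer.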
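Mark's two settings applied to Ayon's query might look like the session below. This is a sketch, not a tested fix: 100000 is Hive's documented default for hive.skewjoin.key and serves only as a placeholder to tune against the per-key row counts of the actual data.

```sql
-- Let Hive detect skewed join keys at run time.
set hive.optimize.skewjoin=true;
-- A key with more rows than this in the join is treated as skewed; its rows
-- are set aside and joined in a follow-up map-join job (placeholder value).
set hive.skewjoin.key=100000;

select partner_name, dates, sum(coins_granted)
from table1 u join table2 p on u.partner_id = p.id
group by partner_name, dates;
```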
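Mark's earlier map-join suggestion can be expressed with Hive's MAPJOIN hint, which names the alias of the small table. A sketch only: Ayon reports that this approach made his mappers run out of memory, since each mapper has to hold table2 in memory.

```sql
-- Ask Hive to load table2 (alias p) into memory in each mapper and join
-- map-side, so the join itself needs no reduce phase (the group by still does).
select /*+ MAPJOIN(p) */ partner_name, dates, sum(coins_granted)
from table1 u join table2 p on u.partner_id = p.id
group by partner_name, dates;
```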