Hi Mark,
Apologies for the thin details on the query :)
Here is the error log: http://pastebin.com/pqxh4d1u
The job tracker doesn't show any errors.
I am using hive-0.7, and I did set a threshold for the query. Sadly, I couldn't find any more documentation on skew joins other than the wiki.

Thanks,
--
Rohan Monga

On Thu, Nov 17, 2011 at 2:02 PM, Mark Grover <mgro...@oanda.com> wrote:
> Rohan,
> The short answer is: I don't know :-) If you could paste the log, I or someone
> else on the mailing list might be able to help.
>
> BTW, what version of Hive were you using? Did you set the threshold before
> running the query? Try to find some documentation online that tells you which
> properties need to be set before a skew join. My understanding was that the 2
> properties I mentioned below should suffice.
>
> Mark
>
> ----- Original Message -----
> From: "rohan monga" <monga.ro...@gmail.com>
> To: user@hive.apache.org
> Cc: "Ayon Sinha" <ayonsi...@yahoo.com>
> Sent: Thursday, November 17, 2011 4:44:17 PM
> Subject: Re: Severely hit by "curse of last reducer"
>
> Hi Mark,
> I have tried setting hive.optimize.skewjoin=true, but I get a
> NullPointerException after the first stage of the query completes.
> Why does that happen?
>
> Thanks,
> --
> Rohan Monga
>
> On Thu, Nov 17, 2011 at 1:37 PM, Mark Grover <mgro...@oanda.com> wrote:
>> Ayon,
>> I see. From what you explained, skew join seems like what you want. Have you
>> tried that already?
>>
>> Details on how skew join works are in this presentation. Jump to the
>> 15-minute mark if you just want to hear about skew joins.
>> http://www.youtube.com/watch?v=OB4H3Yt5VWM
>>
>> I bet you could also find something in the mailing list archives related to
>> skew join.
>>
>> In a nutshell (from the video),
>> set hive.optimize.skewjoin=true
>> set hive.skewjoin.key=<Threshold>
>>
>> should do the trick for you. Threshold, I believe, is the number of records
>> you consider a large number to defer till later.
>>
>> Good luck!
>> Mark
>>
>> ----- Original Message -----
>> From: "Ayon Sinha" <ayonsi...@yahoo.com>
>> To: "Mark Grover" <mgro...@oanda.com>, user@hive.apache.org
>> Sent: Wednesday, November 16, 2011 10:53:19 PM
>> Subject: Re: Severely hit by "curse of last reducer"
>>
>> It is always just one reducer that is stuck. My table2 is small, but using a
>> map join makes my mappers run out of memory. My max reducers is 32 (also max
>> reduce capacity). I tried setting the number of reducers higher (even 6000,
>> which is approximately the number of date and name combinations I have),
>> only to end up with lots of reducers with no data.
>> So I am quite sure it is some key in stage-1 that is doing this.
>>
>> -Ayon
>>
>> From: Mark Grover <mgro...@oanda.com>
>> To: user@hive.apache.org; Ayon Sinha <ayonsi...@yahoo.com>
>> Sent: Wednesday, November 16, 2011 6:54 PM
>> Subject: Re: Severely hit by "curse of last reducer"
>>
>> Hi Ayon,
>> Is it one particular reduce task that is slow or the entire reduce phase?
>> How many reduce tasks did you have, anyway?
>>
>> Looking into what the reducer key was might only make sense if a particular
>> reduce task was slow.
>>
>> If your table2 is small enough to fit in memory, you might want to try a map
>> join.
>> More details at:
>> http://www.facebook.com/note.php?note_id=470667928919
>>
>> Let me know what you find.
>>
>> Mark
>>
>> ----- Original Message -----
>> From: "Ayon Sinha" <ayonsi...@yahoo.com>
>> To: "Hive Mailinglist" <user@hive.apache.org>
>> Sent: Wednesday, November 16, 2011 9:03:23 PM
>> Subject: Severely hit by "curse of last reducer"
>>
>> Hi,
>> Where do I find the log of which reducer key is causing the last reducer to
>> go on for hours? The reducer logs don't say much about the key it's
>> processing. Is there a way to enable a debug mode where it would log the key
>> it's processing?
>>
>> My query looks like:
>>
>> select partner_name, dates, sum(coins_granted) from table1 u join table2 p
>> on u.partner_id=p.id group by partner_name, dates
>>
>> The uncompressed size of table1 is about 30GB.
>>
>> -Ayon
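
For anyone reading this thread later, here is a rough way to picture the hunch above that a single join key is overloading one reducer. The Python sketch below counts rows per reducer and per key. It is only an illustration, not Hive's code: the sample partner_id values are made up, the modulo partitioner is a simplification, and the function names reducer_loads and hot_keys are hypothetical. The threshold argument plays the role Mark describes for hive.skewjoin.key, i.e. the row count above which a key is considered too large.

```python
from collections import Counter


def reducer_loads(join_keys, num_reducers):
    """Count rows per reducer under a simplified partitioner (key mod reducers).

    Assumption: integer join keys and plain modulo partitioning; the real
    partitioner hashes the serialized key, so actual assignments will differ.
    """
    loads = Counter()
    for key in join_keys:
        loads[key % num_reducers] += 1
    return loads


def hot_keys(join_keys, threshold):
    """Return (key, row_count) pairs with more than `threshold` rows, largest first."""
    counts = Counter(join_keys)
    return [(key, n) for key, n in counts.most_common() if n > threshold]


# Made-up data: partner_id 7 dominates, 1000 other ids appear once each.
rows = [7] * 9000 + list(range(100, 1100))

loads = reducer_loads(rows, num_reducers=32)
busiest_reducer, busiest_rows = loads.most_common(1)[0]
print("busiest reducer:", busiest_reducer, "rows:", busiest_rows)
print("keys above threshold:", hot_keys(rows, threshold=1000))
```

Adding more reducers does not help in this picture, because every row for the hot key still lands on the same reducer; that matches the observation of many empty reducers when the count was raised to 6000.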