Hi Ayon,

Were you able to solve this issue? I am facing the same problem. The last reducer of my query has been running for more than 2 hours now.

Thanks,
Mohit

On Fri, Nov 18, 2011 at 9:33 AM, Mark Grover <mgro...@oanda.com> wrote:
> Rohan,
> I took a look at the source code and wanted to share a couple of things:
>
> 1) Make sure the following 2 properties are set to true (they are false by default):
>
> hive.optimize.skewjoin
> hive.auto.convert.join
>
> 2) The Hive source code that is causing the exception is:
>
> String path = entry.getKey();
> Path dirPath = new Path(path);
> FileSystem inpFs = dirPath.getFileSystem(conf);
> FileStatus[] fstatus = inpFs.listStatus(dirPath);
> if (fstatus.length > 0) {
>
> It's the last line that throws the exception. Looking at the above code and the Hadoop source code (in particular, FileSystem.java, Path.java and PathFilter.java, all under org.apache.hadoop.fs), it seems like the file status listing for the directory path being provided to the job is coming back null. That could happen if the expected directory path is actually a file, or if the directory does not exist or is empty. So, when the job fails, check whether the path under consideration exists.
>
> I don't know if it's a bug in the code. If so, perhaps we should be checking for both fstatus != null and fstatus.length > 0. If you are zealous, you can try making that change and recompiling your Hive, or alternatively, implementing your own ConditionalResolverSkewJoin1 (which has the same implementation as ConditionalResolverSkewJoin but with this extra check). Then plug this new class in.
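> In code, the extra check would look something like this (an untested sketch; the surrounding lines are the same as in the snippet above):
>
> FileStatus[] fstatus = inpFs.listStatus(dirPath);
> // listStatus() returns null when the path does not exist or cannot
> // be listed, so guard against that before touching fstatus.length
> if (fstatus != null && fstatus.length > 0) {
>     // ... proceed as before ...
> }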
> Sorry for the long-winded answer,
> Mark
>
> ----- Original Message -----
> From: "rohan monga" <monga.ro...@gmail.com>
> To: user@hive.apache.org
> Cc: "Ayon Sinha" <ayonsi...@yahoo.com>
> Sent: Thursday, November 17, 2011 5:23:24 PM
> Subject: Re: Severely hit by "curse of last reducer"
>
> Hi Mark,
> Apologies for the thin details on the query :)
> Here is the error log (http://pastebin.com/pqxh4d1u); the job tracker doesn't show any errors.
> I am using hive-0.7. I did set a threshold for the query, and sadly I couldn't find any more documentation on skew joins other than the wiki.
>
> Thanks,
> --
> Rohan Monga
>
> On Thu, Nov 17, 2011 at 2:02 PM, Mark Grover <mgro...@oanda.com> wrote:
> > Rohan,
> > The short answer is: I don't know :-) If you could paste the log, I or someone else on the mailing list might be able to help.
> >
> > BTW, what version of Hive were you using? Did you set the threshold before running the query? Try to find some documentation online that says which properties need to be set before a skew join; my understanding was that the 2 properties I mentioned below should suffice.
> >
> > Mark
> >
> > ----- Original Message -----
> > From: "rohan monga" <monga.ro...@gmail.com>
> > To: user@hive.apache.org
> > Cc: "Ayon Sinha" <ayonsi...@yahoo.com>
> > Sent: Thursday, November 17, 2011 4:44:17 PM
> > Subject: Re: Severely hit by "curse of last reducer"
> >
> > Hi Mark,
> > I have tried setting hive.optimize.skewjoin=true, but I get a NullPointerException after the first stage of the query completes. Why does that happen?
> >
> > Thanks,
> > --
> > Rohan Monga
> >
> > On Thu, Nov 17, 2011 at 1:37 PM, Mark Grover <mgro...@oanda.com> wrote:
> >> Ayon,
> >> I see. From what you explained, skew join seems like what you want. Have you tried that already?
> >>
> >> Details on how skew join works are in this presentation; jump to the 15-minute mark if you want to just hear the part about skew joins.
> >> http://www.youtube.com/watch?v=OB4H3Yt5VWM
> >>
> >> I bet you could also find something in the mailing list archives related to skew join.
> >>
> >> In a nutshell (from the video),
> >>
> >> set hive.optimize.skewjoin=true
> >> set hive.skewjoin.key=<Threshold>
> >>
> >> should do the trick for you. The threshold, I believe, is the number of records per key that you consider large enough to defer till later.
> >>
> >> Good luck!
> >> Mark
> >>
> >> ----- Original Message -----
> >> From: "Ayon Sinha" <ayonsi...@yahoo.com>
> >> To: "Mark Grover" <mgro...@oanda.com>, user@hive.apache.org
> >> Sent: Wednesday, November 16, 2011 10:53:19 PM
> >> Subject: Re: Severely hit by "curse of last reducer"
> >>
> >> Only one reducer is always stuck. My table2 is small, but using a map join makes my mappers run out of memory. My max reducers is 32 (which is also my max reduce capacity). I tried setting the number of reducers higher (even 6000, which is approximately the number of date/name combinations I have), only to get lots of reducers with no data.
> >> So I am quite sure it is some key in stage 1 that is doing this.
> >>
> >> -Ayon
> >> See My Photos on Flickr
> >> Also check out my Blog for answers to commonly asked questions.
> >>
> >> From: Mark Grover <mgro...@oanda.com>
> >> To: user@hive.apache.org; Ayon Sinha <ayonsi...@yahoo.com>
> >> Sent: Wednesday, November 16, 2011 6:54 PM
> >> Subject: Re: Severely hit by "curse of last reducer"
> >>
> >> Hi Ayon,
> >> Is it one particular reduce task that is slow, or the entire reduce phase? How many reduce tasks did you have, anyway?
> >>
> >> Looking into what the reducer key was might only make sense if a particular reduce task was slow.
> >>
> >> If your table2 is small enough to fit in memory, you might want to try a map join.
> >> More details at:
> >> http://www.facebook.com/note.php?note_id=470667928919
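> >>
> >> With the map join hint, your query would look something like this (a sketch, assuming table2 is the small side and fits in each mapper's memory):
> >>
> >> select /*+ MAPJOIN(p) */ partner_name, dates, sum(coins_granted)
> >> from table1 u join table2 p on u.partner_id = p.id
> >> group by partner_name, dates;
> >>
> >> Since the small table is loaded into every mapper, the join happens map-side and the skewed join key never reaches a reducer; the group-by still needs a reduce phase, but map-side partial aggregation should keep it light.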
> >>
> >> Let me know what you find.
> >>
> >> Mark
> >>
> >> ----- Original Message -----
> >> From: "Ayon Sinha" <ayonsi...@yahoo.com>
> >> To: "Hive Mailinglist" <user@hive.apache.org>
> >> Sent: Wednesday, November 16, 2011 9:03:23 PM
> >> Subject: Severely hit by "curse of last reducer"
> >>
> >> Hi,
> >> Where do I find the log of which reducer key is causing the last reducer to go on for hours? The reducer logs don't say much about the key being processed. Is there a way to enable a debug mode where it would log the key it's processing?
> >>
> >> My query looks like:
> >>
> >> select partner_name, dates, sum(coins_granted) from table1 u join table2 p on u.partner_id=p.id group by partner_name, dates
> >>
> >> The uncompressed size of table1 is about 30GB.
> >>
> >> -Ayon
> >> See My Photos on Flickr
> >> Also check out my Blog for answers to commonly asked questions.
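P.S. For anyone else chasing this, one way to check whether a single join key is hot is a plain group-by count on the big table. A rough sketch, reusing the table and column names from Ayon's query above (substitute your own):

select partner_id, count(*) as cnt
from table1
group by partner_id
order by cnt desc
limit 20;

If one partner_id accounts for a disproportionate share of the rows, that is very likely the key pinning the last reducer.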