Hi Mark,
Apologies for the thin details on the query :)
The error log is at http://pastebin.com/pqxh4d1u
The job tracker doesn't show any errors.
I am using hive-0.7 and I did set a threshold for the query. Sadly, I
couldn't find any more documentation on skew joins other than the wiki.
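
For reference, a minimal sketch of the kind of session setup under discussion,
applied to Ayon's query from earlier in the thread. The threshold value of
100000 is an assumption for illustration, not necessarily the value used in
the failing run, and the comment lines assume the statements are run from a
script file:

```sql
-- Turn on runtime skew-join handling (it is off by default).
set hive.optimize.skewjoin=true;
-- Number of rows for a single join key above which the key is treated
-- as skewed. 100000 is an illustrative value only.
set hive.skewjoin.key=100000;

select partner_name, dates, sum(coins_granted)
from table1 u join table2 p on u.partner_id = p.id
group by partner_name, dates;
```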

Thanks,
--
Rohan Monga



On Thu, Nov 17, 2011 at 2:02 PM, Mark Grover <mgro...@oanda.com> wrote:
> Rohan,
> The short answer is: I don't know :-) If you could paste the log, I or someone 
> else on the mailing list may be able to help.
>
> BTW, what version of Hive were you using? Did you set the threshold before 
> running the query? Try to find some documentation online that tells you which 
> properties need to be set for a skew join. My understanding was that the 2 
> properties I mentioned below should suffice.
>
> Mark
>
> ----- Original Message -----
> From: "rohan monga" <monga.ro...@gmail.com>
> To: user@hive.apache.org
> Cc: "Ayon Sinha" <ayonsi...@yahoo.com>
> Sent: Thursday, November 17, 2011 4:44:17 PM
> Subject: Re: Severely hit by "curse of last reducer"
>
> Hi Mark,
> I have tried setting hive.optimize.skewjoin=true, but I get a
> NullPointerException after the first stage of the query completes.
> Why does that happen?
>
> Thanks,
> --
> Rohan Monga
>
>
>
> On Thu, Nov 17, 2011 at 1:37 PM, Mark Grover <mgro...@oanda.com> wrote:
>> Ayon,
>> I see. From what you explained, skew join seems like what you want. Have you 
>> tried that already?
>>
>> Details on how skew join works are in this presentation. Jump to the 15-minute 
>> mark if you just want to hear about skew joins.
>> http://www.youtube.com/watch?v=OB4H3Yt5VWM
>>
>> I bet you could also find something related to skew join in the mailing list 
>> archives.
>>
>> In a nutshell (from the video),
>> set hive.optimize.skewjoin=true
>> set hive.skewjoin.key=<Threshold>
>>
>> should do the trick for you. The threshold, I believe, is the number of records 
>> per key above which that key is considered skewed and deferred till later.
>>
>> Good luck!
>> Mark
>>
>> ----- Original Message -----
>> From: "Ayon Sinha" <ayonsi...@yahoo.com>
>> To: "Mark Grover" <mgro...@oanda.com>, user@hive.apache.org
>> Sent: Wednesday, November 16, 2011 10:53:19 PM
>> Subject: Re: Severely hit by "curse of last reducer"
>>
>>
>>
>> Only one reducer is always stuck. My table2 is small, but using a map join 
>> makes my mappers run out of memory. My max reducers is 32 (also the max reduce 
>> capacity). I tried setting the number of reducers higher (even 6000, which 
>> is approximately the number of date and name combinations I have), only to get 
>> lots of reducers with no data.
>> So I am quite sure it is some key in stage-1 that is doing this.
>>
>> -Ayon
>>
>>
>>
>>
>> From: Mark Grover <mgro...@oanda.com>
>> To: user@hive.apache.org; Ayon Sinha <ayonsi...@yahoo.com>
>> Sent: Wednesday, November 16, 2011 6:54 PM
>> Subject: Re: Severely hit by "curse of last reducer"
>>
>> Hi Ayon,
>> Is it one particular reduce task that is slow, or the entire reduce phase? 
>> How many reduce tasks did you have, anyway?
>>
>> Looking into what the reducer key was might only make sense if a particular 
>> reduce task was slow.
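>>
>> A rough way to check for a hot key, as a sketch that assumes the table and 
>> column names from your query (the limit of 20 is arbitrary):
>>
>> ```sql
>> -- Count rows per join key in the large table; a key with a far larger
>> -- count than the rest is the likely cause of the slow reducer.
>> select partner_id, count(*) as cnt
>> from table1
>> group by partner_id
>> order by cnt desc
>> limit 20;
>> ```
>>
>> If one partner_id dominates the counts, that is probably the key piling up 
>> on a single reducer.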
>>
>> If your table2 is small enough to fit in memory, you might want to try a map 
>> join.
>> More details at:
>> http://www.facebook.com/note.php?note_id=470667928919
>>
>> Let me know what you find.
>>
>> Mark
>>
>> ----- Original Message -----
>> From: "Ayon Sinha" <ayonsi...@yahoo.com>
>> To: "Hive Mailinglist" <user@hive.apache.org>
>> Sent: Wednesday, November 16, 2011 9:03:23 PM
>> Subject: Severely hit by "curse of last reducer"
>>
>>
>>
>> Hi,
>> Where do I find the log of what reducer key is causing the last reducer to 
>> go on for hours? The reducer logs don't say much about the key it's 
>> processing. Is there a way to enable a debug mode where it would log the key 
>> it's processing?
>>
>>
>> My query looks like:
>>
>>
>> select partner_name, dates, sum(coins_granted) from table1 u join table2 p 
>> on u.partner_id=p.id group by partner_name, dates
>>
>>
>>
>> My uncompressed size of table1 is about 30GB.
>>
>> -Ayon
>>
>>
>>
>
