That’s a lot of partitions for one Hive job! I'm not sure that alone is the root of the issue, but there have been quite a few discussions suggesting that around 1000 partitions per job is a sensible upper bound. Is your use case conducive to using combiners (though they are not guaranteed to be called)? A sketch of the Hive-side analogue follows below.

Thanks,
sanjay
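In Hive terms, the closest analogue to a combiner is map-side aggregation, which computes partial aggregates inside the map tasks before the shuffle. A minimal sketch, assuming the query is a GROUP BY aggregation (the table and column names below are hypothetical, not from the original job):

-- Enable map-side (combiner-style) partial aggregation in the mappers.
SET hive.map.aggr=true;
-- Fraction of mapper memory the map-side hash table may use.
SET hive.map.aggr.hash.percentmemory=0.5;

-- Hypothetical aggregation over a partitioned table.
SELECT dt, COUNT(*) AS cnt
FROM events
WHERE dt >= '2013-04-01'
GROUP BY dt;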
From: Srinivas Surasani <hivehadooplearn...@gmail.com>
Reply-To: user@hive.apache.org
Date: Thursday, April 25, 2013 2:33 PM
To: user@hive.apache.org
Subject: map tasks are taking forever when running job on 24 TB

Hi,

I'm running a Hive job on a 24 TB dataset (34,560 partitions). About 500 to 1,000 of the mappers succeed (out of 80,000 total) and the rest take forever (their status stays at 0% the whole time). Are there any limits on the number of partitions or the dataset size? Are there any parameters I should set here? The same job succeeds on 18 TB (25,920 partitions).

I have already set the following in my Hive query:

set mapreduce.jobtracker.split.metainfo.maxsize=-1;

Regards,
Srinivas
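For reference, one knob that is often relevant when a Hive job launches this many map tasks is split combining, which merges many small input splits into fewer, larger map tasks. A minimal sketch, assuming splittable input files and Hive/Hadoop settings of that era (the byte values are purely illustrative, not a recommendation for this job):

-- Let Hive combine small splits into fewer, larger map tasks.
SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
-- Upper bound on a combined split (256 MB here, illustrative only).
SET mapred.max.split.size=268435456;
-- Minimum bytes to combine per node and per rack before spilling over.
SET mapred.min.split.size.per.node=134217728;
SET mapred.min.split.size.per.rack=134217728;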