How about running it as several sub-queries, each over a subset of the data, so that each one has a better chance of finishing? I fear that the amount of data to shuffle might be too big and that you might be running out of scratch/temp space. Did you verify that the job is not failing because it runs out of disk space before the shuffle/reduce can kick in?
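A rough sketch of the sub-query idea, in case it helps. It splits a list of partition keys into batches and builds one Hive statement per batch. The table names (`events`, `results`), the partition column (`dt`) and the key format are placeholders I made up, since the actual query was not posted, and the batch size of 1000 is just the ballpark mentioned earlier in the thread, not a tested limit.

```python
from typing import Iterator, List


def batches(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_queries(partitions: List[str], size: int) -> List[str]:
    """Build one Hive statement per batch of partition keys."""
    # `results`, `events` and `dt` are hypothetical names; substitute the
    # real target table, source table and partition column of the job.
    template = (
        "INSERT INTO TABLE results "
        "SELECT * FROM events WHERE dt IN ({values});"
    )
    queries = []
    for chunk in batches(partitions, size):
        values = ", ".join("'{}'".format(p) for p in chunk)
        queries.append(template.format(values=values))
    return queries


if __name__ == "__main__":
    # 34560 made-up partition keys, split into batches of 1000.
    parts = ["p{:05d}".format(i) for i in range(34560)]
    qs = build_queries(parts, 1000)
    print(len(qs))  # 35 statements, the last one covering 560 partitions
```

Each generated statement can then be submitted as its own job, so a failure only costs one batch rather than the whole 24 TB run.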
-Viral

On Thu, Apr 25, 2013 at 3:10 PM, Sanjay Subramanian <sanjay.subraman...@wizecommerce.com> wrote:

> That’s a lot of partitions for one Hive job! Not sure if that itself is
> the root of the issues. There have been quite a few discussions suggesting
> that a maximum of 1000-ish partitions is good.
> Is your use case conducive to using Combiners (though they cannot be
> guaranteed to be called)?
> Thanks
> sanjay
>
> From: Srinivas Surasani <hivehadooplearn...@gmail.com>
> Reply-To: "user@hive.apache.org" <user@hive.apache.org>
> Date: Thursday, April 25, 2013 2:33 PM
> To: "user@hive.apache.org" <user@hive.apache.org>
> Subject: map tasks are taking ever when running job on 24 TB
>
> Hi,
>
> I'm running a Hive job on a 24 TB dataset (34560 partitions). About
> 500 to 1000 mappers succeed (out of a total of 80000) and the rest of the
> mappers take forever (their status stays at 0% the whole time). Are there
> any limitations on the number of partitions/dataset? Are there any
> parameters to set here?
>
> The same job succeeds on 18 TB (25920 partitions).
>
> I have already set the following in my Hive query:
> set mapreduce.jobtracker.split.metainfo.maxsize=-1;
>
> Regards,
> Srinivas