Hi, I'm CC'ing this to hive-user as well.
I tried a simple join between two tables, 2.2GB and 137MB:

select count(*) from A JOIN B ON (A.a = B.b);

The query ran for 7 hours. I am sure this is not normal. The reducer gets stuck in the "reduce > reduce" phase; the map and copy phases complete in a matter of minutes. Please see my previous mail below for my config and vmstat output. My job has 40 maps and 7 reduces.

My JT and TT logs don't show any warnings, except that one of my nodes got blacklisted because of "Too many fetch failures". Initially there was an error in that node's hosts file. I corrected it and restarted the cluster, but even then that node gets blacklisted frequently. Should I restart the node after changing the hosts file?

Any help? 7 hours is too long for such a simple query.

On Thu, Sep 22, 2011 at 5:43 AM, Raj V <rajv...@yahoo.com> wrote:
> 2GB for a task tracker? Here are some possible thoughts.
> Compress map output.
> Change mapred.reduce.slowstart.completed.maps
>
> By the way, I see no swapping. Anything interesting in the task tracker
> log? System log?
>
> Raj
>
>
>________________________________
> >From: john smith <js1987.sm...@gmail.com>
> >To: common-u...@hadoop.apache.org
> >Sent: Wednesday, September 21, 2011 4:52 PM
> >Subject: Reducer hanging ( swapping? )
> >
> >Hi Folks,
> >
> >I am running hive on a 10 node cluster. Since my hive queries have joins
> >in them, their reduce phases are a bit heavy.
> >
> >I have 2GB RAM on each TT. The problem is that my reducer hangs at 76%
> >for a long time. I guess this is due to excessive swapping from disk to
> >memory. My vmstat output (on one of the TTs) shows:
> >
> >procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> > r  b   swpd   free   buff   cache  si  so  bi  bo  in  cs  us  sy  id  wa
> > 1  0   1860  34884 189948 1997644   0   0   2   1   0   1   0   0 100   0
> >
> >My related config params are pasted below. (I turned off speculative
> >execution for both maps and reduces.)
> >Can anyone suggest some improvements to make my reduce phase a bit
> >faster? (I've allotted 900MB per task and reduced the other params, but
> >it is still not showing any improvement.) Any suggestions?
> >
> >========================================
> >
> ><property>
> >  <name>mapred.min.split.size</name>
> >  <value>65536</value>
> ></property>
> >
> ><property>
> >  <name>mapred.reduce.copy.backoff</name>
> >  <value>5</value>
> ></property>
> >
> ><property>
> >  <name>io.sort.factor</name>
> >  <value>60</value>
> ></property>
> >
> ><property>
> >  <name>mapred.reduce.parallel.copies</name>
> >  <value>25</value>
> ></property>
> >
> ><property>
> >  <name>io.sort.mb</name>
> >  <value>70</value>
> ></property>
> >
> ><property>
> >  <name>io.file.buffer.size</name>
> >  <value>32768</value>
> ></property>
> >
> ><property>
> >  <name>mapred.child.java.opts</name>
> >  <value>-Xmx900m</value>
> ></property>
> >
> >===================================
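[Editor's note: since the smaller table B is only ~137MB, one option the thread does not mention is a Hive map-side join, which loads the small table into each mapper's memory and skips the reduce-phase join entirely. This is a hedged sketch, not a tested fix; it assumes B is the 137MB table and a Hive version of that era (0.7.x) where the MAPJOIN hint is honored:]

```sql
-- Sketch (not from the thread): hint Hive to hold the small table B
-- (~137MB) in each mapper's memory and join map-side, avoiding the
-- reduce-phase join that is hanging.
SELECT /*+ MAPJOIN(B) */ COUNT(*)
FROM A JOIN B ON (A.a = B.b);
```

Whether B actually fits depends on the 900MB task heap set in mapred.child.java.opts above.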
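[Editor's note: Raj's two suggestions above could be expressed as mapred-site.xml entries in the same style as the quoted config. The property names are the standard Hadoop 0.20-era ones; the slowstart value is an illustrative guess, not a tuned setting:]

```xml
<!-- Sketch of Raj's suggestions (value below is illustrative, not tuned).
     Compressing map output shrinks the data reducers must fetch. -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>

<!-- Delay reducer start until 80% of maps have finished, so reduce
     slots are not tied up copying while maps still run (default 0.05). -->
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.80</value>
</property>
```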