Thanks a lot, Arun and Harsh. Yes, the error was from one disk on one of the nodes. The job worked when I ran it without that disk.
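For reference, "running without" a bad disk in an MRv1 setup like this usually means dropping the volume from the comma-separated local-directory lists and restarting the affected daemons. A minimal sketch, assuming Hadoop 1.x property names (consistent with the stack trace below) and a hypothetical failed volume /data/disk3 that is simply omitted:

    <!-- mapred-site.xml: map outputs and merge-phase IFiles live under
         mapred.local.dir, so a bad volume here is the usual source of
         the ChecksumException in the trace below -->
    <property>
      <name>mapred.local.dir</name>
      <value>/data/disk1/mapred/local,/data/disk2/mapred/local</value>
    </property>

    <!-- hdfs-site.xml: likewise drop the bad volume from DataNode storage -->
    <property>
      <name>dfs.data.dir</name>
      <value>/data/disk1/hdfs/data,/data/disk2/hdfs/data</value>
    </property>

The TaskTracker and DataNode on that node need a restart to pick up the change.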
On Thu, Aug 16, 2012 at 2:34 PM, Arun C Murthy <a...@hortonworks.com> wrote:
> Also, do you have ECC RAM?
>
> On Aug 16, 2012, at 11:34 AM, Arun C Murthy wrote:
>
> > Primarily, it could be caused by a corrupt disk - which is why checking
> > if it's happening on a specific node(s) can help.
> >
> > Arun
> >
> > On Aug 16, 2012, at 10:04 AM, Pavan Kulkarni wrote:
> >
> > > Harsh,
> > >
> > > I see this on a couple of nodes. But what might be the cause of this
> > > error? Any idea about it? Thanks
> > >
> > > On Sun, Aug 12, 2012 at 9:06 AM, Harsh J <ha...@cloudera.com> wrote:
> > >
> > > > Hi Pavan,
> > > >
> > > > Do you see this happen on a specific node every time (i.e. when the
> > > > reducer runs there)?
> > > >
> > > > On Fri, Aug 10, 2012 at 11:43 PM, Pavan Kulkarni
> > > > <pavan.babu...@gmail.com> wrote:
> > > > > Hi,
> > > > >
> > > > > I am running Terasort on a cluster of 8 nodes. The map phase
> > > > > completes, but when the reduce phase is around 68-70% I get the
> > > > > following error:
> > > > >
> > > > > 12/08/10 11:02:36 INFO mapred.JobClient: Task Id : attempt_201208101018_0001_r_000027_0, Status : FAILED
> > > > > java.lang.RuntimeException: problem advancing post rec#38320220
> > > > >         at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1214)
> > > > >         at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:249)
> > > > >         at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:245)
> > > > >         at org.apache.hadoop.mapred.lib.IdentityReducer.reduce(IdentityReducer.java:40)
> > > > >         at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
> > > > >         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
> > > > >         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> > > > >         at java.security.AccessController.doPrivileged(Native Method)
> > > > >         at javax.security.auth.Subject.doAs(Subject.java:416)
> > > > >         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
> > > > >         at org.apache.hadoop.mapred.Child.main(Child.java:249)
> > > > > Caused by: org.apache.hadoop.fs.ChecksumException: Checksum Error
> > > > >         at org.apache.hadoop.mapred.IFileInputStream.doRead(IFileInputStream.java:164)
> > > > >         at org.apache.hadoop.mapred.IFileInputStream.read(IFileInputStream.java:101)
> > > > >         at org.apache.hadoop.mapred.IFile$Reader.readData(IFile.java:328)
> > > > >         at org.apache.hadoop.mapred.IFile$Reader.rejigData(IFile.java:358)
> > > > >         at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:342)
> > > > >         at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:374)
> > > > >         at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:220)
> > > > >         at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:330)
> > > > >         at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
> > > > >         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$RawKVIteratorReader.next(ReduceTask.java:2531)
> > > > >         at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:220)
> > > > >         at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:330)
> > > > >         at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
> > > > >         at org.apache.hadoop.mapred.Task$ValuesIterator.readNextKey(Task.java:1253)
> > > > >         at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1212)
> > > > >         ... 10 more
> > > > >
> > > > > I came across someone facing the same issue in the mail archives
> > > > > (http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201001.mbox/%3c1c802db51001280427j5b8e57dai4a8d0fdd038...@mail.gmail.com%3E)
> > > > > who seemed to resolve it by listing hostnames in the /etc/hosts
> > > > > file (a typical layout is sketched after this thread). All my
> > > > > nodes have the correct hostname entries in /etc/hosts, but the
> > > > > reducers still throw this error.
> > > > > Any help regarding this issue is appreciated. Thanks
> > > > >
> > > > > --
> > > > > --With Regards
> > > > > Pavan Kulkarni
> > > >
> > > > --
> > > > Harsh J
> > >
> > > --
> > > --With Regards
> > > Pavan Kulkarni
> >
> > --
> > Arun C. Murthy
> > Hortonworks Inc.
> > http://hortonworks.com/
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/

--
--With Regards
Pavan Kulkarni
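For reference, the /etc/hosts fix from the thread linked above usually amounts to giving every node an identical hosts file that maps each cluster hostname to its static IP, without binding a node's own hostname to 127.0.0.1 (a common Hadoop pitfall). A minimal sketch; the names and addresses here are hypothetical:

    127.0.0.1      localhost
    # each node's real hostname must resolve to its LAN IP, not 127.0.0.1
    192.168.1.10   master.cluster.local   master
    192.168.1.11   slave1.cluster.local   slave1
    192.168.1.12   slave2.cluster.local   slave2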