Thanks a lot, Arun and Harsh. Yes, the error was from one disk on one of the nodes. The job worked when I ran it without that disk.
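For reference, "running without" a bad disk in an MRv1 setup like this usually means dropping the volume from the comma-separated local-directory lists and restarting the affected daemons. A minimal sketch, assuming Hadoop 1.x property names (consistent with the stack trace below) and a hypothetical failed volume /data/disk3 that is simply omitted:

    <!-- mapred-site.xml: map outputs and merge-phase IFiles live under
         mapred.local.dir, so a bad volume here is the usual source of
         the ChecksumException in the trace below -->
    <property>
      <name>mapred.local.dir</name>
      <value>/data/disk1/mapred/local,/data/disk2/mapred/local</value>
    </property>

    <!-- hdfs-site.xml: likewise drop the bad volume from DataNode storage -->
    <property>
      <name>dfs.data.dir</name>
      <value>/data/disk1/hdfs/data,/data/disk2/hdfs/data</value>
    </property>

The TaskTracker and DataNode on that node need a restart to pick up the change.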
On Thu, Aug 16, 2012 at 2:34 PM, Arun C Murthy <a...@hortonworks.com> wrote:
> Also, do you have ECC RAM?
>
> On Aug 16, 2012, at 11:34 AM, Arun C Murthy wrote:
>
> > Primarily, it could be caused by a corrupt disk - which is why checking
> > if it's happening on a specific node(s) can help.
> >
> > Arun
> >
> > On Aug 16, 2012, at 10:04 AM, Pavan Kulkarni wrote:
> >
> > > Harsh,
> > >
> > > I see this on a couple of nodes. But what might be the cause of this
> > > error? Any idea about it? Thanks
> > >
> > > On Sun, Aug 12, 2012 at 9:06 AM, Harsh J <ha...@cloudera.com> wrote:
> > >
> > > > Hi Pavan,
> > > >
> > > > Do you see this happen on a specific node every time (i.e. when the
> > > > reducer runs there)?
> > > >
> > > > On Fri, Aug 10, 2012 at 11:43 PM, Pavan Kulkarni
> > > > <pavan.babu...@gmail.com> wrote:
> > > > > Hi,
> > > > >
> > > > > I am running Terasort on a cluster of 8 nodes. The map phase
> > > > > completes, but when the reduce phase is around 68-70% I get the
> > > > > following error:
> > > > >
> > > > > 12/08/10 11:02:36 INFO mapred.JobClient: Task Id : attempt_201208101018_0001_r_000027_0, Status : FAILED
> > > > > java.lang.RuntimeException: problem advancing post rec#38320220
> > > > >         at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1214)
> > > > >         at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:249)
> > > > >         at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:245)
> > > > >         at org.apache.hadoop.mapred.lib.IdentityReducer.reduce(IdentityReducer.java:40)
> > > > >         at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
> > > > >         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
> > > > >         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> > > > >         at java.security.AccessController.doPrivileged(Native Method)
> > > > >         at javax.security.auth.Subject.doAs(Subject.java:416)
> > > > >         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
> > > > >         at org.apache.hadoop.mapred.Child.main(Child.java:249)
> > > > > Caused by: org.apache.hadoop.fs.ChecksumException: Checksum Error
> > > > >         at org.apache.hadoop.mapred.IFileInputStream.doRead(IFileInputStream.java:164)
> > > > >         at org.apache.hadoop.mapred.IFileInputStream.read(IFileInputStream.java:101)
> > > > >         at org.apache.hadoop.mapred.IFile$Reader.readData(IFile.java:328)
> > > > >         at org.apache.hadoop.mapred.IFile$Reader.rejigData(IFile.java:358)
> > > > >         at org.apache.hadoop.mapred.IFile$Reader.readNextBlock(IFile.java:342)
> > > > >         at org.apache.hadoop.mapred.IFile$Reader.next(IFile.java:374)
> > > > >         at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:220)
> > > > >         at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:330)
> > > > >         at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
> > > > >         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$RawKVIteratorReader.next(ReduceTask.java:2531)
> > > > >         at org.apache.hadoop.mapred.Merger$Segment.next(Merger.java:220)
> > > > >         at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:330)
> > > > >         at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
> > > > >         at org.apache.hadoop.mapred.Task$ValuesIterator.readNextKey(Task.java:1253)
> > > > >         at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1212)
> > > > >         ... 10 more
> > > > >
> > > > > I came across someone facing the same issue in the mail archives
> > > > > (http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201001.mbox/%3c1c802db51001280427j5b8e57dai4a8d0fdd038...@mail.gmail.com%3E)
> > > > > who seemed to resolve it by listing hostnames in the /etc/hosts
> > > > > file (a typical layout is sketched after this thread). All my
> > > > > nodes have the correct hostname entries in /etc/hosts, but the
> > > > > reducers still throw this error.
> > > > > Any help regarding this issue is appreciated. Thanks
> > > > >
> > > > > --
> > > > > --With Regards
> > > > > Pavan Kulkarni
> > > >
> > > > --
> > > > Harsh J
> > >
> > > --
> > > --With Regards
> > > Pavan Kulkarni
> >
> > --
> > Arun C. Murthy
> > Hortonworks Inc.
> > http://hortonworks.com/
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/

--
--With Regards
Pavan Kulkarni
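For reference, the /etc/hosts fix from the thread linked above usually amounts to giving every node an identical hosts file that maps each cluster hostname to its static IP, without binding a node's own hostname to 127.0.0.1 (a common Hadoop pitfall). A minimal sketch; the names and addresses here are hypothetical:

    127.0.0.1      localhost
    # each node's real hostname must resolve to its LAN IP, not 127.0.0.1
    192.168.1.10   master.cluster.local   master
    192.168.1.11   slave1.cluster.local   slave1
    192.168.1.12   slave2.cluster.local   slave2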