Hi Ivan,

Sorry for taking so long to answer your email. I ran the test you asked for, and I found the commit below to be the one that caused the breakage. I wish I could provide a fix, but I do not have time today.
commit 27b956fa62ce9b467ab7dd287dd6dcd5ab6a0cb3
Author: Hairong Kuang <hair...@apache.org>
Date:   Mon Apr 11 17:15:27 2011 +0000

    HDFS-1630. Support fsedits checksum. Contributed by Hairong Kuang.

    git-svn-id: https://svn.apache.org/repos/asf/hadoop/hdfs/trunk@1091131 13f79535-47bb-0310-9956-ffa450edef68

Regards,
André Oriani

On Thu, Jun 16, 2011 at 07:31, Ivan Kelly <iv...@yahoo-inc.com> wrote:

> This seems to have been introduced here:
> https://github.com/apache/hadoop-hdfs/commit/27b956fa62ce9b467ab7dd287dd6dcd5ab6a0cb3#src/java/org/apache/hadoop/hdfs/server/namenode/BackupImage.java
>
> The backup streams never write the version, so it should never try to read
> it either. I would have expected this to fail earlier, as it is reading junk
> once the stream pointer is an int past where it should be. BackupStreams
> don't write the checksum either. This really should have failed the
> BackupNode unit test, but I think there are other problems with that; cf.
> https://issues.apache.org/jira/browse/HDFS-1521?focusedCommentId=13010242&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13010242
>
> Could you try again with code from April 10th?
>
> Another candidate for causing it could be HDFS-2003, which went in on the
> 8th of this month.
>
> On 16/06/2011 00:42, André Oriani wrote:
>
>> Hi,
>>
>> My repo is one week old, and the change I made was to modify the
>> Configuration object in BackupNode.initialize() to point the name and edit
>> dirs at other directories, so that I could run both the namenode and the
>> backup node on the same machine.
>> When I copied a file to HDFS, the exception below was thrown. Has anyone
>> seen it before?
>>
>> 11/06/15 17:52:22 INFO ipc.Server: IPC Server handler 1 on 50100, call
>> journal(NamenodeRegistration(localhost:8020, role=NameNode), 101, 164,
>> [B@3951f910), rpc version=1, client version=5, methodsFingerPrint=302283637
>> from 192.168.1.102:56780: error: java.io.IOException: Error replaying edit
>> log at offset 13
>> Recent opcode offsets: 1
>> java.io.IOException: Error replaying edit log at offset 13
>> Recent opcode offsets: 1
>>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:514)
>>     at org.apache.hadoop.hdfs.server.namenode.BackupImage.journal(BackupImage.java:242)
>>     at org.apache.hadoop.hdfs.server.namenode.BackupNode.journal(BackupNode.java:251)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:422)
>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1496)
>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1492)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1131)
>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1490)
>> Caused by: org.apache.hadoop.fs.ChecksumException: Transaction 1 is corrupt.
>> Calculated checksum is -2116249809 but read checksum 0
>>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.validateChecksum(FSEditLogLoader.java:546)
>>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:490)
>>     ... 13 more
>>
>> Thanks and Regards,
>> André Oriani
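For anyone following along, the writer/reader mismatch Ivan describes can be reproduced in miniature. This is a hedged sketch, not HDFS code: the opcode, payload, and use of java.util.zip.CRC32 are illustrative assumptions (HDFS uses its own CRC implementation), but it shows why a reader that expects a 4-byte version int the writer never emitted ends up misaligned, and why a checksum that was never written is read as 0.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;

public class BackupJournalSketch {
    public static void main(String[] args) throws IOException {
        // Writer side, analogous to the backup journal stream: it writes
        // the serialized transaction only -- no layout-version header and
        // no trailing checksum.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeByte(9);   // hypothetical opcode
        out.writeInt(42);   // hypothetical payload
        out.flush();
        byte[] stream = buf.toByteArray();

        // Reader side: it wrongly assumes a 4-byte version int at the
        // front, so it consumes the opcode byte plus three payload bytes,
        // leaving the stream pointer an int past where it should be.
        DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(stream));
        int bogusVersion = in.readInt();   // junk, not a real version
        System.out.println("bogus version = " + bogusVersion);

        // And since the writer never appended a checksum, the "stored"
        // value the reader sees is effectively 0, while the CRC it
        // calculates over the transaction bytes is not -- the same shape
        // of failure as the ChecksumException in the trace above.
        CRC32 crc = new CRC32();
        crc.update(stream, 0, stream.length);
        int calculated = (int) crc.getValue();
        int stored = 0;
        System.out.println("calculated = " + calculated
                + ", stored = " + stored);
        System.out.println("corrupt = " + (calculated != stored));
    }
}
```

Running it shows the reader interpreting opcode and payload bytes as a "version" and then failing the checksum comparison against 0, which matches the symptom in the stack trace rather than failing cleanly at the read.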