Hi Jinsong,

I don't see any data loss here.
The sequence of events from the logs:

==> NN allocates block:

2010-05-18 21:21:29,731 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /hbase/.META./1028785192/info/656097411976846533. blk_5636039758999247483_31304886

===> First DN reports it has received the block:

2010-05-18 21:21:29,913 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.110.8.63:50010 is added to blk_5636039758999247483_31304886 size 441

===> Client calls completeFile:

2010-05-18 21:21:29,913 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: file /hbase/.META./1028785192/info/656097411976846533 is closed by DFSClient_-919320526

===> 2nd and 3rd DN have not yet heartbeated since receiving the block, so the reported replication count is low, and an unnecessary replication is scheduled. This is a known issue - I was actually meaning to file a JIRA about it this week.

2010-05-18 21:21:31,987 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 10.110.8.63:50010 to replicate blk_5636039758999247483_31304886 to datanode(s) 10.110.8.86:50010 10.110.8.69:50010

===> Other DNs check in (within 4-5 seconds of file completion, which is a reasonable heartbeat time):

2010-05-18 21:21:33,156 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.110.8.69:50010 is added to blk_5636039758999247483_31304886 size 441
2010-05-18 21:21:34,413 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.110.8.59:50010 is added to blk_5636039758999247483_31304886 size 441

===> 8 seconds later the first replication goes through and cleanup of excess replicas happens:

2010-05-18 21:21:41,941 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.110.8.86:50010 is added to blk_5636039758999247483_31304886 size 441
2010-05-18 21:21:41,941 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.chooseExcessReplicates: (10.110.8.63:50010, blk_5636039758999247483_31304886) is added to recentInvalidateSets
2010-05-18 21:21:43,995 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 10.110.8.63:50010 to delete blk_5636039758999247483_31304886 blk_4349310048904429157_31304519

===> Another 14 seconds later, the other replication goes through and another excess replica is invalidated:

2010-05-18 21:21:57,835 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.110.8.85:50010 is added to blk_5636039758999247483_31304886 size 441
2010-05-18 21:21:57,835 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.chooseExcessReplicates: (10.110.8.69:50010, blk_5636039758999247483_31304886) is added to recentInvalidateSets

===> About 5 minutes later, the regionserver performs a compaction and asks the NN to delete this file:

2010-05-18 21:26:39,388 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_5636039758999247483 is added to invalidSet of 10.110.8.63:50010
2010-05-18 21:26:39,388 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_5636039758999247483 is added to invalidSet of 10.110.8.69:50010
2010-05-18 21:26:39,388 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_5636039758999247483 is added to invalidSet of 10.110.8.59:50010
2010-05-18 21:26:39,388 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_5636039758999247483 is added to invalidSet of 10.110.8.86:50010
2010-05-18 21:26:39,388 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_5636039758999247483 is added to invalidSet of 10.110.8.85:50010
2010-05-18 21:26:39,389 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=hadoop,hadoop ip=/10.110.8.85 cmd=delete src=/hbase/.META./1028785192/info/656097411976846533 dst=null perm=null
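To make that race easier to see in isolation, here's a simplified sketch of the replica accounting. The class and method names are made up for illustration - this is not the actual FSNamesystem code - but it shows why a file that is closed immediately after the pipeline finishes can look under-replicated even though all three DNs already hold the data:

import java.util.HashSet;
import java.util.Set;

/**
 * Simplified sketch (illustrative names, not the real FSNamesystem code):
 * the NN only counts replicas that DataNodes have already reported, so a
 * file closed right after the write pipeline finishes can appear
 * under-replicated and get queued for an extra round of replication.
 */
public class ReplicationRaceSketch {

  static final int EXPECTED_REPLICATION = 3;

  // DNs whose "block received" report for this block has reached the NN.
  static final Set<String> reportedReplicas = new HashSet<String>();

  static void blockReceived(String datanode) {
    reportedReplicas.add(datanode);
  }

  static void completeFile() {
    int live = reportedReplicas.size();
    if (live < EXPECTED_REPLICATION) {
      // Roughly what leads to the "ask X to replicate ..." line above:
      // the block is queued for re-replication based on the reported count.
      System.out.println("only " + live + " reported replica(s); scheduling "
          + (EXPECTED_REPLICATION - live) + " extra copies");
    }
  }

  public static void main(String[] args) {
    blockReceived("10.110.8.63:50010"); // first DN reports right away
    completeFile();                     // client closes the file immediately
    blockReceived("10.110.8.69:50010"); // the other pipeline DNs report a few
    blockReceived("10.110.8.59:50010"); // seconds later; the extra copies are
                                        // then cleaned up by chooseExcessReplicates
  }
}

The extra copies are harmless from a durability standpoint; chooseExcessReplicates simply invalidates them afterwards, which is exactly what the log sequence above shows.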
As for the errors seen in the regionserver, the issue is that it called open() before replication was done, and therefore only got one block location. When the replica was removed there, the RS should have gone back to the NN for more replicas. I'm not sure if the codebase you're running includes HDFS-445. Judging from the CHANGES.txt, it appears not. So, it's likely that you're hitting this bug, where DFSClient wouldn't refetch block locations from the NN for positional reads.
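If picking up HDFS-445 isn't practical right away, one possible client-side workaround is to re-open the file when a positional read fails, which makes DFSClient fetch a fresh set of block locations from the NN before retrying. This is only a sketch under that assumption - the path and retry count are illustrative, and it uses the stock FileSystem API rather than anything HBase-specific:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Illustrative workaround for builds without HDFS-445: if a positional read
 * fails (e.g. the cached block location points at a replica that has since
 * been invalidated), re-open the file so DFSClient asks the NameNode for
 * fresh block locations, then retry the read.
 */
public class PreadWithReopen {

  static int preadWithReopen(FileSystem fs, Path path, long pos,
                             byte[] buf, int retries) throws IOException {
    IOException last = new IOException("pread failed");
    for (int attempt = 0; attempt <= retries; attempt++) {
      FSDataInputStream in = fs.open(path); // fresh block locations per attempt
      try {
        return in.read(pos, buf, 0, buf.length); // positional read
      } catch (IOException e) {
        last = e; // stale replica or dead DN; retry with new locations
      } finally {
        in.close();
      }
    }
    throw last;
  }

  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    byte[] buf = new byte[441];
    // example path borrowed from the logs above
    Path p = new Path("/hbase/.META./1028785192/info/656097411976846533");
    int n = preadWithReopen(fs, p, 0L, buf, 3);
    System.out.println("read " + n + " bytes");
  }
}

As I understand it, with HDFS-445 applied the DFSClient refetches locations itself when a pread hits a stale replica, so the explicit re-open becomes unnecessary.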
Thanks
-Todd

On Fri, May 21, 2010 at 3:32 PM, Jinsong Hu <jinsong...@hotmail.com> wrote:

> Hi, Todd:
> I have cut all the logs for those 10 minutes from the 3 machines, and if you
> grep 5636039758999247483 you will see all the logs from the regionserver,
> namenode, and datanode, so you can compare the time sequence.
> All our machines' time is synced via ntp, so you can compare them exactly
> side by side and show the sequence of events.
> The build I took is from the Yahoo 0.20.9 distribution.
>
> Jinsong
>
> --------------------------------------------------
> From: "Todd Lipcon" <t...@cloudera.com>
> Sent: Friday, May 21, 2010 2:20 PM
> To: <hdfs-dev@hadoop.apache.org>
> Subject: Re: Fw: hadoop data loss issue discovered. please fix!
>
>> Hi Jinsong,
>>
>> Could you upload a tarball of the log files somewhere from each of the DNs
>> and the RS involved? It's hard to trace through the log in the email (the
>> email added all kinds of wrapping, etc.)
>>
>> -Todd
>>
>> On Fri, May 21, 2010 at 2:17 PM, Jinsong Hu <jinsong...@hotmail.com> wrote:
>>
>>> Hi, There:
>>> While using the hadoop 0.20.9-yahoo distribution and hbase 0.20.4, I found
>>> that hadoop loses blocks under a certain situation, and thus corrupts hbase
>>> tables.
>>>
>>> I compared the namenode, datanode, and hbase regionserver logs and figured
>>> out the reason.
>>>
>>> The regionserver 10.110.8.85 asks the namenode 10.110.8.83 to save a block,
>>> 10.110.8.84 gives multiple IPs, and the regionserver chooses 10.110.8.63 and
>>> saves the block there. After a while, the namenode asks for the block to be
>>> replicated to the 10.110.8.86 and 10.110.8.69 machines. A moment later, .86
>>> and .69 receive the replication, but strangely, 10.110.8.59 and 10.110.8.85
>>> also receive a replica of the block, even though they are not in the
>>> replication list.
>>>
>>> Then chooseExcessReplicates asks to delete the excess from .63 and .69,
>>> thinking there are too many replicas. Even though .63 was the original copy,
>>> the algorithm chooses which replica to delete based on the amount of empty
>>> disk. A moment later addToInvalidates (not from chooseExcessReplicates) asks
>>> for the block to be deleted on .86, .85, and .59. I checked the code; this
>>> can only be possible if the block is corrupted.
>>>
>>> In the end, this block doesn't exist anywhere in the cluster, and it is
>>> permanently lost.
>>>
>>> namenode:
>>>
>>> 2010-05-18 21:21:41,941 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.chooseExcessReplicates: (10.110.8.63:50010, blk_5636039758999247483_31304886) is added to recentInvalidateSets
>>> 2010-05-18 21:21:43,995 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 10.110.8.63:50010 to delete blk_5636039758999247483_31304886 blk_4349310048904429157_31304519
>>>
>>> the block was initially added to 10.110.8.63, then replicated to
>>> 10.110.8.63, .59, .69, .86, .85. Subsequently, the replication process
>>> (addToInvalidates) removed all of them. The code review shows that the
>>> replica is corrupt, and all get deleted.
>>>
>>> 2010-05-18 21:21:29,913 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.110.8.63:50010 is added to blk_5636039758999247483_31304886 size 441
>>> 2010-05-18 21:21:31,987 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 10.110.8.63:50010 to replicate blk_5636039758999247483_31304886 to datanode(s) 10.110.8.86:50010 10.110.8.69:50010
>>> 2010-05-18 21:21:41,941 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.chooseExcessReplicates: (10.110.8.63:50010, blk_5636039758999247483_31304886) is added to recentInvalidateSets
>>> 2010-05-18 21:21:43,995 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 10.110.8.63:50010 to delete blk_5636039758999247483_31304886 blk_4349310048904429157_31304519
>>> 2010-05-18 21:26:39,388 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_5636039758999247483 is added to invalidSet of 10.110.8.63:50010
>>> 2010-05-18 21:26:45,953 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 10.110.8.63:50010 to delete blk_-1838286221287242082_31305179 blk_8446762564182513417_31305143 blk_5636039758999247483_31304886 blk_4628640249731313760_31305046 blk_7460947863067370701_31270225 blk_-4468681536500281247_31270225 blk_8453517711101429609_31303917 blk_9126133835045521966_31303972 blk_4623110280826973929_31305203 blk_-2581238696314957800_31305033 blk_7461125351290749755_31305052
>>>
>>> 2010-05-18 21:21:31,987 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 10.110.8.63:50010 to replicate blk_5636039758999247483_31304886 to datanode(s) 10.110.8.86:50010 10.110.8.69:50010
>>> 2010-05-18 21:21:33,156 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.110.8.69:50010 is added to blk_5636039758999247483_31304886 size 441
>>> 2010-05-18 21:21:57,835 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.chooseExcessReplicates: (10.110.8.69:50010, blk_5636039758999247483_31304886) is added to recentInvalidateSets
>>> 2010-05-18 21:21:59,005 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 10.110.8.69:50010 to delete blk_5636039758999247483_31304886
>>> 2010-05-18 21:26:39,388 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_5636039758999247483 is added to invalidSet of 10.110.8.69:50010
>>> 2010-05-18 21:26:42,951 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 10.110.8.69:50010 to delete blk_-2124965527858346013_31270213 blk_-5027506345849158498_31270213 blk_5636039758999247483_31304886 blk_9148821113904458973_31305189 blk_4850797749721229572_31305072 blk_252039065084461924_31305031 blk_-8351836728009062091_31305208 blk_-7576696059515014894_31305194 blk_-2900250119736465962_31270214 blk_471700613578524871_31304950 blk_-190744003190006044_31305064 blk_7265057386742001625_31305073
>>>
>>> 2010-05-18 21:21:31,987 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 10.110.8.63:50010 to replicate blk_5636039758999247483_31304886 to datanode(s) 10.110.8.86:50010 10.110.8.69:50010
>>> 2010-05-18 21:21:41,941 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.110.8.86:50010 is added to blk_5636039758999247483_31304886 size 441
>>> 2010-05-18 21:26:39,388 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_5636039758999247483 is added to invalidSet of 10.110.8.86:50010
>>> 2010-05-18 21:26:42,951 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 10.110.8.86:50010 to delete blk_-6242136662924452584_31259201 blk_5636039758999247483_31304886 blk_4850797749721229572_31305072 blk_252039065084461924_31305031 blk_-1317144678443645904_31305204 blk_6050185755706975664_31270230 blk_2671416971885801868_31304948 blk_-5582352089328547938_31305022 blk_-3115115738671914626_31270210
>>>
>>> 2010-05-18 21:21:34,413 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.110.8.59:50010 is added to blk_5636039758999247483_31304886 size 441
>>> 2010-05-18 21:26:39,388 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_5636039758999247483 is added to invalidSet of 10.110.8.59:50010
>>> 2010-05-18 21:26:39,950 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 10.110.8.59:50010 to delete blk_5636039758999247483_31304886 blk_-4528512156635399625_31305212 blk_1439789418382469336_31305158 blk_8860574934531794641_31270219 blk_-8358193301564392132_31305029
>>>
>>> 2010-05-18 21:21:57,835 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.110.8.85:50010 is added to blk_5636039758999247483_31304886 size 441
>>> 2010-05-18 21:26:39,388 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_5636039758999247483 is added to invalidSet of 10.110.8.85:50010
>>> 2010-05-18 21:26:39,950 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 10.110.8.85:50010 to delete blk_-6242136662924452584_31259201 blk_5636039758999247483_31304886 blk_4628640249731313760_31305046 blk_4747588241975451642_31305123 blk_-6876078628884993825_31270230 blk_-4468681536500281247_31270225 blk_7325830193509411302_31270230 blk_8453517711101429609_31303917 blk_-6094734447689285387_31305127 blk_3353439739797003235_31305037 blk_-5027506345849158498_31270213 blk_1484161645992497144_31270225 blk_4464987648045469454_31305144 blk_7460947863067370701_31270225 blk_-1170815606945644545_31270230 blk_6050185755706975664_31270230 blk_-8358193301564392132_31305029 blk_2671416971885801868_31304948 blk_5593547375459437465_31286511 blk_-2581238696314957800_31305033 blk_4732635559915402193_31270230 blk_-2124965527858346013_31270213 blk_-5837992573431863412_31286612 blk_-432558447034944954_31270208 blk_-3407615138527189735_31305069 blk_8860574934531794641_31270219 blk_233110856487529716_31270229 blk_312750273180273303_31270228 blk_7461125351290749755_31305052 blk_-8902661185532055148_31304947 blk_-8555258258738129670_31270210 blk_252039065084461924_31305031 blk_9037118763503479133_31305120 blk_-8494656323754369174_31305105 blk_9126133835045521966_31303972 blk_-5582352089328547938_31305022 blk_-2900250119736465962_31270214 blk_-3115115738671914626_31270210 blk_7612090442234634555_31270225 blk_5876492007747505188_31270213 blk_471700613578524871_31304950 blk_-190744003190006044_31305064
>>>
>>> datanode 10.110.8.63:
>>>
>>> 2010-05-18 21:21:46,058 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block blk_5636039758999247483_31304886 file /hadoop_data_dir/dfs/data/current/subdir23/blk_5636039758999247483
>>>
>>> hbase region server 10.110.8.85:
>>>
>>> DFSClient.java:
>>> DatanodeInfo chosenNode = bestNode(nodes, deadNodes);
>>> InetSocketAddress targetAddr = NetUtils.createSocketAddr(chosenNode.getName());
>>> return new DNAddrPair(chosenNode, targetAddr);
>>>
>>> This still picked 10.110.8.63, even though the namenode sent the command to
>>> delete the block at 21:21:43,995 and it was executed at 21:21:46,058?
>>>
>>> 2010-05-18 21:21:46,188 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /10.110.8.63:50010 for file /hbase/.META./1028785192/info/656097411976846533 for block 5636039758999247483:java.io.IOException: Got error in response to OP_READ_BLOCK for file /hbase/.META./1028785192/info/656097411976846533 for block 5636039758999247483
>>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>

--
Todd Lipcon
Software Engineer, Cloudera