Chernishev Aleksandr created HDFS-10992:
-------------------------------------------
Summary: file is under construction but no leases found
Key: HDFS-10992
URL: https://issues.apache.org/jira/browse/HDFS-10992
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.7.1
Environment: Hortonworks 2.3 build 2557; 10 DataNodes, 2 NameNodes with automatic failover
Reporter: Chernishev Aleksandr

After writing a small number of files (at least 1000) to HDFS, each 150 MB to 1.6 GB in size, we found 13 damaged files with an incomplete last block.

hadoop fsck /staging/landing/stream/itc_dwh/811-ITF-ZO-P-bad/load_tarifer-zf-4_20160902165521521.csv -openforwrite -files -blocks -locations
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8
Connecting to namenode via http://hadoop-m1:50070/fsck?ugi=hdfs&openforwrite=1&files=1&blocks=1&locations=1&path=%2Fstaging%2Flanding%2Fstream%2Fitc_dwh%2F811-ITF-ZO-P-bad%2Fload_tarifer-zf-4_20160902165521521.csv
FSCK started by hdfs (auth:SIMPLE) from /10.42.12.178 for path /staging/landing/stream/itc_dwh/811-ITF-ZO-P-bad/load_tarifer-zf-4_20160902165521521.csv at Mon Oct 10 17:12:25 MSK 2016
/staging/landing/stream/itc_dwh/811-ITF-ZO-P-bad/load_tarifer-zf-4_20160902165521521.csv 920596121 bytes, 7 block(s), OPENFORWRITE: MISSING 1 blocks of total size 115289753 B
0. BP-1552885336-10.42.12.178-1446159880991:blk_1084952841_17798971 len=134217728 repl=4 [DatanodeInfoWithStorage[10.42.12.188:50010,DS-9ba44a76-113a-43ac-87dc-46aa97ba3267,DISK], DatanodeInfoWithStorage[10.42.12.183:50010,DS-eccd375a-ea32-491b-a4a3-5ea3faca4171,DISK], DatanodeInfoWithStorage[10.42.12.184:50010,DS-ec462491-6766-490a-a92f-38e9bb3be5ce,DISK], DatanodeInfoWithStorage[10.42.12.182:50010,DS-cef46399-bb70-4f1a-ac55-d71c7e820c29,DISK]]
1. BP-1552885336-10.42.12.178-1446159880991:blk_1084952850_17799207 len=134217728 repl=3 [DatanodeInfoWithStorage[10.42.12.184:50010,DS-412769e0-0ec2-48d3-b644-b08a516b1c2c,DISK], DatanodeInfoWithStorage[10.42.12.181:50010,DS-97388b2f-c542-417d-ab06-c8d81b94fa9d,DISK], DatanodeInfoWithStorage[10.42.12.187:50010,DS-e7a11951-4315-4425-a88b-a9f6429cc058,DISK]]
2. BP-1552885336-10.42.12.178-1446159880991:blk_1084952857_17799489 len=134217728 repl=3 [DatanodeInfoWithStorage[10.42.12.184:50010,DS-7a08c597-b0f4-46eb-9916-f028efac66d7,DISK], DatanodeInfoWithStorage[10.42.12.180:50010,DS-fa6a4630-1626-43d8-9988-955a86ac3736,DISK], DatanodeInfoWithStorage[10.42.12.182:50010,DS-8670e77d-c4db-4323-bb01-e0e64bd5b78e,DISK]]
3. BP-1552885336-10.42.12.178-1446159880991:blk_1084952866_17799725 len=134217728 repl=3 [DatanodeInfoWithStorage[10.42.12.185:50010,DS-b5ff8ba0-275e-4846-b5a4-deda35aa0ad8,DISK], DatanodeInfoWithStorage[10.42.12.180:50010,DS-9cb6cade-9395-4f3a-ab7b-7fabd400b7f2,DISK], DatanodeInfoWithStorage[10.42.12.183:50010,DS-e277dcf3-1bce-4efd-a668-cd6fb2e10588,DISK]]
4. BP-1552885336-10.42.12.178-1446159880991:blk_1084952872_17799891 len=134217728 repl=4 [DatanodeInfoWithStorage[10.42.12.184:50010,DS-e1d8f278-1a22-4294-ac7e-e12d554aef7f,DISK], DatanodeInfoWithStorage[10.42.12.186:50010,DS-5d9aeb2b-e677-41cd-844e-4b36b3c84092,DISK], DatanodeInfoWithStorage[10.42.12.183:50010,DS-eccd375a-ea32-491b-a4a3-5ea3faca4171,DISK], DatanodeInfoWithStorage[10.42.12.182:50010,DS-8670e77d-c4db-4323-bb01-e0e64bd5b78e,DISK]]
5. BP-1552885336-10.42.12.178-1446159880991:blk_1084952880_17800120 len=134217728 repl=3 [DatanodeInfoWithStorage[10.42.12.181:50010,DS-79185b75-1938-4c91-a6d0-bb6687ca7e56,DISK], DatanodeInfoWithStorage[10.42.12.184:50010,DS-dcbd20aa-0334-49e0-b807-d2489f5923c6,DISK], DatanodeInfoWithStorage[10.42.12.183:50010,DS-f1d77328-f3af-483e-82e9-66ab0723a52c,DISK]]
6. BP-1552885336-10.42.12.178-1446159880991:blk_1084952887_17800316{UCState=COMMITTED, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-5f3eac72-eb55-4df7-bcaa-a6fa35c166a0:NORMAL:10.42.12.188:50010|RBW], ReplicaUC[[DISK]DS-a2a0d8f0-772e-419f-b4ff-10b4966c57ca:NORMAL:10.42.12.184:50010|RBW], ReplicaUC[[DISK]DS-52984aa0-598e-4fff-acfa-8904ca7b585c:NORMAL:10.42.12.185:50010|RBW]]} len=115289753 MISSING!

Status: CORRUPT
Total size: 920596121 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 7 (avg. block size 131513731 B)
********************************
UNDER MIN REPL'D BLOCKS: 1 (14.285714 %)
dfs.namenode.replication.min: 1
CORRUPT FILES: 1
MISSING BLOCKS: 1
MISSING SIZE: 115289753 B
********************************
Minimally replicated blocks: 6 (85.71429 %)
Over-replicated blocks: 2 (28.571428 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.857143
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 10
Number of racks: 1
FSCK ended at Mon Oct 10 17:12:25 MSK 2016 in 0 milliseconds

The filesystem under path '/staging/landing/stream/itc_dwh/811-ITF-ZO-P-bad/load_tarifer-zf-4_20160902165521521.csv' is CORRUPT

The file is UNDER_RECOVERY: the NameNode considers the last block to be in the COMMITTED state, while the DataNodes consider it to be in the RBW state. Block recovery is never executed.
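The block listing can be cross-checked against the fsck summary, and the same parsing could help spot other files whose last block is COMMITTED but MISSING. The sketch below is a hypothetical helper, not an HDFS tool: FsckBlockCheck and its methods are illustrative names, and the regular expressions assume the exact line format shown in this report (a {UCState=...} suffix on under-construction blocks and no repl= field on a MISSING block).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of HDFS): parses fsck block lines as formatted in this report.
final class FsckBlockCheck {
    /** One parsed block line; ucState is null for completed blocks. */
    static final class BlockLine {
        final long blockId;
        final String ucState;
        final long len;
        final int repl;
        final boolean missing;

        BlockLine(long blockId, String ucState, long len, int repl, boolean missing) {
            this.blockId = blockId;
            this.ucState = ucState;
            this.len = len;
            this.repl = repl;
            this.missing = missing;
        }
    }

    private static final Pattern BLK = Pattern.compile("blk_(\\d+)_\\d+");
    private static final Pattern UC = Pattern.compile("UCState=(\\w+)");
    private static final Pattern LEN = Pattern.compile("\\blen=(\\d+)");
    private static final Pattern REPL = Pattern.compile("\\brepl=(\\d+)");

    /** Returns null for lines that are not block lines. */
    static BlockLine parse(String line) {
        Matcher blk = BLK.matcher(line);
        Matcher len = LEN.matcher(line);
        if (!blk.find() || !len.find()) {
            return null;
        }
        Matcher uc = UC.matcher(line);
        Matcher repl = REPL.matcher(line);
        // A MISSING block line carries no repl= field; count it as 0 live replicas.
        int replicas = repl.find() ? Integer.parseInt(repl.group(1)) : 0;
        String ucState = uc.find() ? uc.group(1) : null;
        return new BlockLine(Long.parseLong(blk.group(1)), ucState,
                Long.parseLong(len.group(1)), replicas, line.contains("MISSING!"));
    }

    static List<BlockLine> parseAll(String[] lines) {
        List<BlockLine> blocks = new ArrayList<>();
        for (String line : lines) {
            BlockLine b = parse(line);
            if (b != null) blocks.add(b);
        }
        return blocks;
    }

    static long totalSize(List<BlockLine> blocks) {
        long total = 0;
        for (BlockLine b : blocks) total += b.len;
        return total;
    }

    static double averageReplication(List<BlockLine> blocks) {
        long replicas = 0;
        for (BlockLine b : blocks) replicas += b.repl;
        return (double) replicas / blocks.size();
    }

    /** IDs of blocks the NameNode lists as COMMITTED while fsck marks them MISSING. */
    static List<Long> committedButMissing(List<BlockLine> blocks) {
        List<Long> ids = new ArrayList<>();
        for (BlockLine b : blocks) {
            if ("COMMITTED".equals(b.ucState) && b.missing) ids.add(b.blockId);
        }
        return ids;
    }

    // Block lines from this report, with BP- prefixes and storage/replica lists trimmed.
    static final String[] SAMPLE = {
        "0. blk_1084952841_17798971 len=134217728 repl=4",
        "1. blk_1084952850_17799207 len=134217728 repl=3",
        "2. blk_1084952857_17799489 len=134217728 repl=3",
        "3. blk_1084952866_17799725 len=134217728 repl=3",
        "4. blk_1084952872_17799891 len=134217728 repl=4",
        "5. blk_1084952880_17800120 len=134217728 repl=3",
        "6. blk_1084952887_17800316{UCState=COMMITTED, truncateBlock=null, primaryNodeIndex=-1} len=115289753 MISSING!",
    };

    public static void main(String[] args) {
        List<BlockLine> blocks = parseAll(SAMPLE);
        long total = totalSize(blocks);
        System.out.println("blocks: " + blocks.size() + ", total size: " + total
                + " B, avg. block size: " + total / blocks.size() + " B");
        System.out.printf(Locale.ROOT, "avg. block replication: %.6f%n", averageReplication(blocks));
        System.out.println("COMMITTED but MISSING: " + committedButMissing(blocks));
    }
}
```

On the seven block lines this gives a total of 920596121 B (6 × 134217728 + 115289753) and an average block size of 131513731 B, matching the fsck summary. The reported average replication of 2.857143 matches 20 replicas over 7 blocks, which is consistent with fsck counting the MISSING block as zero live replicas. The helper flags blk_1084952887 as COMMITTED but MISSING.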
The last block file and its meta file exist in the 'rbw' directory:
-rw-r--r-- 1 hdfs hdfs 115289753 Sep 2 16:56 /hadoop12/data/current/BP-1552885336-10.42.12.178-1446159880991/current/rbw/blk_1084952887
-rw-r--r-- 1 hdfs hdfs 900711 Sep 2 16:56 /hadoop12/data/current/BP-1552885336-10.42.12.178-1446159880991/current/rbw/blk_1084952887_17800316.meta

The lease recovery tool reports:
hdfs debug recoverLease -path /staging/landing/stream/itc_dwh/811-ITF-ZO-P-bad/load_tarifer-zf-4_20160902165521521.csv
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8
recoverLease got exception: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to RECOVER_LEASE /staging/landing/stream/itc_dwh/811-ITF-ZO-P-bad/load_tarifer-zf-4_20160902165521521.csv for DFSClient_NONMAPREDUCE_-1462314354_1 on 10.42.12.178 because the file is under construction but no leases found.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2892)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLease(FSNamesystem.java:2835)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.recoverLease(NameNodeRpcServer.java:668)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.recoverLease(ClientNamenodeProtocolServerSideTranslatorPB.java:663)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2081)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2077)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2075)
    at org.apache.hadoop.ipc.Client.call(Client.java:1427)
    at org.apache.hadoop.ipc.Client.call(Client.java:1358)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy9.recoverLease(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.recoverLease(ClientNamenodeProtocolTranslatorPB.java:603)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy10.recoverLease(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.recoverLease(DFSClient.java:1259)
    at org.apache.hadoop.hdfs.DistributedFileSystem$2.doCall(DistributedFileSystem.java:279)
    at org.apache.hadoop.hdfs.DistributedFileSystem$2.doCall(DistributedFileSystem.java:275)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.recoverLease(DistributedFileSystem.java:275)
    at org.apache.hadoop.hdfs.tools.DebugAdmin$RecoverLeaseCommand.run(DebugAdmin.java:256)
    at org.apache.hadoop.hdfs.tools.DebugAdmin.run(DebugAdmin.java:336)
    at org.apache.hadoop.hdfs.tools.DebugAdmin.main(DebugAdmin.java:359)
Giving up on recoverLease for /staging/landing/stream/itc_dwh/811-ITF-ZO-P-bad/load_tarifer-zf-4_20160902165521521.csv after 1 try.
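The rbw listing allows a quick consistency check on the on-disk replica: a block's .meta file should hold a small header plus one checksum per chunk of data. The sketch below assumes the defaults of a 7-byte header (2-byte version, 1-byte checksum type, 4-byte bytes-per-checksum), dfs.bytes-per-checksum = 512 and 4-byte CRC checksums; MetaSizeCheck and expectedMetaLength are hypothetical names, not HDFS APIs.

```java
// Hypothetical helper (not part of HDFS): expected .meta size for a replica,
// assuming the default checksum layout described in the lead-in.
final class MetaSizeCheck {
    // Assumed header: 2-byte version + 1-byte checksum type + 4-byte bytesPerChecksum.
    static final int HEADER_BYTES = 7;
    // Assumed defaults: dfs.bytes-per-checksum = 512, 4-byte CRC32/CRC32C checksum.
    static final int BYTES_PER_CHECKSUM = 512;
    static final int CHECKSUM_SIZE = 4;

    /** Expected .meta length for a replica holding dataLen bytes. */
    static long expectedMetaLength(long dataLen) {
        // Ceiling division: a trailing partial chunk still gets its own checksum.
        long chunks = (dataLen + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        return HEADER_BYTES + chunks * CHECKSUM_SIZE;
    }

    public static void main(String[] args) {
        long dataLen = 115289753L; // rbw/blk_1084952887
        long metaLen = 900711L;    // rbw/blk_1084952887_17800316.meta
        long expected = expectedMetaLength(dataLen);
        System.out.println("expected meta length: " + expected + ", on disk: " + metaLen
                + (expected == metaLen ? " (consistent)" : " (MISMATCH)"));
    }
}
```

For the listed replica, 115289753 bytes is 225176 chunks of 512 bytes (the last one partial), so the expected size is 7 + 225176 × 4 = 900711 bytes, exactly the .meta size on disk. That suggests checksums were written for the full 115289753 bytes the NameNode records for the COMMITTED block, although it does not by itself prove the data is intact.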
-- This message was sent by Atlassian JIRA (v6.3.4#6332)