[ https://issues.apache.org/jira/browse/HDFS-1970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Suresh Srinivas resolved HDFS-1970.
-----------------------------------

    Resolution: Duplicate

Duplicate of HDFS-1951, as indicated in the previous comment.

> Null pointer exception comes when Namenode recovery happens and there is no response from client to NN more than the hardlimit for NN recovery and the current block is more than the prev block size in NN
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-1970
>                 URL: https://issues.apache.org/jira/browse/HDFS-1970
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.20-append
>            Reporter: ramkrishna.s.vasudevan
>             Fix For: 0.20-append
>
>
> A NullPointerException occurs when NameNode lease recovery runs after the client has given the NN no response for longer than the hard limit, and the block size reported to the NN is larger than the size the NN previously recorded.
> Steps to reproduce:
> 1. Using a client, write to 2 datanodes.
> 2. Kill one datanode and allow pipeline recovery.
> 3. Write some more data to the same block.
> 4. In parallel, let the NameNode lease recovery happen.
> A NullPointerException is then thrown in the addStoredBlock API.
> Please find the logs below.
> Debugging in MachineName..
> Listening for transport dt_socket at address: 8007
> 11/05/20 21:38:33 INFO namenode.NameNode: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting NameNode
> STARTUP_MSG:   host = linux76/10.18.52.76
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.20.3-SNAPSHOT
> STARTUP_MSG:   build = -r ; compiled by 'G00900856' on Tue Feb 1 11:40:14 IST 2011
> ************************************************************/
> 11/05/20 21:38:33 INFO metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=9000
> 11/05/20 21:38:33 INFO namenode.NameNode: Namenode up at: linux76/10.18.52.76:9000
> 11/05/20 21:38:33 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
> 11/05/20 21:38:33 INFO metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
> 11/05/20 21:38:33 INFO namenode.FSNamesystem: fsOwner=root,root
> 11/05/20 21:38:33 INFO namenode.FSNamesystem: supergroup=supergroup
> 11/05/20 21:38:33 INFO namenode.FSNamesystem: isPermissionEnabled=false
> 11/05/20 21:38:33 INFO metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
> 11/05/20 21:38:33 INFO namenode.FSNamesystem: Registered FSNamesystemStatusMBean
> 11/05/20 21:38:33 INFO common.Storage: Number of files = 1
> 11/05/20 21:38:33 INFO common.Storage: Number of files under construction = 0
> 11/05/20 21:38:33 INFO common.Storage: Image file of size 94 loaded in 0 seconds.
> 11/05/20 21:38:33 INFO common.Storage: Edits file /home/ramkrishna/opensrchadoop/appendbranch/hadoop-0.20.3-SNAPSHOT/bin/../hadoop-root/dfs/name/current/edits of size 4 edits # 0 loaded in 0 seconds.
> 11/05/20 21:38:33 INFO common.Storage: Image file of size 94 saved in 0 seconds.
> 11/05/20 21:38:34 INFO common.Storage: Image file of size 94 saved in 0 seconds.
> 11/05/20 21:38:34 INFO namenode.FSNamesystem: Finished loading FSImage in 482 msecs
> 11/05/20 21:38:34 INFO namenode.FSNamesystem: Total number of blocks = 0
> 11/05/20 21:38:34 INFO namenode.FSNamesystem: Number of invalid blocks = 0
> 11/05/20 21:38:34 INFO namenode.FSNamesystem: Number of under-replicated blocks = 0
> 11/05/20 21:38:34 INFO namenode.FSNamesystem: Number of over-replicated blocks = 0
> 11/05/20 21:38:34 INFO hdfs.StateChange: STATE* Leaving safe mode after 0 secs.
> 11/05/20 21:38:34 INFO hdfs.StateChange: STATE* Network topology has 0 racks and 0 datanodes
> 11/05/20 21:38:34 INFO hdfs.StateChange: STATE* UnderReplicatedBlocks has 0 blocks
> 11/05/20 21:38:34 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
> 11/05/20 21:38:35 INFO http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50070
> 11/05/20 21:38:35 INFO http.HttpServer: listener.getLocalPort() returned 50070 webServer.getConnectors()[0].getLocalPort() returned 50070
> 11/05/20 21:38:35 INFO http.HttpServer: Jetty bound to port 50070
> 11/05/20 21:38:35 INFO mortbay.log: jetty-6.1.14
> 11/05/20 21:38:37 INFO mortbay.log: Started SelectChannelConnector@linux76:50070
> 11/05/20 21:38:37 INFO namenode.NameNode: Web-server up at: linux76:50070
> 11/05/20 21:38:37 INFO ipc.Server: IPC Server Responder: starting
> 11/05/20 21:38:37 INFO ipc.Server: IPC Server listener on 9000: starting
> 11/05/20 21:38:37 INFO ipc.Server: IPC Server handler 0 on 9000: starting
> 11/05/20 21:38:37 INFO ipc.Server: IPC Server handler 1 on 9000: starting
> 11/05/20 21:38:37 INFO ipc.Server: IPC Server handler 2 on 9000: starting
> 11/05/20 21:38:37 INFO ipc.Server: IPC Server handler 3 on 9000: starting
> 11/05/20 21:38:37 INFO ipc.Server: IPC Server handler 4 on 9000: starting
> 11/05/20 21:38:37 INFO ipc.Server: IPC Server handler 5 on 9000: starting
> 11/05/20 21:38:37 INFO ipc.Server: IPC Server handler 6 on 9000: starting
> 11/05/20 21:38:37 INFO ipc.Server: IPC Server handler 7 on 9000: starting
> 11/05/20 21:38:37 INFO ipc.Server: IPC Server handler 8 on 9000: starting
> 11/05/20 21:38:37 INFO ipc.Server: IPC Server handler 9 on 9000: starting
> 11/05/20 21:38:38 INFO hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from 10.18.52.76:50010 storage DS-1868335495-10.18.52.76-50010-1305907718956
> 11/05/20 21:38:38 INFO net.NetworkTopology: Adding a new node: /default-rack/10.18.52.76:50010
> 11/05/20 21:38:41 INFO hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from 10.18.52.5:50010 storage DS-721814011-10.18.52.5-50010-1305928120116
> 11/05/20 21:38:41 INFO net.NetworkTopology: Adding a new node: /default-rack/10.18.52.5:50010
> 11/05/20 21:38:54 INFO FSNamesystem.audit: ugi=R00902313,Tardis ip=/10.18.47.133 cmd=create src=/synctest0 dst=null perm=R00902313:supergroup:rw-r--r--
> 11/05/20 21:38:54 INFO hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /synctest0. blk_-8455334393673385140_1001
> 11/05/20 21:38:55 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.18.52.5:50010 is added to blk_-8455334393673385140_1001 size 1024
> 11/05/20 21:38:55 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.18.52.76:50010 is added to blk_-8455334393673385140_1001 size 1024
> 11/05/20 21:38:55 INFO hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /synctest0. blk_8955752156241965528_1001
> 11/05/20 21:38:55 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.18.52.76:50010 is added to blk_8955752156241965528_1001 size 1024
> 11/05/20 21:38:55 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.18.52.5:50010 is added to blk_8955752156241965528_1001 size 1024
> 11/05/20 21:39:11 INFO hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /synctest0. blk_-5249191180607448701_1001
> 11/05/20 21:40:18 INFO namenode.FSNamesystem: commitBlockSynchronization(lastblock=blk_-5249191180607448701_1001, newgenerationstamp=1002, newlength=833, newtargets=[10.18.52.76:50010], closeFile=false, deleteBlock=false)
> 11/05/20 21:40:18 INFO namenode.FSNamesystem: Number of transactions: 4 Total time for transactions(ms): 2Number of transactions batched in Syncs: 0 Number of syncs: 1 SyncTimes(ms): 28
> 11/05/20 21:40:18 INFO namenode.FSNamesystem: commitBlockSynchronization(blk_-5249191180607448701_1002) successful
> 11/05/20 21:41:41 INFO namenode.LeaseManager: Lease [Lease. Holder: DFSClient_-676883980, pendingcreates: 1] has expired hard limit
> 11/05/20 21:41:56 INFO namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_-676883980, pendingcreates: 1], src=/synctest0
> 11/05/20 21:41:56 INFO hdfs.StateChange: BLOCK* blk_-5249191180607448701_1002 recovery started, primary=10.18.52.76:50010
> 11/05/20 21:42:09 INFO namenode.FSNamesystem: commitBlockSynchronization(lastblock=blk_-5249191180607448701_1002, newgenerationstamp=1003, newlength=867, newtargets=[10.18.52.76:50010], closeFile=true, deleteBlock=false)
> 11/05/20 21:42:26 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: addStoredBlock request received for blk_-5249191180607448701_1003 on 10.18.52.76:50010 size 867 and it belongs to a file under construction.
> 11/05/20 21:42:52 WARN namenode.FSNamesystem: Inconsistent size for block blk_-5249191180607448701_1003 reported from 10.18.52.76:50010 current size is 833 reported size is 867
> 11/05/20 21:43:11 WARN namenode.FSNamesystem: Block blk_-5249191180607448701_1003 reported from 10.18.52.76:50010 does not exist in blockMap. Surprise! Surprise!
> 11/05/20 21:43:13 INFO hdfs.StateChange: Removing lease on file /synctest0 from client NN_Recovery
> 11/05/20 21:43:13 INFO hdfs.StateChange: BLOCK* ask 10.18.52.76:50010 to replicate blk_-5249191180607448701_1003 to datanode(s) 10.18.52.5:50010
> 11/05/20 21:43:13 INFO namenode.FSNamesystem: Number of transactions: 6 Total time for transactions(ms): 2Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 30
> 11/05/20 21:43:13 INFO namenode.FSNamesystem: commitBlockSynchronization(newblock=blk_-5249191180607448701_1003, file=/synctest0, newgenerationstamp=1003, newlength=867, newtargets=[10.18.52.76:50010]) successful
> 11/05/20 21:43:15 INFO ipc.Server: IPC Server handler 6 on 9000, call blockReceived(DatanodeRegistration(10.18.52.76:50010, storageID=DS-1868335495-10.18.52.76-50010-1305907718956, infoPort=50075, ipcPort=50020), [Lorg.apache.hadoop.hdfs.protocol.Block;@9b774e, [Ljava.lang.String;@b5d05b) from 10.18.52.76:40343: error: java.io.IOException: java.lang.NullPointerException
> java.io.IOException: java.lang.NullPointerException
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addStoredBlock(FSNamesystem.java:3173)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.blockReceived(FSNamesystem.java:3592)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.blockReceived(NameNode.java:756)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:961)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:957)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:955)
> 11/05/20 21:43:19 WARN hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: Redundant addStoredBlock request received for blk_-5249191180607448701_1003 on 10.18.52.76:50010 size 867
> 11/05/20 21:43:19 INFO hdfs.StateChange: BLOCK* ask 10.18.52.76:50010 to replicate blk_-5249191180607448701_1003 to datanode(s) 10.18.52.5:50010
> Listening for transport dt_socket at address: 8007
> 11/05/20 21:43:22 INFO hdfs.StateChange: BLOCK* ask 10.18.52.76:50010 to replicate blk_-5249191180607448701_1003 to datanode(s) 10.18.52.5:50010

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
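
The log above shows the primary datanode reporting blk_-5249191180607448701 with the post-recovery generation stamp 1003 just after the NameNode warned that this block "does not exist in blockMap", and the subsequent blockReceived call then fails with the NullPointerException inside FSNamesystem.addStoredBlock. What follows is a minimal, self-contained Java sketch of that failure mode, under the assumption that the NPE comes from dereferencing the null result of a block-map lookup for a block whose generation stamp changed during lease recovery. Every name in the sketch (BlockMapSketch, BlockKey, reportReceivedBlockUnsafe, reportReceivedBlockSafe) is invented for illustration; this is not the Hadoop 0.20-append implementation, and the actual resolution is tracked under HDFS-1951.

    import java.util.HashMap;
    import java.util.Map;

    /**
     * Hypothetical, simplified illustration of the failure mode in the log above:
     * a datanode reports a block that is no longer present under its key in the
     * namenode's block map (its generation stamp was bumped during lease
     * recovery), the lookup returns null, and a later dereference throws
     * NullPointerException. None of these names come from Hadoop.
     */
    public class BlockMapSketch {

        /** Minimal stand-in for a block identified by id plus generation stamp. */
        static final class BlockKey {
            final long id;
            final long genStamp;
            BlockKey(long id, long genStamp) { this.id = id; this.genStamp = genStamp; }
            @Override public boolean equals(Object o) {
                if (!(o instanceof BlockKey)) return false;
                BlockKey b = (BlockKey) o;
                return id == b.id && genStamp == b.genStamp;
            }
            @Override public int hashCode() { return Long.hashCode(id) ^ Long.hashCode(genStamp); }
        }

        /** Maps each known block to the size the namenode currently records for it. */
        private final Map<BlockKey, Long> blockMap = new HashMap<>();

        void addBlock(long id, long genStamp, long size) {
            blockMap.put(new BlockKey(id, genStamp), size);
        }

        /** Buggy handling: assumes every reported block is still in the map. */
        long reportReceivedBlockUnsafe(long id, long genStamp, long reportedSize) {
            Long storedSize = blockMap.get(new BlockKey(id, genStamp));
            // If recovery changed the generation stamp, storedSize is null here and
            // unboxing it below throws NullPointerException, mirroring the
            // addStoredBlock stack trace in the log.
            return Math.max(storedSize, reportedSize);
        }

        /** Defensive handling: treat an unknown block explicitly instead of unboxing null. */
        long reportReceivedBlockSafe(long id, long genStamp, long reportedSize) {
            Long storedSize = blockMap.get(new BlockKey(id, genStamp));
            if (storedSize == null) {
                return -1;  // block unknown (e.g. stale generation stamp); caller can ignore or re-request it
            }
            return Math.max(storedSize, reportedSize);
        }

        public static void main(String[] args) {
            BlockMapSketch ns = new BlockMapSketch();
            // Block as tracked before recovery: generation stamp 1002, length 833.
            ns.addBlock(-5249191180607448701L, 1002, 833);

            // The datanode reports the same block id with the post-recovery stamp 1003 and size 867.
            System.out.println(ns.reportReceivedBlockSafe(-5249191180607448701L, 1003, 867));   // prints -1

            try {
                ns.reportReceivedBlockUnsafe(-5249191180607448701L, 1003, 867);
            } catch (NullPointerException e) {
                System.out.println("NPE, as in the blockReceived call above: " + e);
            }
        }
    }

The defensive variant simply reports the block as unknown instead of assuming the lookup succeeded; whatever form the fix under HDFS-1951 takes, the sketch is only meant to show why a stale block report can surface as a NullPointerException on this code path.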