Nilotpal Nandi created HDDS-543:
-----------------------------------

             Summary: datanode service start throwing NullPointerException
                 Key: HDDS-543
                 URL: https://issues.apache.org/jira/browse/HDDS-543
             Project: Hadoop Distributed Data Store
          Issue Type: Bug
            Reporter: Nilotpal Nandi


steps taken :

-------------------
 # Stopped one of the nodes in 6 node cluster  while write operation was 
running.
 # Tried to start datanode after sometime. Datanode failed to start

 

ozone version :

---------------------

 
{noformat}
Source code repository g...@github.com:apache/hadoop.git -r 
968082ffa5d9e50ed8538f653c375edd1b8feea5
Compiled by elek on 2018-09-19T20:57Z
Compiled with protoc 2.5.0
>From source with checksum efbdeabb5670d69d9efde85846e4ee98
Using HDDS 0.2.1-alpha
Source code repository g...@github.com:apache/hadoop.git -r 
968082ffa5d9e50ed8538f653c375edd1b8feea5
Compiled by elek on 2018-09-19T20:56Z
Compiled with protoc 2.5.0
>From source with checksum 8bf78cff4b73c95d486da5b21053ef
 
{noformat}
 

console log :

--------------------

 
{noformat}
2018-09-24 09:53:54,913 INFO impl.RaftServerImpl: 
9348f4cb-0bd3-41e7-b1c6-c22463a20c09: set configuration 0: 
[0ba4abe0-c8d1-4d6c-8d28-ebfa08fb84c9:172.27.18.18:9858, 
3d6c8b2c-f1d2-4425-a577-b9ef5dc7c634:172.27.80.23:9858, 
9348f4cb-0bd3-41e7-b1c6-c22463a20c09:172.27.23.161:9858], old=null at 0
Sep 24, 2018 9:53:55 AM 
com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
INFO: Initiating Jersey application, version 'Jersey: 1.19 02/11/2015 03:25 AM'
2018-09-24 09:53:55,968 [main] INFO - Listening HDDS REST traffic on 
/0.0.0.0:9880
2018-09-24 09:53:55,969 [main] INFO - Started plug-in 
org.apache.hadoop.ozone.web.OzoneHddsDatanodeService@2fa3be26
2018-09-24 09:53:55,975 [Datanode State Machine Thread - 0] INFO - Attempting 
to start container services.
2018-09-24 09:53:56,171 [Datanode State Machine Thread - 0] INFO - Starting 
XceiverServerRatis 9348f4cb-0bd3-41e7-b1c6-c22463a20c09 at port 9858
2018-09-24 09:53:56,193 [Datanode State Machine Thread - 0] ERROR - Unable to 
start the DatanodeState Machine
java.io.IOException: Premature EOF from inputStream
 at org.apache.ratis.util.IOUtils.readFully(IOUtils.java:100)
 at org.apache.ratis.server.storage.LogReader.decodeEntry(LogReader.java:250)
 at org.apache.ratis.server.storage.LogReader.readEntry(LogReader.java:155)
 at 
org.apache.ratis.server.storage.LogInputStream.nextEntry(LogInputStream.java:128)
 at 
org.apache.ratis.server.storage.LogSegment.readSegmentFile(LogSegment.java:110)
 at org.apache.ratis.server.storage.LogSegment.loadSegment(LogSegment.java:132)
 at 
org.apache.ratis.server.storage.RaftLogCache.loadSegment(RaftLogCache.java:110)
 at 
org.apache.ratis.server.storage.SegmentedRaftLog.loadLogSegments(SegmentedRaftLog.java:155)
 at 
org.apache.ratis.server.storage.SegmentedRaftLog.open(SegmentedRaftLog.java:123)
 at org.apache.ratis.server.impl.ServerState.initLog(ServerState.java:162)
 at org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:110)
 at org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:106)
 at 
org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$0(RaftServerProxy.java:191)
 at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
 at 
java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1582)
 at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
 at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
 at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
 at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
2018-09-24 09:53:56,195 [main] ERROR - Exception in HddsDatanodeService.
java.lang.NullPointerException
 at 
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.join(DatanodeStateMachine.java:332)
 at 
org.apache.hadoop.ozone.HddsDatanodeService.join(HddsDatanodeService.java:191)
 at 
org.apache.hadoop.ozone.HddsDatanodeService.main(HddsDatanodeService.java:250)
2018-09-24 09:53:56,198 INFO util.ExitUtil: Exiting with status 1: 
java.lang.NullPointerException
2018-09-24 09:53:56,201 WARN fs.CachingGetSpaceUsed: Thread Interrupted waiting 
to refresh disk information: sleep interrupted
2018-09-24 09:53:56,201 [shutdown-hook-0] INFO - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down HddsDatanodeService at 
ctr-e138-1518143905142-468367-01-000003.hwx.site/172.27.23.161
 
{noformat}
 

datanode.log

----------------------
{noformat}
2018-09-24 09:42:41,305 ERROR org.apache.ratis.server.impl.RaftServerImpl: 
9348f4cb-0bd3-41e7-b1c6-c22463a20c09: applyTransaction failed for index:2 
proto:(t:2, i:2)SMLOGENTRY, client-DBCE827E62E6, cid=1
2018-09-24 09:42:41,307 ERROR org.apache.ratis.server.impl.StateMachineUpdater: 
Terminating with exit status 2: 
StateMachineUpdater-9348f4cb-0bd3-41e7-b1c6-c22463a20c09: the 
StateMachineUpdater hits Throwable
java.lang.NullPointerException
 at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine$StateMachineHelper.handleChunkCommit(ContainerStateMachine.java:561)
 at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine$StateMachineHelper.executeContainerCommand(ContainerStateMachine.java:615)
 at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.applyTransaction(ContainerStateMachine.java:394)
 at 
org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1150)
 at 
org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:148)
 at java.lang.Thread.run(Thread.java:745)
2018-09-24 09:42:41,312 WARN org.apache.hadoop.fs.CachingGetSpaceUsed: Thread 
Interrupted waiting to refresh disk information: sleep interrupted
2018-09-24 09:53:38,901 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: 
Loaded properties from hadoop-metrics2.properties
2018-09-24 09:53:39,013 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: 
Scheduled Metric snapshot period at 10 second(s).
2018-09-24 09:53:39,013 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: 
HddsDatanode metrics system started
2018-09-24 09:53:39,056 INFO org.apache.hadoop.conf.Configuration.deprecation: 
No unit for ozone.scm.heartbeat.rpc-timeout(1000) assuming MILLISECONDS
2018-09-24 09:53:39,069 INFO org.apache.hadoop.conf.Configuration.deprecation: 
No unit for hdds.write.lock.reporting.threshold.ms(5000) assuming MILLISECONDS
2018-09-24 09:53:39,069 INFO org.apache.hadoop.conf.Configuration.deprecation: 
No unit for hdds.lock.suppress.warning.interval.ms(10000) assuming MILLISECONDS
2018-09-24 09:53:39,751 WARN org.apache.hadoop.hdds.scm.HddsServerUtil: Storage 
directory for Ratis is not configured.Mapping Ratis storage under 
/data/disk1/ozone/meta/ratis. It is a good idea to map this to an SSD disk.
2018-09-24 09:53:39,783 INFO org.apache.ratis.server.impl.RaftServerProxy: 
raft.rpc.type = GRPC (default)
2018-09-24 09:53:39,793 INFO org.apache.ratis.grpc.GrpcConfigKeys$Server: 
raft.grpc.server.port = 9858 (custom)
2018-09-24 09:53:39,793 INFO org.apache.ratis.grpc.RaftGRpcService: 
raft.grpc.message.size.max = 167772160 (custom)
2018-09-24 09:53:39,795 INFO org.apache.ratis.server.RaftServerConfigKeys: 
raft.server.log.appender.buffer.capacity = 134217728 (custom)
2018-09-24 09:53:39,795 INFO org.apache.ratis.grpc.RaftGRpcService: 
raft.grpc.flow.control.window = 1MB (=1048576) (default)
2018-09-24 09:53:39,796 INFO org.apache.ratis.server.RaftServerConfigKeys: 
raft.server.rpc.request.timeout = 3000 ms (custom)
2018-09-24 09:53:39,835 INFO org.apache.ratis.server.RaftServerConfigKeys: 
raft.server.storage.dir = /data/disk1/ozone/meta/ratis (custom)
2018-09-24 09:53:39,836 INFO org.apache.ratis.server.impl.RaftServerProxy: 
9348f4cb-0bd3-41e7-b1c6-c22463a20c09: found a subdirectory 
/data/disk1/ozone/meta/ratis/6ff6479b-11f4-456f-b46e-85804b25606d
2018-09-24 09:53:39,838 INFO org.apache.ratis.server.impl.RaftServerProxy: 
9348f4cb-0bd3-41e7-b1c6-c22463a20c09: addNew group-85804B25606D:[] returns 
group-85804B25606D:java.util.concurrent.CompletableFuture@74e28667[Not 
completed]
2018-09-24 09:53:39,850 INFO org.apache.ratis.server.RaftServerConfigKeys: 
raft.server.rpc.timeout.min = 1 s (custom)
2018-09-24 09:53:39,850 INFO org.apache.ratis.server.RaftServerConfigKeys: 
raft.server.rpc.timeout.max = 1200 ms (custom)
2018-09-24 09:53:39,850 INFO org.apache.ratis.server.RaftServerConfigKeys: 
raft.server.rpcslowness.timeout = 120 s (custom)
2018-09-24 09:53:39,868 INFO org.apache.ratis.server.impl.RaftServerImpl: 
9348f4cb-0bd3-41e7-b1c6-c22463a20c09: ConfigurationManager, init=-1: [], 
old=null, confs=<EMPTY_MAP>
2018-09-24 09:53:39,868 INFO org.apache.ratis.server.RaftServerConfigKeys: 
raft.server.storage.dir = /data/disk1/ozone/meta/ratis (custom)
2018-09-24 09:53:39,873 INFO 
org.apache.ratis.server.storage.RaftStorageDirectory: Lock on 
/data/disk1/ozone/meta/ratis/6ff6479b-11f4-456f-b46e-85804b25606d/in_use.lock 
acquired by nodename 3792...@ctr-e138-1518143905142-468367-01-000003.hwx.site
2018-09-24 09:53:39,878 INFO org.apache.ratis.server.RaftServerConfigKeys: 
raft.server.leader.election.timeout = 120 s (custom)
2018-09-24 09:53:39,881 INFO org.apache.ratis.server.RaftServerConfigKeys: 
raft.server.log.use.memory = false (default)
2018-09-24 09:53:39,882 INFO org.apache.ratis.server.RaftServerConfigKeys: 
raft.server.log.appender.buffer.capacity = 134217728 (custom)
2018-09-24 09:53:39,883 INFO org.apache.ratis.server.RaftServerConfigKeys: 
raft.server.log.segment.size.max = 1073741824 (custom)
2018-09-24 09:53:39,886 INFO org.apache.ratis.server.RaftServerConfigKeys: 
raft.server.log.segment.cache.num.max = 2 (custom)
2018-09-24 09:53:39,889 INFO org.apache.ratis.server.storage.RaftLogWorker: new 
9348f4cb-0bd3-41e7-b1c6-c22463a20c09-RaftLogWorker for Storage Directory 
/data/disk1/ozone/meta/ratis/6ff6479b-11f4-456f-b46e-85804b25606d
2018-09-24 09:53:39,889 INFO org.apache.ratis.server.RaftServerConfigKeys: 
raft.server.log.segment.size.max = 1073741824 (custom)
2018-09-24 09:53:39,889 INFO org.apache.ratis.server.RaftServerConfigKeys: 
raft.server.log.preallocated.size = 134217728 (custom)
2018-09-24 09:53:39,890 INFO org.apache.ratis.server.RaftServerConfigKeys: 
raft.server.log.write.buffer.size = 33554432 (custom)
2018-09-24 09:53:39,890 INFO org.apache.ratis.server.RaftServerConfigKeys: 
raft.server.log.force.sync.num = 128 (default)
2018-09-24 09:53:39,925 INFO org.apache.ratis.server.impl.RaftServerImpl: 
9348f4cb-0bd3-41e7-b1c6-c22463a20c09: set configuration 0: 
[0ba4abe0-c8d1-4d6c-8d28-ebfa08fb84c9:172.27.18.18:9858, 
3d6c8b2c-f1d2-4425-a577-b9ef5dc7c634:172.27.80.23:9858, 
9348f4cb-0bd3-41e7-b1c6-c22463a20c09:172.27.23.161:9858], old=null at 0
2018-09-24 09:53:41,058 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
status 1: java.lang.NullPointerException
2018-09-24 09:53:41,065 WARN org.apache.hadoop.fs.CachingGetSpaceUsed: Thread 
Interrupted waiting to refresh disk information: sleep interrupted{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org

Reply via email to