[ https://issues.apache.org/jira/browse/HDDS-543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arpit Agarwal resolved HDDS-543. -------------------------------- Resolution: Done This should be fixed as part of the restart fixes per [~msingh]. > datanode service start throwing NullPointerException > ---------------------------------------------------- > > Key: HDDS-543 > URL: https://issues.apache.org/jira/browse/HDDS-543 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Reporter: Nilotpal Nandi > Assignee: Mukul Kumar Singh > Priority: Critical > Attachments: all-node-ozone-logs-1537784799.tar.gz > > > steps taken : > ------------------- > # Stopped one of the nodes in 6 node cluster while write operation was > running. > # Tried to start datanode after sometime. Datanode failed to start > > ozone version : > --------------------- > > {noformat} > Source code repository g...@github.com:apache/hadoop.git -r > 968082ffa5d9e50ed8538f653c375edd1b8feea5 > Compiled by elek on 2018-09-19T20:57Z > Compiled with protoc 2.5.0 > From source with checksum efbdeabb5670d69d9efde85846e4ee98 > Using HDDS 0.2.1-alpha > Source code repository g...@github.com:apache/hadoop.git -r > 968082ffa5d9e50ed8538f653c375edd1b8feea5 > Compiled by elek on 2018-09-19T20:56Z > Compiled with protoc 2.5.0 > From source with checksum 8bf78cff4b73c95d486da5b21053ef > > {noformat} > > console log : > -------------------- > > {noformat} > 2018-09-24 09:53:54,913 INFO impl.RaftServerImpl: > 9348f4cb-0bd3-41e7-b1c6-c22463a20c09: set configuration 0: > [0ba4abe0-c8d1-4d6c-8d28-ebfa08fb84c9:172.27.18.18:9858, > 3d6c8b2c-f1d2-4425-a577-b9ef5dc7c634:172.27.80.23:9858, > 9348f4cb-0bd3-41e7-b1c6-c22463a20c09:172.27.23.161:9858], old=null at 0 > Sep 24, 2018 9:53:55 AM > com.sun.jersey.server.impl.application.WebApplicationImpl _initiate > INFO: Initiating Jersey application, version 'Jersey: 1.19 02/11/2015 03:25 > AM' > 2018-09-24 09:53:55,968 [main] INFO - Listening HDDS REST traffic on > /0.0.0.0:9880 > 2018-09-24 09:53:55,969 [main] INFO - Started plug-in > org.apache.hadoop.ozone.web.OzoneHddsDatanodeService@2fa3be26 > 2018-09-24 09:53:55,975 [Datanode State Machine Thread - 0] INFO - Attempting > to start container services. > 2018-09-24 09:53:56,171 [Datanode State Machine Thread - 0] INFO - Starting > XceiverServerRatis 9348f4cb-0bd3-41e7-b1c6-c22463a20c09 at port 9858 > 2018-09-24 09:53:56,193 [Datanode State Machine Thread - 0] ERROR - Unable to > start the DatanodeState Machine > java.io.IOException: Premature EOF from inputStream > at org.apache.ratis.util.IOUtils.readFully(IOUtils.java:100) > at org.apache.ratis.server.storage.LogReader.decodeEntry(LogReader.java:250) > at org.apache.ratis.server.storage.LogReader.readEntry(LogReader.java:155) > at > org.apache.ratis.server.storage.LogInputStream.nextEntry(LogInputStream.java:128) > at > org.apache.ratis.server.storage.LogSegment.readSegmentFile(LogSegment.java:110) > at > org.apache.ratis.server.storage.LogSegment.loadSegment(LogSegment.java:132) > at > org.apache.ratis.server.storage.RaftLogCache.loadSegment(RaftLogCache.java:110) > at > org.apache.ratis.server.storage.SegmentedRaftLog.loadLogSegments(SegmentedRaftLog.java:155) > at > org.apache.ratis.server.storage.SegmentedRaftLog.open(SegmentedRaftLog.java:123) > at org.apache.ratis.server.impl.ServerState.initLog(ServerState.java:162) > at org.apache.ratis.server.impl.ServerState.<init>(ServerState.java:110) > at > org.apache.ratis.server.impl.RaftServerImpl.<init>(RaftServerImpl.java:106) > at > org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl$0(RaftServerProxy.java:191) > at > java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) > at > java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1582) > at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) > at > java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) > at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) > at > java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157) > 2018-09-24 09:53:56,195 [main] ERROR - Exception in HddsDatanodeService. > java.lang.NullPointerException > at > org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.join(DatanodeStateMachine.java:332) > at > org.apache.hadoop.ozone.HddsDatanodeService.join(HddsDatanodeService.java:191) > at > org.apache.hadoop.ozone.HddsDatanodeService.main(HddsDatanodeService.java:250) > 2018-09-24 09:53:56,198 INFO util.ExitUtil: Exiting with status 1: > java.lang.NullPointerException > 2018-09-24 09:53:56,201 WARN fs.CachingGetSpaceUsed: Thread Interrupted > waiting to refresh disk information: sleep interrupted > 2018-09-24 09:53:56,201 [shutdown-hook-0] INFO - SHUTDOWN_MSG: > /************************************************************ > SHUTDOWN_MSG: Shutting down HddsDatanodeService at > ctr-e138-1518143905142-468367-01-000003.hwx.site/172.27.23.161 > > {noformat} > > datanode.log > ---------------------- > {noformat} > 2018-09-24 09:42:41,305 ERROR org.apache.ratis.server.impl.RaftServerImpl: > 9348f4cb-0bd3-41e7-b1c6-c22463a20c09: applyTransaction failed for index:2 > proto:(t:2, i:2)SMLOGENTRY, client-DBCE827E62E6, cid=1 > 2018-09-24 09:42:41,307 ERROR > org.apache.ratis.server.impl.StateMachineUpdater: Terminating with exit > status 2: StateMachineUpdater-9348f4cb-0bd3-41e7-b1c6-c22463a20c09: the > StateMachineUpdater hits Throwable > java.lang.NullPointerException > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine$StateMachineHelper.handleChunkCommit(ContainerStateMachine.java:561) > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine$StateMachineHelper.executeContainerCommand(ContainerStateMachine.java:615) > at > org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.applyTransaction(ContainerStateMachine.java:394) > at > org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1150) > at > org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:148) > at java.lang.Thread.run(Thread.java:745) > 2018-09-24 09:42:41,312 WARN org.apache.hadoop.fs.CachingGetSpaceUsed: Thread > Interrupted waiting to refresh disk information: sleep interrupted > 2018-09-24 09:53:38,901 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: > Loaded properties from hadoop-metrics2.properties > 2018-09-24 09:53:39,013 INFO > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled Metric snapshot > period at 10 second(s). > 2018-09-24 09:53:39,013 INFO > org.apache.hadoop.metrics2.impl.MetricsSystemImpl: HddsDatanode metrics > system started > 2018-09-24 09:53:39,056 INFO > org.apache.hadoop.conf.Configuration.deprecation: No unit for > ozone.scm.heartbeat.rpc-timeout(1000) assuming MILLISECONDS > 2018-09-24 09:53:39,069 INFO > org.apache.hadoop.conf.Configuration.deprecation: No unit for > hdds.write.lock.reporting.threshold.ms(5000) assuming MILLISECONDS > 2018-09-24 09:53:39,069 INFO > org.apache.hadoop.conf.Configuration.deprecation: No unit for > hdds.lock.suppress.warning.interval.ms(10000) assuming MILLISECONDS > 2018-09-24 09:53:39,751 WARN org.apache.hadoop.hdds.scm.HddsServerUtil: > Storage directory for Ratis is not configured.Mapping Ratis storage under > /data/disk1/ozone/meta/ratis. It is a good idea to map this to an SSD disk. > 2018-09-24 09:53:39,783 INFO org.apache.ratis.server.impl.RaftServerProxy: > raft.rpc.type = GRPC (default) > 2018-09-24 09:53:39,793 INFO org.apache.ratis.grpc.GrpcConfigKeys$Server: > raft.grpc.server.port = 9858 (custom) > 2018-09-24 09:53:39,793 INFO org.apache.ratis.grpc.RaftGRpcService: > raft.grpc.message.size.max = 167772160 (custom) > 2018-09-24 09:53:39,795 INFO org.apache.ratis.server.RaftServerConfigKeys: > raft.server.log.appender.buffer.capacity = 134217728 (custom) > 2018-09-24 09:53:39,795 INFO org.apache.ratis.grpc.RaftGRpcService: > raft.grpc.flow.control.window = 1MB (=1048576) (default) > 2018-09-24 09:53:39,796 INFO org.apache.ratis.server.RaftServerConfigKeys: > raft.server.rpc.request.timeout = 3000 ms (custom) > 2018-09-24 09:53:39,835 INFO org.apache.ratis.server.RaftServerConfigKeys: > raft.server.storage.dir = /data/disk1/ozone/meta/ratis (custom) > 2018-09-24 09:53:39,836 INFO org.apache.ratis.server.impl.RaftServerProxy: > 9348f4cb-0bd3-41e7-b1c6-c22463a20c09: found a subdirectory > /data/disk1/ozone/meta/ratis/6ff6479b-11f4-456f-b46e-85804b25606d > 2018-09-24 09:53:39,838 INFO org.apache.ratis.server.impl.RaftServerProxy: > 9348f4cb-0bd3-41e7-b1c6-c22463a20c09: addNew group-85804B25606D:[] returns > group-85804B25606D:java.util.concurrent.CompletableFuture@74e28667[Not > completed] > 2018-09-24 09:53:39,850 INFO org.apache.ratis.server.RaftServerConfigKeys: > raft.server.rpc.timeout.min = 1 s (custom) > 2018-09-24 09:53:39,850 INFO org.apache.ratis.server.RaftServerConfigKeys: > raft.server.rpc.timeout.max = 1200 ms (custom) > 2018-09-24 09:53:39,850 INFO org.apache.ratis.server.RaftServerConfigKeys: > raft.server.rpcslowness.timeout = 120 s (custom) > 2018-09-24 09:53:39,868 INFO org.apache.ratis.server.impl.RaftServerImpl: > 9348f4cb-0bd3-41e7-b1c6-c22463a20c09: ConfigurationManager, init=-1: [], > old=null, confs=<EMPTY_MAP> > 2018-09-24 09:53:39,868 INFO org.apache.ratis.server.RaftServerConfigKeys: > raft.server.storage.dir = /data/disk1/ozone/meta/ratis (custom) > 2018-09-24 09:53:39,873 INFO > org.apache.ratis.server.storage.RaftStorageDirectory: Lock on > /data/disk1/ozone/meta/ratis/6ff6479b-11f4-456f-b46e-85804b25606d/in_use.lock > acquired by nodename 3792...@ctr-e138-1518143905142-468367-01-000003.hwx.site > 2018-09-24 09:53:39,878 INFO org.apache.ratis.server.RaftServerConfigKeys: > raft.server.leader.election.timeout = 120 s (custom) > 2018-09-24 09:53:39,881 INFO org.apache.ratis.server.RaftServerConfigKeys: > raft.server.log.use.memory = false (default) > 2018-09-24 09:53:39,882 INFO org.apache.ratis.server.RaftServerConfigKeys: > raft.server.log.appender.buffer.capacity = 134217728 (custom) > 2018-09-24 09:53:39,883 INFO org.apache.ratis.server.RaftServerConfigKeys: > raft.server.log.segment.size.max = 1073741824 (custom) > 2018-09-24 09:53:39,886 INFO org.apache.ratis.server.RaftServerConfigKeys: > raft.server.log.segment.cache.num.max = 2 (custom) > 2018-09-24 09:53:39,889 INFO org.apache.ratis.server.storage.RaftLogWorker: > new 9348f4cb-0bd3-41e7-b1c6-c22463a20c09-RaftLogWorker for Storage Directory > /data/disk1/ozone/meta/ratis/6ff6479b-11f4-456f-b46e-85804b25606d > 2018-09-24 09:53:39,889 INFO org.apache.ratis.server.RaftServerConfigKeys: > raft.server.log.segment.size.max = 1073741824 (custom) > 2018-09-24 09:53:39,889 INFO org.apache.ratis.server.RaftServerConfigKeys: > raft.server.log.preallocated.size = 134217728 (custom) > 2018-09-24 09:53:39,890 INFO org.apache.ratis.server.RaftServerConfigKeys: > raft.server.log.write.buffer.size = 33554432 (custom) > 2018-09-24 09:53:39,890 INFO org.apache.ratis.server.RaftServerConfigKeys: > raft.server.log.force.sync.num = 128 (default) > 2018-09-24 09:53:39,925 INFO org.apache.ratis.server.impl.RaftServerImpl: > 9348f4cb-0bd3-41e7-b1c6-c22463a20c09: set configuration 0: > [0ba4abe0-c8d1-4d6c-8d28-ebfa08fb84c9:172.27.18.18:9858, > 3d6c8b2c-f1d2-4425-a577-b9ef5dc7c634:172.27.80.23:9858, > 9348f4cb-0bd3-41e7-b1c6-c22463a20c09:172.27.23.161:9858], old=null at 0 > 2018-09-24 09:53:41,058 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status 1: java.lang.NullPointerException > 2018-09-24 09:53:41,065 WARN org.apache.hadoop.fs.CachingGetSpaceUsed: Thread > Interrupted waiting to refresh disk information: sleep interrupted{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org