Besides, please change the OM log level to DEBUG and try the restart again. To change the OM log level, open the etc/hadoop/ozone-env.sh file and update the line:

export OZONE_DAEMON_ROOT_LOGGER=DEBUG,RFA

On Thu, 2 Mar 2023 at 13:01, Sammi Chen <sammic...@apache.org> wrote:

> Hi Eric,
>
> What was the command line output when you failed to start OM?
>
>
> Regards,
> Sammi
>
> On Wed, 1 Mar 2023 at 19:18, Eric R <bulletb...@outlook.com> wrote:
>
>> Hello *,
>> having an issue with an on premise single node running v1.3.0 .
>> I hope this is the right channel to ask this question and I can get some
>> help.
>>
>> When I initially start/configure with scm --init and om --init followed
>> by the daemon startups, everything is fine and works.
>> When I stop all daemons with --daemon stop in reverse order, all daemon
>> stop and I reboot the server.
>>
>> If I start now SCM - fine.
>> But when I start OM afterwards, it fails with the log message below.
>> If I delete my /data partition (where my directory structure was created)
>> and re-initialize from scratch it works again.
>> Rebooting the server and I have again the same issue: "ILLEGAL TRANSITION"
>>
>> This is the message:
>>
>> 2023-03-01 09:56:45,645 [om1-impl-thread1] INFO
>> org.apache.ratis.server.storage.RaftStorageDirectory: Lock on
>> /data/ozone/ratis/bf265839-605b-3f16-9796-c5ba1605619e/in_use.lock acquired
>> by nodename 10...@ozone.my.lab
>> 2023-03-01 09:56:45,649 [om1-impl-thread1] INFO
>> org.apache.ratis.server.storage.RaftStorage: Read
>> RaftStorageMetadata{term=0, votedFor=} from
>> /data/ozone/ratis/bf265839-605b-3f16-9796-c5ba1605619e/current/raft-meta
>> 2023-03-01 09:56:45,652 [om1-impl-thread1] INFO
>> org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.use.memory =
>> false (default)
>> 2023-03-01 09:56:45,654 [om1-impl-thread2] INFO
>> org.apache.ratis.server.storage.RaftStorageDirectory: Lock on
>> /data/ozone/ratis/8570f4cf-72ff-489f-9c74-88bf6c146769/in_use.lock acquired
>> by nodename 10...@ozone.my.lab
>> 2023-03-01 09:56:45,657 [om1-impl-thread2] INFO
>> org.apache.ratis.server.storage.RaftStorage: Read >>
RaftStorageMetadata{term=2, votedFor=8f50117d-cc59-4090-b60f-710ed770d002} >> from >> /data/ozone/ratis/8570f4cf-72ff-489f-9c74-88bf6c146769/current/raft-meta >> 2023-03-01 09:56:45,667 [om1-impl-thread1] INFO >> org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.purge.gap = >> 1000000 (custom) >> 2023-03-01 09:56:45,667 [om1-impl-thread1] INFO >> org.apache.ratis.server.RaftServerConfigKeys: >> raft.server.log.appender.buffer.byte-limit = 33554432 (custom) >> 2023-03-01 09:56:45,671 [om1-impl-thread1] INFO >> org.apache.ratis.server.RaftServerConfigKeys: >> raft.server.log.statemachine.data.read.timeout = 1000ms (default) >> 2023-03-01 09:56:45,675 [om1-impl-thread1] INFO >> org.apache.ratis.server.RaftServerConfigKeys: >> raft.server.log.purge.preservation.log.num = 0 (default) >> 2023-03-01 09:56:45,680 [om1-impl-thread1] INFO >> org.apache.ratis.server.RaftServerConfigKeys: >> raft.server.log.segment.size.max = 4194304 (custom) >> 2023-03-01 09:56:45,688 [om1-impl-thread1] INFO >> org.apache.ratis.server.RaftServerConfigKeys: >> raft.server.log.segment.cache.num.max = 2 (custom) >> 2023-03-01 09:56:45,688 [om1-impl-thread1] INFO >> org.apache.ratis.server.RaftServerConfigKeys: >> raft.server.log.segment.cache.size.max = 200MB (=209715200) (default) >> 2023-03-01 09:56:45,693 [om1-impl-thread1] INFO >> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: new >> om1@group-C5BA1605619E-SegmentedRaftLogWorker for >> RaftStorageImpl:Storage Directory >> /data/ozone/ratis/bf265839-605b-3f16-9796-c5ba1605619e >> 2023-03-01 09:56:45,693 [om1-impl-thread1] INFO >> org.apache.ratis.server.RaftServerConfigKeys: >> raft.server.log.queue.byte-limit = 64MB (=67108864) (default) >> 2023-03-01 09:56:45,693 [om1-impl-thread1] INFO >> org.apache.ratis.server.RaftServerConfigKeys: >> raft.server.log.queue.element-limit = 4096 (default) >> 2023-03-01 09:56:45,694 [om1-impl-thread1] INFO >> org.apache.ratis.server.RaftServerConfigKeys: >> 
raft.server.log.segment.size.max = 4194304 (custom) >> 2023-03-01 09:56:45,694 [om1-impl-thread1] INFO >> org.apache.ratis.server.RaftServerConfigKeys: >> raft.server.log.preallocated.size = 4194304 (custom) >> 2023-03-01 09:56:45,694 [om1-impl-thread2] INFO >> org.apache.ratis.server.RaftServer$Division: om1@group-88BF6C146769: set >> configuration 45: >> peers:[8f50117d-cc59-4090-b60f-710ed770d002|rpc:192.168.56.105:9856 >> |admin:192.168.56.105:9857|client:192.168.56.105:9858|dataStream:|priority:1|startupRole:FOLLOWER]|listeners:[], >> old=null >> 2023-03-01 09:56:45,698 [om1-impl-thread1] INFO >> org.apache.ratis.server.RaftServerConfigKeys: >> raft.server.log.force.sync.num = 128 (default) >> 2023-03-01 09:56:45,699 [om1-impl-thread1] INFO >> org.apache.ratis.server.RaftServerConfigKeys: >> raft.server.log.statemachine.data.sync = true (default) >> 2023-03-01 09:56:45,699 [om1-impl-thread1] INFO >> org.apache.ratis.server.RaftServerConfigKeys: >> raft.server.log.statemachine.data.sync.timeout = 10s (default) >> 2023-03-01 09:56:45,699 [om1-impl-thread1] INFO >> org.apache.ratis.server.RaftServerConfigKeys: >> raft.server.log.statemachine.data.sync.timeout.retry = -1 (default) >> 2023-03-01 09:56:45,707 [om1-impl-thread1] INFO >> org.apache.ratis.server.RaftServerConfigKeys: >> raft.server.log.write.buffer.size = 64KB (=65536) (default) >> 2023-03-01 09:56:45,708 [om1-impl-thread1] INFO >> org.apache.ratis.server.RaftServerConfigKeys: >> raft.server.log.unsafe-flush.enabled = false (default) >> 2023-03-01 09:56:45,708 [om1-impl-thread1] INFO >> org.apache.ratis.server.RaftServerConfigKeys: >> raft.server.log.async-flush.enabled = false (default) >> 2023-03-01 09:56:45,708 [om1-impl-thread1] INFO >> org.apache.ratis.server.RaftServerConfigKeys: >> raft.server.log.statemachine.data.caching.enabled = false (default) >> 2023-03-01 09:56:45,717 [om1-impl-thread1] INFO >> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: >> 
om1@group-C5BA1605619E-SegmentedRaftLogWorker: flushIndex: >> setUnconditionally 0 -> 824 >> 2023-03-01 09:56:45,717 [om1-impl-thread1] INFO >> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: >> om1@group-C5BA1605619E-SegmentedRaftLogWorker: safeCacheEvictIndex: >> setUnconditionally 0 -> -1 >> 2023-03-01 09:56:45,719 [om1-impl-thread1] INFO >> org.apache.ratis.server.RaftServer$Division: om1@group-C5BA1605619E: >> start as a follower, conf=-1: >> peers:[om1|rpc:ozone.my.lab:9872|priority:0|startupRole:FOLLOWER]|listeners:[], >> old=null >> 2023-03-01 09:56:45,719 [om1-impl-thread1] INFO >> org.apache.ratis.server.RaftServer$Division: om1@group-C5BA1605619E: >> changes role from null to FOLLOWER at term 0 for startAsFollower >> 2023-03-01 09:56:45,723 [om1-impl-thread1] INFO >> org.apache.ratis.server.impl.RoleInfo: om1: start >> om1@group-C5BA1605619E-FollowerState >> 2023-03-01 09:56:45,724 [om1@group-C5BA1605619E-FollowerState] INFO >> org.apache.ratis.server.RaftServerConfigKeys: >> raft.server.rpc.first-election.timeout.min = 5s (fallback to >> raft.server.rpc.timeout.min) >> 2023-03-01 09:56:45,724 [om1@group-C5BA1605619E-FollowerState] INFO >> org.apache.ratis.server.RaftServerConfigKeys: >> raft.server.rpc.first-election.timeout.max = 5200ms (fallback to >> raft.server.rpc.timeout.max) >> 2023-03-01 09:56:45,727 [om1-impl-thread1] INFO >> org.apache.ratis.util.JmxRegister: Successfully registered JMX Bean with >> object name Ratis:service=RaftServer,group=group-C5BA1605619E,id=om1 >> 2023-03-01 09:56:45,729 [om1-impl-thread1] INFO >> org.apache.ratis.server.RaftServerConfigKeys: >> raft.server.snapshot.auto.trigger.enabled = true (custom) >> 2023-03-01 09:56:45,732 [om1-impl-thread1] INFO >> org.apache.ratis.server.RaftServerConfigKeys: >> raft.server.snapshot.auto.trigger.threshold = 400000 (default) >> 2023-03-01 09:56:45,733 [om1-impl-thread1] INFO >> org.apache.ratis.server.RaftServerConfigKeys: >> 
raft.server.snapshot.retention.file.num = -1 (default) >> 2023-03-01 09:56:45,734 [om1-impl-thread1] INFO >> org.apache.ratis.server.RaftServerConfigKeys: >> raft.server.log.purge.upto.snapshot.index = true (custom) >> 2023-03-01 09:56:45,738 [Listener at ozone.my.lab/9862] ERROR >> org.apache.hadoop.ozone.om.OzoneManagerStarter: OM start failed with >> exception >> java.util.concurrent.CompletionException: >> java.lang.IllegalStateException: ILLEGAL TRANSITION: In >> OzoneManagerStateMachine:om1:group-C5BA1605619E, RUNNING -> STARTING >> at >> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) >> at >> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) >> at >> java.util.concurrent.CompletableFuture.biRelay(CompletableFuture.java:1300) >> at >> java.util.concurrent.CompletableFuture$BiRelay.tryFire(CompletableFuture.java:1284) >> at >> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) >> at >> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975) >> at org.apache.ratis.util.ConcurrentUtils.accept(ConcurrentUtils.java:174) >> at >> org.apache.ratis.util.ConcurrentUtils.lambda$null$3(ConcurrentUtils.java:165) >> at >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) >> at >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) >> at java.lang.Thread.run(Thread.java:750) >> Caused by: java.lang.IllegalStateException: ILLEGAL TRANSITION: In >> OzoneManagerStateMachine:om1:group-C5BA1605619E, RUNNING -> STARTING >> at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:60) >> at org.apache.ratis.util.LifeCycle$State.validate(LifeCycle.java:121) >> at org.apache.ratis.util.LifeCycle.transition(LifeCycle.java:164) >> at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:268) >> at org.apache.hadoop.ozone.om >> 
.ratis.OzoneManagerStateMachine.initialize(OzoneManagerStateMachine.java:137) >> at >> org.apache.ratis.server.impl.ServerState.initialize(ServerState.java:170) >> at >> org.apache.ratis.server.impl.RaftServerImpl.start(RaftServerImpl.java:330) >> at org.apache.ratis.util.ConcurrentUtils.accept(ConcurrentUtils.java:173) >> ... 4 more >> 2023-03-01 09:56:45,745 [shutdown-hook-0] INFO >> org.apache.hadoop.ozone.om.OzoneManagerStarter: >> SHUTDOWN_MSG: >> /************************************************************ >> SHUTDOWN_MSG: Shutting down OzoneManager at ozone.my.lab/192.168.56.105 >> ************************************************************/ >> >> >> >> This is my config: >> >> >> >> >> <?xml version="1.0" encoding="UTF-8" standalone="yes"?> >> <configuration> >> <property> >> <name>ozone.om.address</name> >> <value>ozone.my.lab</value> >> <tag>OM, REQUIRED</tag> >> <description> >> The address of the Ozone OM service. This allows clients to discover >> the address of the OM. >> </description> >> </property> >> <property> >> <name>ozone.metadata.dirs</name> >> <value>/data/ozone</value> >> <tag>OZONE, OM, SCM, CONTAINER, STORAGE, REQUIRED</tag> >> <description> >> This setting is the fallback location for SCM, OM, Recon and >> DataNodes >> to store their metadata. This setting may be used only in test/PoC >> clusters to simplify configuration. >> >> For production clusters or any time you care about performance, it >> is >> recommended that ozone.om.db.dirs, ozone.scm.db.dirs and >> dfs.container.ratis.datanode.storage.dir be configured separately. >> </description> >> </property> >> <property> >> <name>ozone.scm.client.address</name> >> <value>ozone.my.lab</value> >> <tag>OZONE, SCM, REQUIRED</tag> >> <description> >> The address of the Ozone SCM client service. This is a required >> setting. >> >> It is a string in the host:port format. The port number is optional >> and defaults to 9860. 
>> </description> >> </property> >> <property> >> <name>ozone.scm.names</name> >> <value>ozone.my.lab</value> >> <tag>OZONE, REQUIRED</tag> >> <description> >> The value of this property is a set of DNS | DNS:PORT | IP >> Address | IP:PORT. Written as a comma separated string. e.g. scm1, >> scm2:8020, 7.7.7.7:7777. >> This property allows datanodes to discover where SCM is, so that >> datanodes can send heartbeat to SCM. >> </description> >> </property> >> <property> >> <name>hdds.scm.safemode.min.datanode</name> >> <value>1</value> >> <tag>SCM, REQUIRED</tag> >> <description> >> Number of min available datanodes >> </description> >> </property> >> <property> >> <name>ozone.replication</name> >> <value>1</value> >> <tag>OZONE, REQUIRED</tag> >> <description> >> Number of min available datanodes >> </description> >> </property> >> </configuration> >> >
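
P.S. In case it helps, here is a rough sketch of the log-level change suggested at the top of this reply, written as a small shell helper. The function name set_om_log_level is made up for illustration, and it assumes the OZONE_DAEMON_ROOT_LOGGER line in ozone-env.sh is either absent, commented out, or already exported; please check your own file before running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ozone): set OZONE_DAEMON_ROOT_LOGGER
# in an ozone-env.sh file. A backup copy is left next to it as <file>.bak.
set_om_log_level() {
  env_file="$1"   # path to etc/hadoop/ozone-env.sh
  level="$2"      # e.g. DEBUG,RFA (must not contain '|' or '&')

  if grep -q '^[#[:space:]]*export OZONE_DAEMON_ROOT_LOGGER=' "$env_file"; then
    # An export line already exists (possibly commented out): rewrite it.
    sed -i.bak \
      "s|^[#[:space:]]*export OZONE_DAEMON_ROOT_LOGGER=.*|export OZONE_DAEMON_ROOT_LOGGER=${level}|" \
      "$env_file"
  else
    # No such line yet: keep a backup and append the export.
    cp "$env_file" "$env_file.bak"
    printf 'export OZONE_DAEMON_ROOT_LOGGER=%s\n' "$level" >> "$env_file"
  fi
}

# Example call, run from the Ozone install directory (assumed layout):
#   set_om_log_level etc/hadoop/ozone-env.sh DEBUG,RFA
```

After that, OM can be restarted, for example with `ozone --daemon stop om` followed by `ozone --daemon start om`, so that the next startup attempt is logged at DEBUG level.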