Hi Eric,

What was the command line output when you failed to start OM?

Regards,
Sammi

On Wed, 1 Mar 2023 at 19:18, Eric R <bulletb...@outlook.com> wrote:

> Hello *,
> I am having an issue with an on-premise single node running v1.3.0.
> I hope this is the right channel to ask this question and that I can get some help.
>
> When I initially configure with scm --init and om --init, followed by the daemon startups, everything works.
> When I stop all daemons with --daemon stop in reverse order, all daemons stop, and I then reboot the server.
>
> If I now start SCM, it is fine.
> But when I start OM afterwards, it fails with the log message below.
> If I delete my /data partition (where my directory structure was created) and re-initialize from scratch, it works again.
> After rebooting the server, I have the same issue again: "ILLEGAL TRANSITION"
>
> This is the message:
>
> 2023-03-01 09:56:45,645 [om1-impl-thread1] INFO org.apache.ratis.server.storage.RaftStorageDirectory: Lock on /data/ozone/ratis/bf265839-605b-3f16-9796-c5ba1605619e/in_use.lock acquired by nodename 10...@ozone.my.lab
> 2023-03-01 09:56:45,649 [om1-impl-thread1] INFO org.apache.ratis.server.storage.RaftStorage: Read RaftStorageMetadata{term=0, votedFor=} from /data/ozone/ratis/bf265839-605b-3f16-9796-c5ba1605619e/current/raft-meta
> 2023-03-01 09:56:45,652 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.use.memory = false (default)
> 2023-03-01 09:56:45,654 [om1-impl-thread2] INFO org.apache.ratis.server.storage.RaftStorageDirectory: Lock on /data/ozone/ratis/8570f4cf-72ff-489f-9c74-88bf6c146769/in_use.lock acquired by nodename 10...@ozone.my.lab
> 2023-03-01 09:56:45,657 [om1-impl-thread2] INFO org.apache.ratis.server.storage.RaftStorage: Read RaftStorageMetadata{term=2, votedFor=8f50117d-cc59-4090-b60f-710ed770d002} from /data/ozone/ratis/8570f4cf-72ff-489f-9c74-88bf6c146769/current/raft-meta
> 2023-03-01 09:56:45,667 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.purge.gap = 1000000 (custom)
> 2023-03-01 09:56:45,667 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.appender.buffer.byte-limit = 33554432 (custom)
> 2023-03-01 09:56:45,671 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.statemachine.data.read.timeout = 1000ms (default)
> 2023-03-01 09:56:45,675 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.purge.preservation.log.num = 0 (default)
> 2023-03-01 09:56:45,680 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.segment.size.max = 4194304 (custom)
> 2023-03-01 09:56:45,688 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.segment.cache.num.max = 2 (custom)
> 2023-03-01 09:56:45,688 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.segment.cache.size.max = 200MB (=209715200) (default)
> 2023-03-01 09:56:45,693 [om1-impl-thread1] INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: new om1@group-C5BA1605619E-SegmentedRaftLogWorker for RaftStorageImpl:Storage Directory /data/ozone/ratis/bf265839-605b-3f16-9796-c5ba1605619e
> 2023-03-01 09:56:45,693 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.queue.byte-limit = 64MB (=67108864) (default)
> 2023-03-01 09:56:45,693 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.queue.element-limit = 4096 (default)
> 2023-03-01 09:56:45,694 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.segment.size.max = 4194304 (custom)
> 2023-03-01 09:56:45,694 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.preallocated.size = 4194304 (custom)
> 2023-03-01 09:56:45,694 [om1-impl-thread2] INFO org.apache.ratis.server.RaftServer$Division: om1@group-88BF6C146769: set configuration 45: peers:[8f50117d-cc59-4090-b60f-710ed770d002|rpc:192.168.56.105:9856|admin:192.168.56.105:9857|client:192.168.56.105:9858|dataStream:|priority:1|startupRole:FOLLOWER]|listeners:[], old=null
> 2023-03-01 09:56:45,698 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.force.sync.num = 128 (default)
> 2023-03-01 09:56:45,699 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.statemachine.data.sync = true (default)
> 2023-03-01 09:56:45,699 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.statemachine.data.sync.timeout = 10s (default)
> 2023-03-01 09:56:45,699 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.statemachine.data.sync.timeout.retry = -1 (default)
> 2023-03-01 09:56:45,707 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.write.buffer.size = 64KB (=65536) (default)
> 2023-03-01 09:56:45,708 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.unsafe-flush.enabled = false (default)
> 2023-03-01 09:56:45,708 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.async-flush.enabled = false (default)
> 2023-03-01 09:56:45,708 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.statemachine.data.caching.enabled = false (default)
> 2023-03-01 09:56:45,717 [om1-impl-thread1] INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: om1@group-C5BA1605619E-SegmentedRaftLogWorker: flushIndex: setUnconditionally 0 -> 824
> 2023-03-01 09:56:45,717 [om1-impl-thread1] INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: om1@group-C5BA1605619E-SegmentedRaftLogWorker: safeCacheEvictIndex: setUnconditionally 0 -> -1
> 2023-03-01 09:56:45,719 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServer$Division: om1@group-C5BA1605619E: start as a follower, conf=-1: peers:[om1|rpc:ozone.my.lab:9872|priority:0|startupRole:FOLLOWER]|listeners:[], old=null
> 2023-03-01 09:56:45,719 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServer$Division: om1@group-C5BA1605619E: changes role from null to FOLLOWER at term 0 for startAsFollower
> 2023-03-01 09:56:45,723 [om1-impl-thread1] INFO org.apache.ratis.server.impl.RoleInfo: om1: start om1@group-C5BA1605619E-FollowerState
> 2023-03-01 09:56:45,724 [om1@group-C5BA1605619E-FollowerState] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.rpc.first-election.timeout.min = 5s (fallback to raft.server.rpc.timeout.min)
> 2023-03-01 09:56:45,724 [om1@group-C5BA1605619E-FollowerState] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.rpc.first-election.timeout.max = 5200ms (fallback to raft.server.rpc.timeout.max)
> 2023-03-01 09:56:45,727 [om1-impl-thread1] INFO org.apache.ratis.util.JmxRegister: Successfully registered JMX Bean with object name Ratis:service=RaftServer,group=group-C5BA1605619E,id=om1
> 2023-03-01 09:56:45,729 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.snapshot.auto.trigger.enabled = true (custom)
> 2023-03-01 09:56:45,732 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.snapshot.auto.trigger.threshold = 400000 (default)
> 2023-03-01 09:56:45,733 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.snapshot.retention.file.num = -1 (default)
> 2023-03-01 09:56:45,734 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.purge.upto.snapshot.index = true (custom)
> 2023-03-01 09:56:45,738 [Listener at ozone.my.lab/9862] ERROR org.apache.hadoop.ozone.om.OzoneManagerStarter: OM start failed with exception
> java.util.concurrent.CompletionException: java.lang.IllegalStateException: ILLEGAL TRANSITION: In OzoneManagerStateMachine:om1:group-C5BA1605619E, RUNNING -> STARTING
>     at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
>     at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
>     at java.util.concurrent.CompletableFuture.biRelay(CompletableFuture.java:1300)
>     at java.util.concurrent.CompletableFuture$BiRelay.tryFire(CompletableFuture.java:1284)
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>     at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
>     at org.apache.ratis.util.ConcurrentUtils.accept(ConcurrentUtils.java:174)
>     at org.apache.ratis.util.ConcurrentUtils.lambda$null$3(ConcurrentUtils.java:165)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:750)
> Caused by: java.lang.IllegalStateException: ILLEGAL TRANSITION: In OzoneManagerStateMachine:om1:group-C5BA1605619E, RUNNING -> STARTING
>     at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:60)
>     at org.apache.ratis.util.LifeCycle$State.validate(LifeCycle.java:121)
>     at org.apache.ratis.util.LifeCycle.transition(LifeCycle.java:164)
>     at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:268)
>     at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.initialize(OzoneManagerStateMachine.java:137)
>     at org.apache.ratis.server.impl.ServerState.initialize(ServerState.java:170)
>     at org.apache.ratis.server.impl.RaftServerImpl.start(RaftServerImpl.java:330)
>     at org.apache.ratis.util.ConcurrentUtils.accept(ConcurrentUtils.java:173)
>     ... 4 more
> 2023-03-01 09:56:45,745 [shutdown-hook-0] INFO org.apache.hadoop.ozone.om.OzoneManagerStarter: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down OzoneManager at ozone.my.lab/192.168.56.105
> ************************************************************/
>
> This is my config:
>
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> <configuration>
>   <property>
>     <name>ozone.om.address</name>
>     <value>ozone.my.lab</value>
>     <tag>OM, REQUIRED</tag>
>     <description>
>       The address of the Ozone OM service. This allows clients to discover
>       the address of the OM.
>     </description>
>   </property>
>   <property>
>     <name>ozone.metadata.dirs</name>
>     <value>/data/ozone</value>
>     <tag>OZONE, OM, SCM, CONTAINER, STORAGE, REQUIRED</tag>
>     <description>
>       This setting is the fallback location for SCM, OM, Recon and DataNodes
>       to store their metadata. This setting may be used only in test/PoC
>       clusters to simplify configuration.
>
>       For production clusters or any time you care about performance, it is
>       recommended that ozone.om.db.dirs, ozone.scm.db.dirs and
>       dfs.container.ratis.datanode.storage.dir be configured separately.
>     </description>
>   </property>
>   <property>
>     <name>ozone.scm.client.address</name>
>     <value>ozone.my.lab</value>
>     <tag>OZONE, SCM, REQUIRED</tag>
>     <description>
>       The address of the Ozone SCM client service. This is a required
>       setting.
>
>       It is a string in the host:port format. The port number is optional
>       and defaults to 9860.
>     </description>
>   </property>
>   <property>
>     <name>ozone.scm.names</name>
>     <value>ozone.my.lab</value>
>     <tag>OZONE, REQUIRED</tag>
>     <description>
>       The value of this property is a set of DNS | DNS:PORT | IP
>       Address | IP:PORT. Written as a comma separated string. e.g. scm1,
>       scm2:8020, 7.7.7.7:7777.
>       This property allows datanodes to discover where SCM is, so that
>       datanodes can send heartbeat to SCM.
>     </description>
>   </property>
>   <property>
>     <name>hdds.scm.safemode.min.datanode</name>
>     <value>1</value>
>     <tag>SCM, REQUIRED</tag>
>     <description>
>       Number of min available datanodes
>     </description>
>   </property>
>   <property>
>     <name>ozone.replication</name>
>     <value>1</value>
>     <tag>OZONE, REQUIRED</tag>
>     <description>
>       Number of min available datanodes
>     </description>
>   </property>
> </configuration>