Hi Eric,

What did the command line print when OM failed to start?
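
In case it helps, here is a minimal sketch of one way to capture that output. It assumes the `ozone` launcher is on your PATH and that running `ozone om` without `--daemon` keeps OM in the foreground, so its messages reach the terminal. The `OZONE_BIN` variable, the function name and the log path are placeholders I made up for illustration, not part of Ozone.

```shell
# Sketch only: save everything the OM start prints (stdout and stderr).
# OZONE_BIN is a hypothetical override; by default the plain "ozone" launcher is used.
OZONE_BIN="${OZONE_BIN:-ozone}"

capture_om_start() {
  # $1: file to save the output to (the default path is an assumption)
  out="${1:-/tmp/om-start-output.log}"
  # Assumption: without --daemon, OM runs in the foreground and prints its
  # startup messages to the terminal; tee shows them and keeps a copy.
  "$OZONE_BIN" om 2>&1 | tee "$out"
}
```

Calling `capture_om_start /tmp/om-start-output.log` right after SCM is up should leave a file you can paste into your reply.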


Regards,
Sammi

On Wed, 1 Mar 2023 at 19:18, Eric R <bulletb...@outlook.com> wrote:

> Hello *,
> I'm having an issue with an on-premises single node running v1.3.0.
> I hope this is the right channel to ask this question and I can get some
> help.
>
> When I initially start/configure with scm --init and om --init followed by
> the daemon startups, everything is fine and works.
> When I stop all daemons with --daemon stop in reverse order, all daemons
> stop, and I then reboot the server.
>
> If I now start SCM, it is fine.
> But when I start OM afterwards, it fails with the log message below.
> If I delete my /data partition (where my directory structure was created)
> and re-initialize from scratch it works again.
> After rebooting the server, I have the same issue again: "ILLEGAL TRANSITION".
>
> This is the message:
>
> 2023-03-01 09:56:45,645 [om1-impl-thread1] INFO
> org.apache.ratis.server.storage.RaftStorageDirectory: Lock on
> /data/ozone/ratis/bf265839-605b-3f16-9796-c5ba1605619e/in_use.lock acquired
> by nodename 10...@ozone.my.lab
> 2023-03-01 09:56:45,649 [om1-impl-thread1] INFO
> org.apache.ratis.server.storage.RaftStorage: Read
> RaftStorageMetadata{term=0, votedFor=} from
> /data/ozone/ratis/bf265839-605b-3f16-9796-c5ba1605619e/current/raft-meta
> 2023-03-01 09:56:45,652 [om1-impl-thread1] INFO
> org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.use.memory =
> false (default)
> 2023-03-01 09:56:45,654 [om1-impl-thread2] INFO
> org.apache.ratis.server.storage.RaftStorageDirectory: Lock on
> /data/ozone/ratis/8570f4cf-72ff-489f-9c74-88bf6c146769/in_use.lock acquired
> by nodename 10...@ozone.my.lab
> 2023-03-01 09:56:45,657 [om1-impl-thread2] INFO
> org.apache.ratis.server.storage.RaftStorage: Read
> RaftStorageMetadata{term=2, votedFor=8f50117d-cc59-4090-b60f-710ed770d002}
> from
> /data/ozone/ratis/8570f4cf-72ff-489f-9c74-88bf6c146769/current/raft-meta
> 2023-03-01 09:56:45,667 [om1-impl-thread1] INFO
> org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.purge.gap =
> 1000000 (custom)
> 2023-03-01 09:56:45,667 [om1-impl-thread1] INFO
> org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.log.appender.buffer.byte-limit = 33554432 (custom)
> 2023-03-01 09:56:45,671 [om1-impl-thread1] INFO
> org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.log.statemachine.data.read.timeout = 1000ms (default)
> 2023-03-01 09:56:45,675 [om1-impl-thread1] INFO
> org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.log.purge.preservation.log.num = 0 (default)
> 2023-03-01 09:56:45,680 [om1-impl-thread1] INFO
> org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.log.segment.size.max = 4194304 (custom)
> 2023-03-01 09:56:45,688 [om1-impl-thread1] INFO
> org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.log.segment.cache.num.max = 2 (custom)
> 2023-03-01 09:56:45,688 [om1-impl-thread1] INFO
> org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.log.segment.cache.size.max = 200MB (=209715200) (default)
> 2023-03-01 09:56:45,693 [om1-impl-thread1] INFO
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: new
> om1@group-C5BA1605619E-SegmentedRaftLogWorker for RaftStorageImpl:Storage
> Directory /data/ozone/ratis/bf265839-605b-3f16-9796-c5ba1605619e
> 2023-03-01 09:56:45,693 [om1-impl-thread1] INFO
> org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.log.queue.byte-limit = 64MB (=67108864) (default)
> 2023-03-01 09:56:45,693 [om1-impl-thread1] INFO
> org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.log.queue.element-limit = 4096 (default)
> 2023-03-01 09:56:45,694 [om1-impl-thread1] INFO
> org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.log.segment.size.max = 4194304 (custom)
> 2023-03-01 09:56:45,694 [om1-impl-thread1] INFO
> org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.log.preallocated.size = 4194304 (custom)
> 2023-03-01 09:56:45,694 [om1-impl-thread2] INFO
> org.apache.ratis.server.RaftServer$Division: om1@group-88BF6C146769: set
> configuration 45:
> peers:[8f50117d-cc59-4090-b60f-710ed770d002|rpc:192.168.56.105:9856|admin:192.168.56.105:9857|client:192.168.56.105:9858|dataStream:|priority:1|startupRole:FOLLOWER]|listeners:[],
> old=null
> 2023-03-01 09:56:45,698 [om1-impl-thread1] INFO
> org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.log.force.sync.num = 128 (default)
> 2023-03-01 09:56:45,699 [om1-impl-thread1] INFO
> org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.log.statemachine.data.sync = true (default)
> 2023-03-01 09:56:45,699 [om1-impl-thread1] INFO
> org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.log.statemachine.data.sync.timeout = 10s (default)
> 2023-03-01 09:56:45,699 [om1-impl-thread1] INFO
> org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.log.statemachine.data.sync.timeout.retry = -1 (default)
> 2023-03-01 09:56:45,707 [om1-impl-thread1] INFO
> org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.log.write.buffer.size = 64KB (=65536) (default)
> 2023-03-01 09:56:45,708 [om1-impl-thread1] INFO
> org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.log.unsafe-flush.enabled = false (default)
> 2023-03-01 09:56:45,708 [om1-impl-thread1] INFO
> org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.log.async-flush.enabled = false (default)
> 2023-03-01 09:56:45,708 [om1-impl-thread1] INFO
> org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.log.statemachine.data.caching.enabled = false (default)
> 2023-03-01 09:56:45,717 [om1-impl-thread1] INFO
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
> om1@group-C5BA1605619E-SegmentedRaftLogWorker: flushIndex:
> setUnconditionally 0 -> 824
> 2023-03-01 09:56:45,717 [om1-impl-thread1] INFO
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
> om1@group-C5BA1605619E-SegmentedRaftLogWorker: safeCacheEvictIndex:
> setUnconditionally 0 -> -1
> 2023-03-01 09:56:45,719 [om1-impl-thread1] INFO
> org.apache.ratis.server.RaftServer$Division: om1@group-C5BA1605619E:
> start as a follower, conf=-1:
> peers:[om1|rpc:ozone.my.lab:9872|priority:0|startupRole:FOLLOWER]|listeners:[],
> old=null
> 2023-03-01 09:56:45,719 [om1-impl-thread1] INFO
> org.apache.ratis.server.RaftServer$Division: om1@group-C5BA1605619E:
> changes role from      null to FOLLOWER at term 0 for startAsFollower
> 2023-03-01 09:56:45,723 [om1-impl-thread1] INFO
> org.apache.ratis.server.impl.RoleInfo: om1: start
> om1@group-C5BA1605619E-FollowerState
> 2023-03-01 09:56:45,724 [om1@group-C5BA1605619E-FollowerState] INFO
> org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.rpc.first-election.timeout.min = 5s (fallback to
> raft.server.rpc.timeout.min)
> 2023-03-01 09:56:45,724 [om1@group-C5BA1605619E-FollowerState] INFO
> org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.rpc.first-election.timeout.max = 5200ms (fallback to
> raft.server.rpc.timeout.max)
> 2023-03-01 09:56:45,727 [om1-impl-thread1] INFO
> org.apache.ratis.util.JmxRegister: Successfully registered JMX Bean with
> object name Ratis:service=RaftServer,group=group-C5BA1605619E,id=om1
> 2023-03-01 09:56:45,729 [om1-impl-thread1] INFO
> org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.snapshot.auto.trigger.enabled = true (custom)
> 2023-03-01 09:56:45,732 [om1-impl-thread1] INFO
> org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.snapshot.auto.trigger.threshold = 400000 (default)
> 2023-03-01 09:56:45,733 [om1-impl-thread1] INFO
> org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.snapshot.retention.file.num = -1 (default)
> 2023-03-01 09:56:45,734 [om1-impl-thread1] INFO
> org.apache.ratis.server.RaftServerConfigKeys:
> raft.server.log.purge.upto.snapshot.index = true (custom)
> 2023-03-01 09:56:45,738 [Listener at ozone.my.lab/9862] ERROR
> org.apache.hadoop.ozone.om.OzoneManagerStarter: OM start failed with
> exception
> java.util.concurrent.CompletionException: java.lang.IllegalStateException:
> ILLEGAL TRANSITION: In OzoneManagerStateMachine:om1:group-C5BA1605619E,
> RUNNING -> STARTING
> at
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
> at
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
> at
> java.util.concurrent.CompletableFuture.biRelay(CompletableFuture.java:1300)
> at
> java.util.concurrent.CompletableFuture$BiRelay.tryFire(CompletableFuture.java:1284)
> at
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> at
> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
> at org.apache.ratis.util.ConcurrentUtils.accept(ConcurrentUtils.java:174)
> at
> org.apache.ratis.util.ConcurrentUtils.lambda$null$3(ConcurrentUtils.java:165)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:750)
> Caused by: java.lang.IllegalStateException: ILLEGAL TRANSITION: In
> OzoneManagerStateMachine:om1:group-C5BA1605619E, RUNNING -> STARTING
> at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:60)
> at org.apache.ratis.util.LifeCycle$State.validate(LifeCycle.java:121)
> at org.apache.ratis.util.LifeCycle.transition(LifeCycle.java:164)
> at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:268)
> at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.initialize(OzoneManagerStateMachine.java:137)
> at
> org.apache.ratis.server.impl.ServerState.initialize(ServerState.java:170)
> at
> org.apache.ratis.server.impl.RaftServerImpl.start(RaftServerImpl.java:330)
> at org.apache.ratis.util.ConcurrentUtils.accept(ConcurrentUtils.java:173)
> ... 4 more
> 2023-03-01 09:56:45,745 [shutdown-hook-0] INFO 
> org.apache.hadoop.ozone.om.OzoneManagerStarter:
> SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down OzoneManager at ozone.my.lab/192.168.56.105
> ************************************************************/
>
>
>
> This is my config:
>
>
>
>
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> <configuration>
>     <property>
>         <name>ozone.om.address</name>
>         <value>ozone.my.lab</value>
>         <tag>OM, REQUIRED</tag>
>         <description>
>       The address of the Ozone OM service. This allows clients to discover
>       the address of the OM.
>     </description>
>     </property>
>     <property>
>         <name>ozone.metadata.dirs</name>
>         <value>/data/ozone</value>
>         <tag>OZONE, OM, SCM, CONTAINER, STORAGE, REQUIRED</tag>
>         <description>
>       This setting is the fallback location for SCM, OM, Recon and DataNodes
>       to store their metadata. This setting may be used only in test/PoC
>       clusters to simplify configuration.
>
>       For production clusters or any time you care about performance, it is
>       recommended that ozone.om.db.dirs, ozone.scm.db.dirs and
>       dfs.container.ratis.datanode.storage.dir be configured separately.
>     </description>
>     </property>
>     <property>
>         <name>ozone.scm.client.address</name>
>         <value>ozone.my.lab</value>
>         <tag>OZONE, SCM, REQUIRED</tag>
>         <description>
>       The address of the Ozone SCM client service. This is a required setting.
>
>       It is a string in the host:port format. The port number is optional
>       and defaults to 9860.
>     </description>
>     </property>
>     <property>
>         <name>ozone.scm.names</name>
>         <value>ozone.my.lab</value>
>         <tag>OZONE, REQUIRED</tag>
>         <description>
>       The value of this property is a set of DNS | DNS:PORT | IP
>       Address | IP:PORT. Written as a comma separated string. e.g. scm1,
>       scm2:8020, 7.7.7.7:7777.
>       This property allows datanodes to discover where SCM is, so that
>       datanodes can send heartbeat to SCM.
>     </description>
>     </property>
>     <property>
>         <name>hdds.scm.safemode.min.datanode</name>
>         <value>1</value>
>         <tag>SCM, REQUIRED</tag>
>         <description>
>      Number of min available datanodes
>     </description>
>     </property>
>     <property>
>         <name>ozone.replication</name>
>         <value>1</value>
>         <tag>OZONE, REQUIRED</tag>
>         <description>
>      Default replication factor, used when a client does not specify one
>     </description>
>     </property>
> </configuration>
>
