Hello all,
I'm having an issue with an on-premises, single-node Ozone v1.3.0 setup.
I hope this is the right channel for this question and that someone can help.

When I initially configure everything with scm --init and om --init and then
start the daemons, everything works fine.
I then stop all daemons with --daemon stop in reverse order; they all stop,
and I reboot the server.

After the reboot, SCM starts fine.
But when I start OM afterwards, it fails with the log output below.
If I wipe my /data partition (where the directory structure was created) and
re-initialize from scratch, it works again.
After the next reboot, however, I get the same "ILLEGAL TRANSITION" error.

This is the message:

2023-03-01 09:56:45,645 [om1-impl-thread1] INFO org.apache.ratis.server.storage.RaftStorageDirectory: Lock on /data/ozone/ratis/bf265839-605b-3f16-9796-c5ba1605619e/in_use.lock acquired by nodename 10...@ozone.my.lab
2023-03-01 09:56:45,649 [om1-impl-thread1] INFO org.apache.ratis.server.storage.RaftStorage: Read RaftStorageMetadata{term=0, votedFor=} from /data/ozone/ratis/bf265839-605b-3f16-9796-c5ba1605619e/current/raft-meta
2023-03-01 09:56:45,652 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.use.memory = false (default)
2023-03-01 09:56:45,654 [om1-impl-thread2] INFO org.apache.ratis.server.storage.RaftStorageDirectory: Lock on /data/ozone/ratis/8570f4cf-72ff-489f-9c74-88bf6c146769/in_use.lock acquired by nodename 10...@ozone.my.lab
2023-03-01 09:56:45,657 [om1-impl-thread2] INFO org.apache.ratis.server.storage.RaftStorage: Read RaftStorageMetadata{term=2, votedFor=8f50117d-cc59-4090-b60f-710ed770d002} from /data/ozone/ratis/8570f4cf-72ff-489f-9c74-88bf6c146769/current/raft-meta
2023-03-01 09:56:45,667 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.purge.gap = 1000000 (custom)
2023-03-01 09:56:45,667 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.appender.buffer.byte-limit = 33554432 (custom)
2023-03-01 09:56:45,671 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.statemachine.data.read.timeout = 1000ms (default)
2023-03-01 09:56:45,675 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.purge.preservation.log.num = 0 (default)
2023-03-01 09:56:45,680 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.segment.size.max = 4194304 (custom)
2023-03-01 09:56:45,688 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.segment.cache.num.max = 2 (custom)
2023-03-01 09:56:45,688 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.segment.cache.size.max = 200MB (=209715200) (default)
2023-03-01 09:56:45,693 [om1-impl-thread1] INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: new om1@group-C5BA1605619E-SegmentedRaftLogWorker for RaftStorageImpl:Storage Directory /data/ozone/ratis/bf265839-605b-3f16-9796-c5ba1605619e
2023-03-01 09:56:45,693 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.queue.byte-limit = 64MB (=67108864) (default)
2023-03-01 09:56:45,693 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.queue.element-limit = 4096 (default)
2023-03-01 09:56:45,694 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.segment.size.max = 4194304 (custom)
2023-03-01 09:56:45,694 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.preallocated.size = 4194304 (custom)
2023-03-01 09:56:45,694 [om1-impl-thread2] INFO org.apache.ratis.server.RaftServer$Division: om1@group-88BF6C146769: set configuration 45: peers:[8f50117d-cc59-4090-b60f-710ed770d002|rpc:192.168.56.105:9856|admin:192.168.56.105:9857|client:192.168.56.105:9858|dataStream:|priority:1|startupRole:FOLLOWER]|listeners:[], old=null
2023-03-01 09:56:45,698 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.force.sync.num = 128 (default)
2023-03-01 09:56:45,699 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.statemachine.data.sync = true (default)
2023-03-01 09:56:45,699 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.statemachine.data.sync.timeout = 10s (default)
2023-03-01 09:56:45,699 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.statemachine.data.sync.timeout.retry = -1 (default)
2023-03-01 09:56:45,707 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.write.buffer.size = 64KB (=65536) (default)
2023-03-01 09:56:45,708 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.unsafe-flush.enabled = false (default)
2023-03-01 09:56:45,708 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.async-flush.enabled = false (default)
2023-03-01 09:56:45,708 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.statemachine.data.caching.enabled = false (default)
2023-03-01 09:56:45,717 [om1-impl-thread1] INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: om1@group-C5BA1605619E-SegmentedRaftLogWorker: flushIndex: setUnconditionally 0 -> 824
2023-03-01 09:56:45,717 [om1-impl-thread1] INFO org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: om1@group-C5BA1605619E-SegmentedRaftLogWorker: safeCacheEvictIndex: setUnconditionally 0 -> -1
2023-03-01 09:56:45,719 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServer$Division: om1@group-C5BA1605619E: start as a follower, conf=-1: peers:[om1|rpc:ozone.my.lab:9872|priority:0|startupRole:FOLLOWER]|listeners:[], old=null
2023-03-01 09:56:45,719 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServer$Division: om1@group-C5BA1605619E: changes role from      null to FOLLOWER at term 0 for startAsFollower
2023-03-01 09:56:45,723 [om1-impl-thread1] INFO org.apache.ratis.server.impl.RoleInfo: om1: start om1@group-C5BA1605619E-FollowerState
2023-03-01 09:56:45,724 [om1@group-C5BA1605619E-FollowerState] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.rpc.first-election.timeout.min = 5s (fallback to raft.server.rpc.timeout.min)
2023-03-01 09:56:45,724 [om1@group-C5BA1605619E-FollowerState] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.rpc.first-election.timeout.max = 5200ms (fallback to raft.server.rpc.timeout.max)
2023-03-01 09:56:45,727 [om1-impl-thread1] INFO org.apache.ratis.util.JmxRegister: Successfully registered JMX Bean with object name Ratis:service=RaftServer,group=group-C5BA1605619E,id=om1
2023-03-01 09:56:45,729 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.snapshot.auto.trigger.enabled = true (custom)
2023-03-01 09:56:45,732 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.snapshot.auto.trigger.threshold = 400000 (default)
2023-03-01 09:56:45,733 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.snapshot.retention.file.num = -1 (default)
2023-03-01 09:56:45,734 [om1-impl-thread1] INFO org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.purge.upto.snapshot.index = true (custom)
2023-03-01 09:56:45,738 [Listener at ozone.my.lab/9862] ERROR org.apache.hadoop.ozone.om.OzoneManagerStarter: OM start failed with exception
java.util.concurrent.CompletionException: java.lang.IllegalStateException: ILLEGAL TRANSITION: In OzoneManagerStateMachine:om1:group-C5BA1605619E, RUNNING -> STARTING
      at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
      at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
      at java.util.concurrent.CompletableFuture.biRelay(CompletableFuture.java:1300)
      at java.util.concurrent.CompletableFuture$BiRelay.tryFire(CompletableFuture.java:1284)
      at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
      at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
      at org.apache.ratis.util.ConcurrentUtils.accept(ConcurrentUtils.java:174)
      at org.apache.ratis.util.ConcurrentUtils.lambda$null$3(ConcurrentUtils.java:165)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.IllegalStateException: ILLEGAL TRANSITION: In OzoneManagerStateMachine:om1:group-C5BA1605619E, RUNNING -> STARTING
      at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:60)
      at org.apache.ratis.util.LifeCycle$State.validate(LifeCycle.java:121)
      at org.apache.ratis.util.LifeCycle.transition(LifeCycle.java:164)
      at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:268)
      at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.initialize(OzoneManagerStateMachine.java:137)
      at org.apache.ratis.server.impl.ServerState.initialize(ServerState.java:170)
      at org.apache.ratis.server.impl.RaftServerImpl.start(RaftServerImpl.java:330)
      at org.apache.ratis.util.ConcurrentUtils.accept(ConcurrentUtils.java:173)
      ... 4 more
2023-03-01 09:56:45,745 [shutdown-hook-0] INFO org.apache.hadoop.ozone.om.OzoneManagerStarter: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down OzoneManager at ozone.my.lab/192.168.56.105
************************************************************/
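One detail in the log above may be relevant: during startup, om1 locks two Ratis storage directories under /data/ozone/ratis. The first, bf265839-605b-3f16-9796-c5ba1605619e, is group-C5BA1605619E with om1 as its only peer. The second, 8570f4cf-72ff-489f-9c74-88bf6c146769, is group-88BF6C146769, whose single peer 8f50117d-cc59-4090-b60f-710ed770d002 listens on ports 9856/9857/9858. Those ports look like the datanode container Ratis defaults, so the second directory possibly belongs to the datanode rather than to OM. That would fit OM trying to initialize its already RUNNING state machine a second time. A minimal sketch that gives OM its own Ratis directory, assuming ozone.om.ratis.storage.dir is the property OM reads for this and using a hypothetical path:

```xml
<!-- Sketch only: /data/ozone/om/ratis is a hypothetical path. -->
<property>
    <name>ozone.om.ratis.storage.dir</name>
    <value>/data/ozone/om/ratis</value>
    <description>
      Dedicated Ratis directory for OM, so that OM no longer falls back to
      ozone.metadata.dirs/ratis, which appears to be shared with other
      services on this node.
    </description>
</property>
```

On an existing installation this would presumably need the same wipe-and-reinitialize step described above, since the current OM Ratis data lives in the old location.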

This is my config:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<configuration>
    <property>
        <name>ozone.om.address</name>
        <value>ozone.my.lab</value>
        <tag>OM, REQUIRED</tag>
        <description>
      The address of the Ozone OM service. This allows clients to discover
      the address of the OM.
    </description>
    </property>
    <property>
        <name>ozone.metadata.dirs</name>
        <value>/data/ozone</value>
        <tag>OZONE, OM, SCM, CONTAINER, STORAGE, REQUIRED</tag>
        <description>
      This setting is the fallback location for SCM, OM, Recon and DataNodes
      to store their metadata. This setting may be used only in test/PoC
      clusters to simplify configuration.

      For production clusters or any time you care about performance, it is
      recommended that ozone.om.db.dirs, ozone.scm.db.dirs and
      dfs.container.ratis.datanode.storage.dir be configured separately.
    </description>
    </property>
    <property>
        <name>ozone.scm.client.address</name>
        <value>ozone.my.lab</value>
        <tag>OZONE, SCM, REQUIRED</tag>
        <description>
      The address of the Ozone SCM client service. This is a required setting.

      It is a string in the host:port format. The port number is optional
      and defaults to 9860.
    </description>
    </property>
    <property>
        <name>ozone.scm.names</name>
        <value>ozone.my.lab</value>
        <tag>OZONE, REQUIRED</tag>
        <description>
      The value of this property is a set of DNS | DNS:PORT | IP
      Address | IP:PORT. Written as a comma separated string. e.g. scm1,
      scm2:8020, 7.7.7.7:7777.
      This property allows datanodes to discover where SCM is, so that
      datanodes can send heartbeat to SCM.
    </description>
    </property>
    <property>
        <name>hdds.scm.safemode.min.datanode</name>
        <value>1</value>
        <tag>SCM, REQUIRED</tag>
        <description>
      Minimum number of datanodes that must register before SCM exits
      safe mode.
    </description>
    </property>
    <property>
        <name>ozone.replication</name>
        <value>1</value>
        <tag>OZONE, REQUIRED</tag>
        <description>
      Default replication value for keys, used when no replication is
      specified for the key or its bucket.
    </description>
    </property>
</configuration>
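The description of ozone.metadata.dirs in this config recommends configuring ozone.om.db.dirs, ozone.scm.db.dirs and dfs.container.ratis.datanode.storage.dir separately instead of relying on the shared fallback. A sketch of what that could look like for this single-node setup, with hypothetical paths under /data, to be merged into the configuration element:

```xml
<!-- Hypothetical paths; adjust to the actual layout under /data. -->
<property>
    <name>ozone.om.db.dirs</name>
    <value>/data/ozone/om/db</value>
</property>
<property>
    <name>ozone.scm.db.dirs</name>
    <value>/data/ozone/scm/db</value>
</property>
<property>
    <name>dfs.container.ratis.datanode.storage.dir</name>
    <value>/data/ozone/dn/ratis</value>
</property>
```

With dfs.container.ratis.datanode.storage.dir set, the datanode's Ratis data should no longer land in the /data/ozone/ratis directory that OM falls back to. Whether that alone avoids the ILLEGAL TRANSITION error after a reboot is untested.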
