Also, please change the OM log level to DEBUG and try the restart again.

To change the OM log level, open the etc/hadoop/ozone-env.sh file and
update the line to:

export OZONE_DAEMON_ROOT_LOGGER=DEBUG,RFA
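If you prefer to script the change, here is a minimal sketch. It assumes a POSIX shell with grep and sed, and the helper name set_ozone_root_logger is hypothetical (not part of Ozone). It rewrites an existing, possibly commented-out, OZONE_DAEMON_ROOT_LOGGER line, or appends one if the file has none, and keeps a .bak copy of the original file.

```shell
#!/bin/sh
# Sketch only: set_ozone_root_logger is a hypothetical helper, not Ozone tooling.
# Usage: set_ozone_root_logger <path to etc/hadoop/ozone-env.sh> <level,appender>
set_ozone_root_logger() {
  env_file=$1
  value=$2
  pattern='^[[:space:]]*#?[[:space:]]*export[[:space:]]+OZONE_DAEMON_ROOT_LOGGER='
  if grep -Eq "$pattern" "$env_file"; then
    # Rewrite the existing (possibly commented-out) line; keep a .bak backup.
    sed -i.bak -E "s|${pattern}.*|export OZONE_DAEMON_ROOT_LOGGER=${value}|" "$env_file"
  else
    # No such line yet: append one at the end of the file.
    printf 'export OZONE_DAEMON_ROOT_LOGGER=%s\n' "$value" >> "$env_file"
  fi
}

# Example (assumes OZONE_HOME points at the Ozone install directory):
#   set_ozone_root_logger "$OZONE_HOME/etc/hadoop/ozone-env.sh" DEBUG,RFA
#   ozone --daemon stop om
#   ozone --daemon start om
```

The env file is only read when the daemon starts, so OM has to be restarted for the new logger setting to take effect.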



On Thu, 2 Mar 2023 at 13:01, Sammi Chen <sammic...@apache.org> wrote:

> Hi Eric,
>
> What was the command-line output when OM failed to start?
>
>
> Regards,
> Sammi
>
> On Wed, 1 Mar 2023 at 19:18, Eric R <bulletb...@outlook.com> wrote:
>
>> Hello *,
>> I'm having an issue with an on-premise single-node setup running v1.3.0.
>> I hope this is the right channel for this question and that I can get
>> some help.
>>
>> When I initially configure and start the cluster with scm --init and
>> om --init followed by the daemon startups, everything works fine.
>> I then stop all daemons with --daemon stop in reverse order; all daemons
>> stop, and I reboot the server.
>>
>> If I now start SCM, it starts fine.
>> But when I start OM afterwards, it fails with the log message below.
>> If I delete my /data partition (where my directory structure was created)
>> and re-initialize from scratch, it works again.
>> After the next reboot I get the same issue again: "ILLEGAL TRANSITION".
>>
>> This is the message:
>>
>> 2023-03-01 09:56:45,645 [om1-impl-thread1] INFO
>> org.apache.ratis.server.storage.RaftStorageDirectory: Lock on
>> /data/ozone/ratis/bf265839-605b-3f16-9796-c5ba1605619e/in_use.lock acquired
>> by nodename 10...@ozone.my.lab
>> 2023-03-01 09:56:45,649 [om1-impl-thread1] INFO
>> org.apache.ratis.server.storage.RaftStorage: Read
>> RaftStorageMetadata{term=0, votedFor=} from
>> /data/ozone/ratis/bf265839-605b-3f16-9796-c5ba1605619e/current/raft-meta
>> 2023-03-01 09:56:45,652 [om1-impl-thread1] INFO
>> org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.use.memory =
>> false (default)
>> 2023-03-01 09:56:45,654 [om1-impl-thread2] INFO
>> org.apache.ratis.server.storage.RaftStorageDirectory: Lock on
>> /data/ozone/ratis/8570f4cf-72ff-489f-9c74-88bf6c146769/in_use.lock acquired
>> by nodename 10...@ozone.my.lab
>> 2023-03-01 09:56:45,657 [om1-impl-thread2] INFO
>> org.apache.ratis.server.storage.RaftStorage: Read
>> RaftStorageMetadata{term=2, votedFor=8f50117d-cc59-4090-b60f-710ed770d002}
>> from
>> /data/ozone/ratis/8570f4cf-72ff-489f-9c74-88bf6c146769/current/raft-meta
>> 2023-03-01 09:56:45,667 [om1-impl-thread1] INFO
>> org.apache.ratis.server.RaftServerConfigKeys: raft.server.log.purge.gap =
>> 1000000 (custom)
>> 2023-03-01 09:56:45,667 [om1-impl-thread1] INFO
>> org.apache.ratis.server.RaftServerConfigKeys:
>> raft.server.log.appender.buffer.byte-limit = 33554432 (custom)
>> 2023-03-01 09:56:45,671 [om1-impl-thread1] INFO
>> org.apache.ratis.server.RaftServerConfigKeys:
>> raft.server.log.statemachine.data.read.timeout = 1000ms (default)
>> 2023-03-01 09:56:45,675 [om1-impl-thread1] INFO
>> org.apache.ratis.server.RaftServerConfigKeys:
>> raft.server.log.purge.preservation.log.num = 0 (default)
>> 2023-03-01 09:56:45,680 [om1-impl-thread1] INFO
>> org.apache.ratis.server.RaftServerConfigKeys:
>> raft.server.log.segment.size.max = 4194304 (custom)
>> 2023-03-01 09:56:45,688 [om1-impl-thread1] INFO
>> org.apache.ratis.server.RaftServerConfigKeys:
>> raft.server.log.segment.cache.num.max = 2 (custom)
>> 2023-03-01 09:56:45,688 [om1-impl-thread1] INFO
>> org.apache.ratis.server.RaftServerConfigKeys:
>> raft.server.log.segment.cache.size.max = 200MB (=209715200) (default)
>> 2023-03-01 09:56:45,693 [om1-impl-thread1] INFO
>> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: new
>> om1@group-C5BA1605619E-SegmentedRaftLogWorker for
>> RaftStorageImpl:Storage Directory
>> /data/ozone/ratis/bf265839-605b-3f16-9796-c5ba1605619e
>> 2023-03-01 09:56:45,693 [om1-impl-thread1] INFO
>> org.apache.ratis.server.RaftServerConfigKeys:
>> raft.server.log.queue.byte-limit = 64MB (=67108864) (default)
>> 2023-03-01 09:56:45,693 [om1-impl-thread1] INFO
>> org.apache.ratis.server.RaftServerConfigKeys:
>> raft.server.log.queue.element-limit = 4096 (default)
>> 2023-03-01 09:56:45,694 [om1-impl-thread1] INFO
>> org.apache.ratis.server.RaftServerConfigKeys:
>> raft.server.log.segment.size.max = 4194304 (custom)
>> 2023-03-01 09:56:45,694 [om1-impl-thread1] INFO
>> org.apache.ratis.server.RaftServerConfigKeys:
>> raft.server.log.preallocated.size = 4194304 (custom)
>> 2023-03-01 09:56:45,694 [om1-impl-thread2] INFO
>> org.apache.ratis.server.RaftServer$Division: om1@group-88BF6C146769: set
>> configuration 45:
>> peers:[8f50117d-cc59-4090-b60f-710ed770d002|rpc:192.168.56.105:9856|admin:192.168.56.105:9857|client:192.168.56.105:9858|dataStream:|priority:1|startupRole:FOLLOWER]|listeners:[],
>> old=null
>> 2023-03-01 09:56:45,698 [om1-impl-thread1] INFO
>> org.apache.ratis.server.RaftServerConfigKeys:
>> raft.server.log.force.sync.num = 128 (default)
>> 2023-03-01 09:56:45,699 [om1-impl-thread1] INFO
>> org.apache.ratis.server.RaftServerConfigKeys:
>> raft.server.log.statemachine.data.sync = true (default)
>> 2023-03-01 09:56:45,699 [om1-impl-thread1] INFO
>> org.apache.ratis.server.RaftServerConfigKeys:
>> raft.server.log.statemachine.data.sync.timeout = 10s (default)
>> 2023-03-01 09:56:45,699 [om1-impl-thread1] INFO
>> org.apache.ratis.server.RaftServerConfigKeys:
>> raft.server.log.statemachine.data.sync.timeout.retry = -1 (default)
>> 2023-03-01 09:56:45,707 [om1-impl-thread1] INFO
>> org.apache.ratis.server.RaftServerConfigKeys:
>> raft.server.log.write.buffer.size = 64KB (=65536) (default)
>> 2023-03-01 09:56:45,708 [om1-impl-thread1] INFO
>> org.apache.ratis.server.RaftServerConfigKeys:
>> raft.server.log.unsafe-flush.enabled = false (default)
>> 2023-03-01 09:56:45,708 [om1-impl-thread1] INFO
>> org.apache.ratis.server.RaftServerConfigKeys:
>> raft.server.log.async-flush.enabled = false (default)
>> 2023-03-01 09:56:45,708 [om1-impl-thread1] INFO
>> org.apache.ratis.server.RaftServerConfigKeys:
>> raft.server.log.statemachine.data.caching.enabled = false (default)
>> 2023-03-01 09:56:45,717 [om1-impl-thread1] INFO
>> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
>> om1@group-C5BA1605619E-SegmentedRaftLogWorker: flushIndex:
>> setUnconditionally 0 -> 824
>> 2023-03-01 09:56:45,717 [om1-impl-thread1] INFO
>> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker:
>> om1@group-C5BA1605619E-SegmentedRaftLogWorker: safeCacheEvictIndex:
>> setUnconditionally 0 -> -1
>> 2023-03-01 09:56:45,719 [om1-impl-thread1] INFO
>> org.apache.ratis.server.RaftServer$Division: om1@group-C5BA1605619E:
>> start as a follower, conf=-1:
>> peers:[om1|rpc:ozone.my.lab:9872|priority:0|startupRole:FOLLOWER]|listeners:[],
>> old=null
>> 2023-03-01 09:56:45,719 [om1-impl-thread1] INFO
>> org.apache.ratis.server.RaftServer$Division: om1@group-C5BA1605619E:
>> changes role from      null to FOLLOWER at term 0 for startAsFollower
>> 2023-03-01 09:56:45,723 [om1-impl-thread1] INFO
>> org.apache.ratis.server.impl.RoleInfo: om1: start
>> om1@group-C5BA1605619E-FollowerState
>> 2023-03-01 09:56:45,724 [om1@group-C5BA1605619E-FollowerState] INFO
>> org.apache.ratis.server.RaftServerConfigKeys:
>> raft.server.rpc.first-election.timeout.min = 5s (fallback to
>> raft.server.rpc.timeout.min)
>> 2023-03-01 09:56:45,724 [om1@group-C5BA1605619E-FollowerState] INFO
>> org.apache.ratis.server.RaftServerConfigKeys:
>> raft.server.rpc.first-election.timeout.max = 5200ms (fallback to
>> raft.server.rpc.timeout.max)
>> 2023-03-01 09:56:45,727 [om1-impl-thread1] INFO
>> org.apache.ratis.util.JmxRegister: Successfully registered JMX Bean with
>> object name Ratis:service=RaftServer,group=group-C5BA1605619E,id=om1
>> 2023-03-01 09:56:45,729 [om1-impl-thread1] INFO
>> org.apache.ratis.server.RaftServerConfigKeys:
>> raft.server.snapshot.auto.trigger.enabled = true (custom)
>> 2023-03-01 09:56:45,732 [om1-impl-thread1] INFO
>> org.apache.ratis.server.RaftServerConfigKeys:
>> raft.server.snapshot.auto.trigger.threshold = 400000 (default)
>> 2023-03-01 09:56:45,733 [om1-impl-thread1] INFO
>> org.apache.ratis.server.RaftServerConfigKeys:
>> raft.server.snapshot.retention.file.num = -1 (default)
>> 2023-03-01 09:56:45,734 [om1-impl-thread1] INFO
>> org.apache.ratis.server.RaftServerConfigKeys:
>> raft.server.log.purge.upto.snapshot.index = true (custom)
>> 2023-03-01 09:56:45,738 [Listener at ozone.my.lab/9862] ERROR
>> org.apache.hadoop.ozone.om.OzoneManagerStarter: OM start failed with
>> exception
>> java.util.concurrent.CompletionException:
>> java.lang.IllegalStateException: ILLEGAL TRANSITION: In
>> OzoneManagerStateMachine:om1:group-C5BA1605619E, RUNNING -> STARTING
>> at
>> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
>> at
>> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
>> at
>> java.util.concurrent.CompletableFuture.biRelay(CompletableFuture.java:1300)
>> at
>> java.util.concurrent.CompletableFuture$BiRelay.tryFire(CompletableFuture.java:1284)
>> at
>> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>> at
>> java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
>> at org.apache.ratis.util.ConcurrentUtils.accept(ConcurrentUtils.java:174)
>> at
>> org.apache.ratis.util.ConcurrentUtils.lambda$null$3(ConcurrentUtils.java:165)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>> at java.lang.Thread.run(Thread.java:750)
>> Caused by: java.lang.IllegalStateException: ILLEGAL TRANSITION: In
>> OzoneManagerStateMachine:om1:group-C5BA1605619E, RUNNING -> STARTING
>> at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:60)
>> at org.apache.ratis.util.LifeCycle$State.validate(LifeCycle.java:121)
>> at org.apache.ratis.util.LifeCycle.transition(LifeCycle.java:164)
>> at org.apache.ratis.util.LifeCycle.startAndTransition(LifeCycle.java:268)
>> at org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.initialize(OzoneManagerStateMachine.java:137)
>> at
>> org.apache.ratis.server.impl.ServerState.initialize(ServerState.java:170)
>> at
>> org.apache.ratis.server.impl.RaftServerImpl.start(RaftServerImpl.java:330)
>> at org.apache.ratis.util.ConcurrentUtils.accept(ConcurrentUtils.java:173)
>> ... 4 more
>> 2023-03-01 09:56:45,745 [shutdown-hook-0] INFO 
>> org.apache.hadoop.ozone.om.OzoneManagerStarter:
>> SHUTDOWN_MSG:
>> /************************************************************
>> SHUTDOWN_MSG: Shutting down OzoneManager at ozone.my.lab/192.168.56.105
>> ************************************************************/
>>
>> This is my config:
>>
>> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
>> <configuration>
>>     <property>
>>         <name>ozone.om.address</name>
>>         <value>ozone.my.lab</value>
>>         <tag>OM, REQUIRED</tag>
>>         <description>
>>       The address of the Ozone OM service. This allows clients to discover
>>       the address of the OM.
>>     </description>
>>     </property>
>>     <property>
>>         <name>ozone.metadata.dirs</name>
>>         <value>/data/ozone</value>
>>         <tag>OZONE, OM, SCM, CONTAINER, STORAGE, REQUIRED</tag>
>>         <description>
>>       This setting is the fallback location for SCM, OM, Recon and
>>       DataNodes to store their metadata. This setting may be used only in
>>       test/PoC clusters to simplify configuration.
>>
>>       For production clusters or any time you care about performance, it is
>>       recommended that ozone.om.db.dirs, ozone.scm.db.dirs and
>>       dfs.container.ratis.datanode.storage.dir be configured separately.
>>     </description>
>>     </property>
>>     <property>
>>         <name>ozone.scm.client.address</name>
>>         <value>ozone.my.lab</value>
>>         <tag>OZONE, SCM, REQUIRED</tag>
>>         <description>
>>       The address of the Ozone SCM client service. This is a required
>>       setting.
>>
>>       It is a string in the host:port format. The port number is optional
>>       and defaults to 9860.
>>     </description>
>>     </property>
>>     <property>
>>         <name>ozone.scm.names</name>
>>         <value>ozone.my.lab</value>
>>         <tag>OZONE, REQUIRED</tag>
>>         <description>
>>       The value of this property is a set of DNS | DNS:PORT | IP
>>       Address | IP:PORT. Written as a comma separated string. e.g. scm1,
>>       scm2:8020, 7.7.7.7:7777.
>>       This property allows datanodes to discover where SCM is, so that
>>       datanodes can send heartbeat to SCM.
>>     </description>
>>     </property>
>>     <property>
>>         <name>hdds.scm.safemode.min.datanode</name>
>>         <value>1</value>
>>         <tag>SCM, REQUIRED</tag>
>>         <description>
>>      Number of min available datanodes
>>     </description>
>>     </property>
>>     <property>
>>         <name>ozone.replication</name>
>>         <value>1</value>
>>         <tag>OZONE, REQUIRED</tag>
>>         <description>
>>      Default replication factor for newly created keys.
>>     </description>
>>     </property>
>> </configuration>
>>
>
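For anyone trying to reproduce this, the sequence Eric describes can be written down as the sketch below. It is a reconstruction under assumptions, not his exact commands: the daemon list (scm, om, datanode) is assumed because the report does not name the daemons, SCM is assumed to be running when ozone om --init is executed (as I understand it, OM initialization registers with SCM), and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch of the reported reproduction sequence. Hypothetical helpers; the
# daemon list (scm, om, datanode) is an assumption, not taken from the report.

first_time_setup() {
  # One-time initialization, done only on a fresh /data partition.
  ozone scm --init
  ozone --daemon start scm
  ozone om --init            # assumed to need the running SCM
  ozone --daemon start om
  ozone --daemon start datanode
}

start_all() {
  # Normal start after a reboot: no --init, just the daemons in order.
  for d in scm om datanode; do
    ozone --daemon start "$d"
  done
}

stop_all() {
  # Stop in reverse start order.
  for d in datanode om scm; do
    ozone --daemon stop "$d"
  done
}

# Reported behaviour:
#   first_time_setup           -> everything works
#   stop_all, then reboot
#   start_all                  -> SCM starts, OM exits with ILLEGAL TRANSITION
```

Keeping init and plain start in separate functions makes it explicit that the failing step is an ordinary restart, with no re-initialization involved.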

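When comparing the log with what is on disk, note that this OM startup acquires locks on two Ratis storage directories under /data/ozone/ratis, bf265839-605b-3f16-9796-c5ba1605619e and 8570f4cf-72ff-489f-9c74-88bf6c146769, and logs them as group-C5BA1605619E and group-88BF6C146769. In both cases the group name is "group-" followed by the last 12 hex digits of the directory UUID in upper case. The sketch below assumes that pattern holds generally and lists each directory with the name it would appear under in the log; the helper names and the default root are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for matching Ratis storage directories to log names.
# Assumption: group name = "group-" + last 12 hex digits of the UUID, upper-cased
# (true for both directories in the log above).

ratis_group_name() {
  suffix=$(printf '%s' "$1" | tail -c 12 | tr '[:lower:]' '[:upper:]')
  printf 'group-%s\n' "$suffix"
}

list_ratis_groups() {
  root=${1:-/data/ozone/ratis}
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue           # glob did not match anything
    uuid=$(basename "$dir")
    case $uuid in
      *-*-*-*-*) ;;                      # looks like a UUID
      *) continue ;;                     # skip anything else
    esac
    printf '%s %s\n' "$(ratis_group_name "$uuid")" "$dir"
  done
}

# Example: list_ratis_groups /data/ozone/ratis
```

The helpers only read directory names and change nothing on disk, so they are safe to run on the affected host.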