Ethan Rose created HDDS-11743:
---------------------------------

             Summary: Provide debug/repair commands for OM DB import/export
                 Key: HDDS-11743
                 URL: https://issues.apache.org/jira/browse/HDDS-11743
             Project: Apache Ozone
          Issue Type: Sub-task
          Components: Ozone Manager
            Reporter: Ethan Rose


Sometimes if an OM follower gets into a bad state, it can be faster to copy the 
leader DB over and restart the follower than to depend on Ratis to catch it up. With the 
introduction of filesystem snapshots, however, manually copying the OM DB has 
become much more complicated. The goal of this Jira is to provide CLI commands 
that can automate all parts of the DB import/export process except the network 
copy. Flow would look something like this:
 # {{ozone debug om export-db --db=<om-db-location>}} would create a tarball of 
the current OM DB and its snapshots.
 ** The tarball should preserve hardlinks by default, so hardlinked files are 
stored only once and take no extra space.
 ** An optional {{--output}} option can write the tarball to a different 
location if the current disk does not have enough space.
 # The DB tarball is manually copied over the network to the follower OM node. 
This keeps all authentication work out of the CLI.
 # {{ozone repair om import-db --db=<source-db-tarball> 
--destination=<om-db-dir>}} would take the tarball and unpack it to the 
configured directory.
 ** The CLI could fail if an OM DB already exists at the destination. In this 
case it should instruct the user to manually move the existing DB to a backup 
location.
 ** The {{ozone repair}} command already warns about running as a user with the 
correct permissions, which should prevent the errors with unreadable SST files 
that we have seen in the past.
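
The export and import steps above can be sketched with Python's standard library {{tarfile}} module. This is a minimal illustration under stated assumptions, not Ozone's implementation: {{export_db}}, {{import_db}} and {{DestinationNotEmptyError}} are hypothetical names, and the sketch assumes the {{--db}} location is a single directory holding both the OM DB and its snapshot checkpoints. When {{tarfile}} archives a file whose inode it has already stored, it writes a hard-link entry instead of a second copy, which covers the "preserve hardlinks without extra space" requirement.

```python
import os
import tarfile

# Hypothetical helpers sketching export-db / import-db; not Ozone's actual code.


class DestinationNotEmptyError(Exception):
    """Raised when the import destination already holds an OM DB."""


def export_db(db_dir: str, output_path: str) -> str:
    """Archive everything under db_dir (assumed: the DB plus snapshots).

    tarfile stores a file whose inode was already archived as a hard-link
    member (LNKTYPE), so hardlinked SST files are written only once.
    """
    with tarfile.open(output_path, "w") as tar:
        for entry in sorted(os.listdir(db_dir)):
            # Store paths relative to db_dir so import can unpack them directly.
            tar.add(os.path.join(db_dir, entry), arcname=entry)
    return output_path


def import_db(tarball: str, destination: str) -> None:
    """Unpack an exported tarball into destination, refusing to overwrite."""
    if os.path.isdir(destination) and os.listdir(destination):
        raise DestinationNotEmptyError(
            f"{destination} is not empty; move the existing OM DB to a "
            "backup location and retry."
        )
    os.makedirs(destination, exist_ok=True)
    with tarfile.open(tarball, "r") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: reject members or links that would escape
            # destination; hard links inside it are still recreated.
            tar.extractall(destination, filter="data")
        else:
            tar.extractall(destination)
```

Calling {{import_db}} a second time against the same destination raises {{DestinationNotEmptyError}} instead of overwriting, matching the "move it to a backup location" behavior proposed above. The output tarball should be written outside the DB directory, otherwise it would be archived into itself.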



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
