[ 
https://issues.apache.org/jira/browse/HDDS-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17952573#comment-17952573
 ] 

Swaminathan Balachandran edited comment on HDDS-12090 at 5/19/25 12:00 PM:
---------------------------------------------------------------------------

Actually, such problems do exist in the system: we pause all background 
services, but that alone is not enough. If a RocksDB instance is open, a WAL 
flush or any other background operation could still occur in it, which could 
leave the snapshot RocksDB inconsistent after a checkpoint is created. 
Eventually we want to get rid of the bootstrap lock and instead rely on a 
central snapshot cache lock, which would prevent any RocksDB from being opened 
while a bootstrap copy batch is running. 
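
To illustrate the idea of a central lock that keeps snapshot DBs closed while a 
copy batch runs, here is a minimal sketch using a read/write lock. Everything 
in it is hypothetical (the class SnapshotCacheLockSketch and its method names 
are made up for illustration and are not Ozone's actual snapshot cache API), 
and the RocksDB open/copy work is represented only by the callbacks passed in. 
Snapshot users share the read lock; a bootstrap copy batch takes the write 
lock, so it waits for in-flight snapshot use to finish and blocks new opens 
until the batch is done.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

/** Hypothetical sketch of a central snapshot cache lock; not Ozone code. */
final class SnapshotCacheLockSketch {
  // Fair mode so a waiting copy batch is not starved by a stream of readers.
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);

  /** Runs work that needs an open snapshot DB; blocks while a copy batch runs. */
  <T> T withOpenSnapshot(Supplier<T> work) {
    lock.readLock().lock();
    try {
      return work.get();
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Non-blocking variant: returns false if a copy batch currently holds the lock. */
  boolean tryWithOpenSnapshot(Runnable work) {
    if (!lock.readLock().tryLock()) {
      return false;
    }
    try {
      work.run();
      return true;
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Runs one bootstrap copy batch with no snapshot DB open on other threads. */
  void runCopyBatch(Runnable copyBatch) {
    lock.writeLock().lock();
    try {
      copyBatch.run();
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** True while some thread is inside runCopyBatch. */
  boolean isCopyBatchRunning() {
    return lock.isWriteLocked();
  }
}
```

In this sketch the lock is held only for the duration of each callback, so the 
thread that acquires it is also the one that releases it. A cache that hands 
out long-lived DB handles across threads would need reference counting or a 
similar scheme on top of this.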


was (Author: swamirishi):
Actually, such problems do exist in the system: we pause all background 
services, but that alone is not enough. If a RocksDB instance is open, a WAL 
flush or any other background operation could still occur in it, which could 
create an inconsistent snapshot. Eventually we want to get rid of the bootstrap 
lock and instead rely on a central snapshot cache lock, which would prevent any 
RocksDB from being opened while a bootstrap copy batch is running. 

> Fix Snapshot Bootstrapping race condition to prevent snapshot corruption
> ------------------------------------------------------------------------
>
>                 Key: HDDS-12090
>                 URL: https://issues.apache.org/jira/browse/HDDS-12090
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Swaminathan Balachandran
>            Assignee: Swaminathan Balachandran
>            Priority: Major
>
> The existing bootstrapping logic has an issue when dealing with snapshotted 
> OM RocksDB instances. No locks are taken while bootstrapping, so it runs 
> concurrently with active transactions on the snapshot RocksDB, which could 
> leave a corrupted RocksDB instance on the follower OM after bootstrap. 


