[ https://issues.apache.org/jira/browse/HDDS-12317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ethan Rose updated HDDS-12317:
------------------------------
    Description:
In one of the clusters, we have seen volumes marked as bad incorrectly.
Due to this, they would get removed from DN memory and send the container report to SCM.
It can result in the following problems:
1. This can create the problem of multiple open containers in the same DN and eventually can result in data loss due to container brain split.
2. Since the whole container is marked as bad, it will replicate all data to other nodes. These volumes should still serve the existing data blocks but writes should be blocked until space is freed up. Balancers would take care of freeing the space when they run.
This issue was found by [~sumitagrawal] [~smeng] [~erose] [~ashishk] [~j.huang] and [~sbalachandran]

  was:
In one of the clusters, we have seen volumes marked as bad incorrectly.
Due to this, they would get removed from DN memory and send the container report to SCM.
It can result in the following problems:
1. This can create the problem of multiple open containers in the same DN( this is safeguarded in CHF4) and eventually can result in data loss due to container brain split.
2. Since the whole container is marked as bad, it will replicate all data to other nodes. These volumes should still serve the existing data blocks but writes should be blocked until space is freed up. Balancers would take care of freeing the space when they run.
This issue was found by [~sumitagrawal] [~smeng] [~erose] [~ashishk] [~j.huang] and [~sbalachandran]

> Volume should not be marked as unhealthy when disk full
> -------------------------------------------------------
>
>                 Key: HDDS-12317
>                 URL: https://issues.apache.org/jira/browse/HDDS-12317
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Datanode
>            Reporter: Uma Maheswara Rao G
>            Assignee: Ashish Kumar
>            Priority: Major
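
The behavior requested in the description (a full volume keeps serving existing blocks, rejects new writes, and is not reported as failed) can be sketched roughly as below. This is only an illustration under stated assumptions, not the Ozone implementation: VolumeStateSketch, VolumeState, classify and the minFreeBytes threshold are hypothetical names, and the sketch assumes that a write probe failing on a volume with no usable space is an expected consequence of the disk being full rather than evidence of a bad disk.

```java
/**
 * Hypothetical sketch only: these names are illustrative and are not Ozone classes.
 */
final class VolumeStateSketch {

    /** Illustrative volume states; FULL is deliberately kept distinct from FAILED. */
    enum VolumeState { HEALTHY, FULL, FAILED }

    private VolumeStateSketch() { }

    /**
     * Classifies a volume from a free-space reading and the result of a write probe.
     *
     * @param usableBytes      bytes currently available on the volume
     * @param minFreeBytes     assumed threshold at or below which the volume counts as full
     * @param writeProbeFailed whether a small test write to the volume failed
     */
    static VolumeState classify(long usableBytes, long minFreeBytes, boolean writeProbeFailed) {
        // Check space first: a failed write on a full disk is expected,
        // so it must not be taken as a sign that the disk is bad.
        if (usableBytes <= minFreeBytes) {
            return VolumeState.FULL;
        }
        // With space available, a failed write probe points at a real disk problem.
        return writeProbeFailed ? VolumeState.FAILED : VolumeState.HEALTHY;
    }

    /** Existing data blocks stay readable unless the volume has actually failed. */
    static boolean canServeReads(VolumeState state) {
        return state != VolumeState.FAILED;
    }

    /** New writes are accepted only while the volume is healthy and has space. */
    static boolean canAcceptWrites(VolumeState state) {
        return state == VolumeState.HEALTHY;
    }

    /** Only a genuinely failed volume should be dropped and reported as failed. */
    static boolean shouldReportFailed(VolumeState state) {
        return state == VolumeState.FAILED;
    }
}
```

With this split, a volume at zero usable bytes is classified as FULL even when the write probe fails, so its containers stay in memory and no re-replication is triggered. One limitation of the sketch: a disk that is both full and genuinely broken would also be classified as FULL, so a real check would likely need a separate read probe as well.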