[jira] [Updated] (HDDS-12317) Volume should not be marked as unhealthy when disk full

Ethan Rose (Jira) Wed, 12 Feb 2025 13:15:06 -0800


     [ 
https://issues.apache.org/jira/browse/HDDS-12317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ethan Rose updated HDDS-12317:
------------------------------
    Description: 
In one of our clusters, volumes were incorrectly marked as bad when their 
disks became full. As a result, the DN removed them from memory and sent a 
container report to SCM.

This can cause the following problems:
1. It can lead to multiple open containers on the same DN, which can 
eventually cause data loss due to container split-brain.
2. Because the whole volume is marked as bad, all of its data is replicated 
to other nodes. A full volume should instead keep serving its existing data 
blocks while blocking writes until space is freed. Balancers would take care 
of freeing the space when they run.

This issue was found by [~sumitagrawal] [~smeng] [~erose]  [~ashishk] 
[~j.huang] and [~sbalachandran]

  was:
In one of our clusters, volumes were incorrectly marked as bad when their 
disks became full. As a result, the DN removed them from memory and sent a 
container report to SCM.

This can cause the following problems:
1. It can lead to multiple open containers on the same DN (this is 
safeguarded in CHF4), which can eventually cause data loss due to container 
split-brain.
2. Because the whole volume is marked as bad, all of its data is replicated 
to other nodes. A full volume should instead keep serving its existing data 
blocks while blocking writes until space is freed. Balancers would take care 
of freeing the space when they run.

This issue was found by [~sumitagrawal] [~smeng] [~erose]  [~ashishk] 
[~j.huang] and [~sbalachandran]


> Volume should not be marked as unhealthy when disk full
> -------------------------------------------------------
>
>                 Key: HDDS-12317
>                 URL: https://issues.apache.org/jira/browse/HDDS-12317
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Datanode
>            Reporter: Uma Maheswara Rao G
>            Assignee: Ashish Kumar
>            Priority: Major
>
> In one of our clusters, volumes were incorrectly marked as bad when their 
> disks became full. As a result, the DN removed them from memory and sent a 
> container report to SCM.
> This can cause the following problems:
> 1. It can lead to multiple open containers on the same DN, which can 
> eventually cause data loss due to container split-brain.
> 2. Because the whole volume is marked as bad, all of its data is replicated 
> to other nodes. A full volume should instead keep serving its existing data 
> blocks while blocking writes until space is freed. Balancers would take care 
> of freeing the space when they run.
> This issue was found by [~sumitagrawal] [~smeng] [~erose]  [~ashishk] 
> [~j.huang] and [~sbalachandran]
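
The behavior requested in the description, where a full volume keeps serving 
existing data but rejects writes and only a genuinely failed volume is removed, 
can be sketched roughly as follows. This is a minimal illustration and not 
Ozone's implementation: VolumeState, classify, shouldBeRemoved, and the 
minFreeBytes threshold are hypothetical names introduced here, and the real 
datanode health check may use different inputs.

```java
/**
 * Hypothetical volume states for illustration only; these are not
 * Ozone's actual types.
 */
enum VolumeState {
    HEALTHY(true, true),   // reads and writes allowed
    FULL(true, false),     // out of space: serve existing blocks, reject writes
    FAILED(false, false);  // real I/O failure: no reads or writes

    private final boolean readable;
    private final boolean writable;

    VolumeState(boolean readable, boolean writable) {
        this.readable = readable;
        this.writable = writable;
    }

    boolean allowsReads() {
        return readable;
    }

    boolean allowsWrites() {
        return writable;
    }

    /** Only a failed volume should be dropped and its data re-replicated. */
    boolean shouldBeRemoved() {
        return this == FAILED;
    }

    /**
     * Classifies a volume from an assumed health-check result.
     * An I/O error takes precedence over low free space.
     */
    static VolumeState classify(boolean ioError, long usableBytes,
                                long minFreeBytes) {
        if (ioError) {
            return FAILED;
        }
        if (usableBytes < minFreeBytes) {
            return FULL;
        }
        return HEALTHY;
    }
}
```

With this split, a call such as VolumeState.classify(false, 0L, 1024L) yields 
FULL, so the volume stays registered and readable while new writes go 
elsewhere. The point of the sketch is only that "disk full" and "disk failed" 
are separate outcomes of the check.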



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

