[jira] [Assigned] (HDDS-11943) Fail storage volume after numerous reported IO errors

Ashish Kumar (Jira) Tue, 11 Mar 2025 02:57:50 -0700


     [ 
https://issues.apache.org/jira/browse/HDDS-11943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ashish Kumar reassigned HDDS-11943:
-----------------------------------

    Assignee: Ashish Kumar

> Fail storage volume after numerous reported IO errors
> -----------------------------------------------------
>
>                 Key: HDDS-11943
>                 URL: https://issues.apache.org/jira/browse/HDDS-11943
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Ethan Rose
>            Assignee: Ashish Kumar
>            Priority: Major
>
> Currently, on-demand volume scanning is triggered for IO errors encountered 
> while the cluster is running, but a volume can only be failed after a 
> configurable number of volume scan failures.
> The volume scanner syncs a file to the disk and reads it back. This check 
> alone does not catch some types of volume failures. For example, if older sectors 
> of a disk that have already been written to are failing for reads, the 
> container scanner will keep raising errors and marking containers unhealthy, 
> but the corresponding volume scans will always write their file to new 
> sectors that don't have errors.
> To fix this, we can keep a counter of how many IO errors have been reported 
> from on-demand scan requests for a volume. If that number crosses a 
> configurable count, we can fail the volume even if volume scans are passing.
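
A minimal sketch of the proposed counter, for illustration only. The class
name VolumeIoErrorTracker, its methods, and the maxReportedErrors parameter
are hypothetical and are not taken from the Ozone code base. The sketch also
assumes that "crosses" means strictly exceeding the configured count, and
that the caller is responsible for actually failing the volume.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical per-volume counter of IO errors reported through
 * on-demand scan requests. Not part of the Ozone code base.
 */
final class VolumeIoErrorTracker {
  // Assumed configurable limit on reported IO errors for one volume.
  private final int maxReportedErrors;
  // Atomic, since IO errors may be reported from several threads.
  private final AtomicInteger reportedErrors = new AtomicInteger();

  VolumeIoErrorTracker(int maxReportedErrors) {
    if (maxReportedErrors < 1) {
      throw new IllegalArgumentException("maxReportedErrors must be >= 1");
    }
    this.maxReportedErrors = maxReportedErrors;
  }

  /**
   * Records one IO error reported for this volume.
   *
   * @return true once the number of reported errors exceeds the limit,
   *         meaning the caller should fail the volume even if the
   *         write-and-read-back volume scans are still passing.
   */
  boolean reportIoError() {
    return reportedErrors.incrementAndGet() > maxReportedErrors;
  }

  /** Returns the number of IO errors reported so far. */
  int getReportedErrors() {
    return reportedErrors.get();
  }
}
```

With a limit of 2, the first two calls to reportIoError() return false and
the third returns true. This decision is independent of the result of any
volume scan, which is the point of the proposal.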



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

