[
https://issues.apache.org/jira/browse/HDDS-11943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ashish Kumar reassigned HDDS-11943:
-----------------------------------
Assignee: Ashish Kumar
> Fail storage volume after numerous reported IO errors
> -----------------------------------------------------
>
> Key: HDDS-11943
> URL: https://issues.apache.org/jira/browse/HDDS-11943
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Ethan Rose
> Assignee: Ashish Kumar
> Priority: Major
>
> Currently, on-demand volume scanning is triggered for IO errors encountered
> while the cluster is running, but a volume can only be failed after a
> configurable number of volume scan failures.
> The volume scanner syncs a file to the disk and reads it back. This alone
> does not catch some types of volume failures. For example, if older sectors
> of a disk that have already been written to are failing for reads, the
> container scanner will keep raising errors and marking containers unhealthy,
> but the corresponding volume scans will always write their file to new
> sectors that don't have errors.
> To fix this, we can keep a counter of how many IO errors have been reported
> from on-demand scan requests for a volume. If that number crosses a
> configurable count, we can fail the volume even if volume scans are passing.
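
The counter proposed above could be sketched roughly as follows. This is only an illustration, not Ozone's actual implementation: the class name VolumeIoErrorTracker, its methods, and the maxIoErrors parameter are all hypothetical, and it assumes "crosses" means the count strictly exceeds the configured limit.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical per-volume IO error counter. Names here are assumptions
 * made for illustration and do not come from the Ozone code base.
 */
final class VolumeIoErrorTracker {
  // Assumed configurable limit on reported IO errors for one volume.
  private final int maxIoErrors;
  // Thread-safe count of IO errors reported via on-demand scan requests.
  private final AtomicInteger ioErrors = new AtomicInteger();

  VolumeIoErrorTracker(int maxIoErrors) {
    if (maxIoErrors < 0) {
      throw new IllegalArgumentException("maxIoErrors must be >= 0");
    }
    this.maxIoErrors = maxIoErrors;
  }

  /**
   * Records one reported IO error.
   *
   * @return true once the count has crossed the limit, meaning the caller
   *         should fail the volume even if volume scans are passing.
   */
  boolean reportIoError() {
    return ioErrors.incrementAndGet() > maxIoErrors;
  }

  /** Returns the number of IO errors reported so far. */
  int getIoErrorCount() {
    return ioErrors.get();
  }
}
```

In this sketch the caller would invoke reportIoError() each time an on-demand scan is requested because of an IO error, and fail the volume when it returns true, independently of the volume scan result.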