sodonnel commented on PR #8405: URL: https://github.com/apache/ozone/pull/8405#issuecomment-2857905118
My observation from past problems on HDFS is that partially failed disks are a very large problem. They are hard to detect, and reads against them can sometimes block for a very long time, resulting in hard-to-explain slow reads. I'd be more in favor of failing bad volumes completely rather than keeping them around in a degraded state. I understand the idea this doc is going with, but it adds quite a bit of new complexity to the system:

* The new degraded state and the DN excluding it for writes
* DN capacity reduction perhaps? The balancer has to factor this in.
* SCM tracking container to volume mappings
* The replication flow considering the new state
* Probably a need for SCM to tell clients to try the degraded volume last

The system is intended to handle the abrupt loss of a datanode or disk at any time, so what is driving the need for this proposal? Are volumes being failed too easily, resulting in data loss? If volumes are being failed too eagerly, then for what reason? Disk full, checksum errors, outright failed reads?

We do have mechanisms to repair bad containers already (scanner and reconciler), so that part is handled. What is considered an IO error which can trigger an on-demand scan? Is it a checksum validation failure or an unexpected EOF / data length error? Are we keeping a sliding window count per unique block, so that 10 failures on the same block only count as 1 rather than 10?
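To make that last question concrete, here is a minimal sketch of the kind of per-block de-duplication I mean. This is not existing Ozone code; the class and method names are made up for illustration, under the assumption that each volume tracks its own errors and that a block can be identified by a numeric ID:

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical per-volume error tracker: counts IO errors over a sliding
 * time window, de-duplicated by block ID, so repeated failures on the same
 * block contribute only one "unique" failure to the threshold check.
 */
public class VolumeErrorWindow {
  private final Duration window;
  private final Clock clock;
  // blockId -> time of the most recent failure observed for that block
  private final Map<Long, Instant> lastFailureByBlock = new ConcurrentHashMap<>();

  public VolumeErrorWindow(Duration window, Clock clock) {
    this.window = window;
    this.clock = clock;
  }

  /** Record an IO error observed while reading the given block. */
  public void recordFailure(long blockId) {
    lastFailureByBlock.put(blockId, clock.instant());
  }

  /** Number of distinct blocks that have failed inside the window. */
  public int uniqueFailures() {
    Instant cutoff = clock.instant().minus(window);
    lastFailureByBlock.values().removeIf(t -> t.isBefore(cutoff));
    return lastFailureByBlock.size();
  }

  /** Whether enough distinct blocks have failed to act on the volume. */
  public boolean exceedsThreshold(int maxUniqueFailures) {
    return uniqueFailures() > maxUniqueFailures;
  }
}
```

With something shaped like this, 10 read failures on one hot block inside the window register as a single unique failure, while failures spread across many different blocks count individually and are much stronger evidence that the volume itself is going bad.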