Re: Detecting disk/volume failures in Ozone

2023-06-21 Thread Ethan Rose
I totally agree. We need write checksums on by default, and I am not sure of the historical reason they were turned off when they were added in HDDS-5623. We should at least test to quantify the performance difference of on vs off before we flip the sw

Re: Detecting disk/volume failures in Ozone

2023-06-21 Thread Stephen O'Donnell
Why is write checksum validation not turned on by default? I have seen cases on HDFS where the "verify checksums on write" feature caught data corruption problems caused by faulty hardware / network cables before they were able to propagate into the system. The only reason I can think of for not
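
To make the "verify checksums on write" idea concrete, here is a minimal sketch, not Ozone's or HDFS's actual implementation. It assumes the client sends a CRC32 alongside each chunk and the receiver recomputes it before persisting. The names `client_checksum`, `verify_and_write`, and `ChecksumMismatch`, and the dict standing in for on-disk storage, are all hypothetical.

```python
import zlib


class ChecksumMismatch(Exception):
    """Raised when received bytes do not match the checksum sent with them."""


def client_checksum(data: bytes) -> int:
    # The writer computes a CRC32 over the chunk before sending it.
    return zlib.crc32(data)


def verify_and_write(store: dict, key: str, data: bytes, expected_crc: int) -> None:
    # The receiver recomputes the CRC32 over what actually arrived.
    actual_crc = zlib.crc32(data)
    if actual_crc != expected_crc:
        # Reject the write so corrupted bytes never reach storage.
        raise ChecksumMismatch(
            f"{key}: expected {expected_crc:#010x}, got {actual_crc:#010x}"
        )
    # 'store' is a stand-in for the on-disk chunk file (assumption).
    store[key] = data
```

The cost is one extra pass over each chunk on the receiving side, which is the overhead that would need to be measured before changing a default.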

Re: Detecting disk/volume failures in Ozone

2023-06-20 Thread Ethan Rose
Hi Uma, The datanode-side checksums on write are still turned off. IO exceptions on the read/write path will trigger on-demand container and volume/disk scans. We could add containers to the on-demand scanning queue after they are closed for an initial scan, but this may place unnecessary burden o
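
The exception-triggered scanning described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the datanode's real code: `OnDemandScanQueue`, `read_chunk`, and the `reader` callable are hypothetical names, and the queue deduplicates requests so that repeated failures on one container do not pile up scans.

```python
from collections import deque


class OnDemandScanQueue:
    """FIFO queue of container IDs awaiting a scan, without duplicates."""

    def __init__(self):
        self._queue = deque()
        self._pending = set()

    def request_scan(self, container_id) -> bool:
        # Ignore the request if this container is already waiting.
        if container_id in self._pending:
            return False
        self._pending.add(container_id)
        self._queue.append(container_id)
        return True

    def next_scan(self):
        # Return the next container to scan, or None if the queue is empty.
        if not self._queue:
            return None
        container_id = self._queue.popleft()
        self._pending.discard(container_id)
        return container_id


def read_chunk(reader, container_id, queue: OnDemandScanQueue):
    # 'reader' is a hypothetical callable that performs the actual IO.
    try:
        return reader(container_id)
    except OSError:
        # An IO error on the read path queues the container for a scan,
        # then the error still propagates to the caller.
        queue.request_scan(container_id)
        raise
```

Enqueuing every newly closed container through the same `request_scan` call would give the initial scan mentioned above, at the price of extra scan load.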

Re: Detecting disk/volume failures in Ozone

2023-06-20 Thread Uma Maheswara Rao Gangumalla
Thank you, Ethan, for working on this important effort. It looks like validating data checksums on write is not enabled by default. I am thinking that we should validate data by priority in background scanning. For example, files that have not been scanned before should be prioritize
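
One way to picture the prioritization suggested here is a scan order in which never-scanned containers come first and the rest follow from oldest to newest last-scan time. The sketch below is only an illustration of that ordering; `scan_order` and the `last_scanned` mapping (container ID to last scan timestamp, or `None` if never scanned) are hypothetical and not part of Ozone.

```python
import heapq
from typing import Optional


def scan_order(last_scanned: dict[str, Optional[float]]) -> list[str]:
    """Return container IDs in the order a background scanner would visit them."""
    heap = []
    for container_id, timestamp in last_scanned.items():
        # Never-scanned containers get rank 0 so they sort ahead of
        # everything else; scanned ones get rank 1 and sort by age.
        rank = (0, 0.0) if timestamp is None else (1, timestamp)
        heapq.heappush(heap, (rank, container_id))
    # Pop until empty: lowest rank first, ties broken by container ID.
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]
```

For example, `scan_order({"c1": 100.0, "c2": None, "c3": 50.0})` visits `c2` first, then `c3`, then `c1`.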