Paul Eggert wrote:
> Bob Proulx wrote:
> >The FSF admins have isolated and corrected the problem. The report is
> >that the problem was a faulty SSD in a RAID10 set of four. It was
> >returning corrupted data and reporting it as good data. That's
> >exceptionally bad hardware.
>
> Ouch! What type of SSD it was, exactly?

They didn't say. They only said that they had three Intel SSDs and one
non-Intel SSD. The Intel ones were good. (I have had great personal
experience with Intel SSDs.) The bad SSD was a non-Intel model and I
don't know the type. It was extremely bad that it returned corrupted
data silently with no error indication. That's BAD!

Since the bad drive was removed, everything seems to be working
correctly and I haven't heard any more reports of data corruption.
Let's hope that good fortune continues.

> I don't know of any published measurements for this type of error, though

SSDs definitely have different failure modes from spinning media. RAID
continues to be very important, as this incident shows. It is likely
that the new challenges of different SSD failures will require more
active checksumming with SSDs than with traditional spinning media.

> Google has published measures for many other types. In their experience,
> flash drives have significantly more uncorrectable errors than hard disk
> drives, with different failure characteristics for MLC vs SLC vs eMLC in the
> field. See:
>
> Schroeder B, Lagisetty R, Merchant A. Flash reliability in production: the
> expected and the unexpected. FAST'16, 2016-02-22, 67-80.
> https://usenix.org/conference/fast16/technical-sessions/presentation/schroeder
>
> For more details about flash drives problems after cycling power, see:
>
> Zheng M, Tucek J, Qin F, Lillibridge M. Understanding the robustness of SSDs
> under power fault. FAST'13, 2013-02-12, 271-84.
> https://www.usenix.org/conference/fast13/technical-sessions/presentation/zheng

I will pass that information along to the FSF admins.

Bob