>> I wonder: when an OSD comes back from a power loss, all the data gets
>> scrubbed and there are 2 other copies.
>> PLP is important mostly for block storage; Ceph should easily recover
>> from that situation.
>> That's why I don't understand why I should pay more for PLP and other
>> protections.
> 
> I'm no expert (or power user) at all, but my reasoning is: if something 
> power-related can take down one of my servers it can just as easily take down 
> *all* my ceph servers at once.
> 
> And that could just as easily render all three copies inaccessible.

Or even two.  I’ve been through a protracted outage (not power related) that 
involved widespread OSD flapping.  Although no OSDs were lost in the end, 
a single RADOS object, in an RBD head, somehow ended up lost.  Very much a 
corner case, but if we’d been using 2R (two replicas) it would have been gruesome.

On another occasion I saw a power inductor / PSU failure take down power in an 
entire DC row.  Fortunately we were using redundant PSUs on different circuits.
One node went down nonetheless: the PSU on the surviving power feed had a 
pre-existing fault that wasn’t caught because PSUs weren’t monitored.  As with 
active/passive network bonds, this showed the importance of monitoring and 
addressing latent faults so you don’t find them at exactly the wrong time.
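
To make the bond half of that concrete, here is a rough sketch (not something from the outage above) of a check that flags bond members that are down even though the bond itself still works.  It assumes a Linux host using the kernel bonding driver, which exposes per-bond status under /proc/net/bonding/; the bond name bond0 and the function name degraded_slaves are placeholders of mine.  PSU health would need a separate, vendor-specific check (BMC/IPMI sensors, for example), which I haven't attempted here.

```python
from pathlib import Path


def degraded_slaves(bonding_text):
    """Return names of bond members whose MII Status is not 'up'.

    bonding_text is the content of /proc/net/bonding/<bond>. The bond-level
    "MII Status" line (before any "Slave Interface" line) is ignored, since
    the bond can report 'up' while one member has silently failed.
    """
    degraded = []
    current = None  # name of the member whose section we are reading
    for line in bonding_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
        elif key == "MII Status" and current is not None and value != "up":
            degraded.append(current)
    return degraded


if __name__ == "__main__":
    # Assumed bond name; adjust for the host being checked.
    path = Path("/proc/net/bonding/bond0")
    if path.exists():
        bad = degraded_slaves(path.read_text())
        print("degraded bond members:", ", ".join(bad) if bad else "none")
```

Run from cron or a monitoring agent, something like this turns a dead standby link into an alert instead of a surprise during the next failover.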

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io