> The failed SSD disks seem to be quite dead unfortunately, not visible to
> the OS and also marked as dead in the node iDRAC BMC.
I’ve found that iDRAC’s view of SSDs is sometimes … imperfect, but "not
visible to the OS" is telling. If you could send me the output of

  storcli64 /c0 show termlog > /var/tmp/termlog.txt   # or perccli64
  storcli64 /c0 show all

I’d love to take a look and see whether the HBA has any additional
information. One possible, though unlikely, scenario is that the lost drives
had a firmware flaw while the surviving drives had a newer revision.

> We haven't tried moving them to a different node to test though, I can try
> that.
>
> In this power event we lost all of the SSD devices on 2 out of 3 OSD nodes
> in the cluster (it was a small testing cluster), and half of them on the
> 3rd OSD node.
>
> So the vast majority of OSDs can't start here and the overall cluster
> state is extremely degraded.
>
> So if there is state contained within the old, dead DB devices that can't
> be directly replaced by instantiating new replacement DB devices, then it
> looks like we've lost too many DB devices in one fell swoop to ever
> recover this Ceph cluster, despite the OSD HDDs all being clean and
> untouched by the power event.
>
> I had been hoping that the DB state was more ephemeral than it seems to
> be, and that instantiating new DB devices mapped to the correct OSD
> devices (via LUKS key) would allow the down+out OSDs to be restarted. But
> from the updates on this thread, that increasingly looks not to be
> possible.
>
> *******************
> Paul Browne
> Research Computing Platforms
> University Information Services
> Roger Needham Building
> JJ Thompson Avenue
> University of Cambridge
> Cambridge
> United Kingdom
> E-Mail: pf...@cam.ac.uk
> Tel: 0044-1223-746548
> *******************
> ________________________________
> From: Frédéric Nass <frederic.n...@univ-lorraine.fr>
> Sent: 23 April 2025 11:24
> To: Paul Browne <pf...@cam.ac.uk>
> Cc: ceph-users <ceph-users@ceph.io>
> Subject: Re: [ceph-users] Cluster recovery: DC power failure killed OSD
> node BlueStore block.DB devices
>
> Hi Paul,
>
> Could you provide more details about the 'SSD BlueStore block.DB devices
> dead' issue?
>
> Are these devices not seen, or seen as defective, at the hardware level
> (through iLO, iDRAC, etc.)? Or are they visible to the operating system
> but their associated OSDs are failing to start?
>
> If you can't bring these RocksDB devices back online, the associated OSDs
> will be permanently dead.
>
> Regards,
> Frédéric.
>
> ----- On 22 Apr 25, at 23:11, Paul Browne pf...@cam.ac.uk wrote:
>
>> Hi ceph-users,
>>
>> We recently suffered a total power failure at our main DC; fortunately,
>> our production Ceph cluster emerged unscathed, but a smaller Ceph cluster
>> came back with the majority of its dedicated SSD BlueStore block.DB
>> devices dead (though its HDD OSD devices were unharmed). This cluster
>> underpinned a small OpenStack cloud, so it would be preferable to recover
>> it rather than write it off.
>>
>> In terms of deployment tooling, this ailing Ceph cluster is a fairly
>> standard Red Hat Ceph Storage 7 (so Quincy) cephadm-deployed cluster,
>> with the main wrinkle being that both the DB and HDD OSD devices use the
>> cephadm-supported LVM->LUKS layering above the BlueStore devices.
>>
>> The dead BlueStore block.DB devices are of course blocking the surviving
>> HDD OSD daemons (in cephadm-deployed containers) from coming up cleanly,
>> so the Ceph cluster status is currently very degraded (status attached
>> for the ugly picture).
>>
>> I've kicked around some ideas for recovering the dead DB devices and
>> restarting the down+out OSDs by:
>>
>> * Manually partitioning replacement SSDs into new DB device partitions+LVs
>> * Installing the same LUKS keys on them, retrieved from the Ceph config
>>   DB, matched against which OSD is on which OSD host
>> * Manually changing over device links for OSDs to their DB device with
>>   "ceph-bluestore-tool bluefs-bdev-new-db" or similar
>> * Trying to restart OSDs with updated links to the new
>>   LVM->LUKS->block.DB devices
>>
>> This approach seems highly messy and would require extracting a lot of
>> information, error-free, from dumps of "ceph-volume lvm list" in order to
>> exactly match extant OSD UUIDs to newly created DB devicemapper devices.
>>
>> Is there going to be some smarter/better/faster way to non-destructively
>> recover these intact HDD OSDs, which have links to dead block.DB devices,
>> using native cephadm tooling rather than getting so low-level as all the
>> above?
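[Editor's aside: the error-prone matching step described above need not be
done by eye; `ceph-volume lvm list --format json` emits machine-readable
output that can be parsed. Below is a minimal, hedged sketch of that idea.
The embedded SAMPLE JSON is hypothetical and heavily trimmed -- a real dump
carries many more LV tags and entries -- but the tag names shown
(`ceph.osd_fsid`, `ceph.db_device`) follow what ceph-volume records on its
LVs.]

```python
import json

# Hypothetical, heavily trimmed stand-in for the output of
# `ceph-volume lvm list --format json`; real output has one list entry
# per LV (block, db, ...) and far more tags per entry.
SAMPLE = """
{
  "7": [
    {
      "type": "block",
      "lv_path": "/dev/ceph-block-7/block-7",
      "tags": {
        "ceph.osd_id": "7",
        "ceph.osd_fsid": "0df2d04e-aaaa-bbbb-cccc-0123456789ab",
        "ceph.db_device": "/dev/ceph-db-old/db-7"
      }
    }
  ]
}
"""

def osd_db_map(listing):
    """Return {osd_id: (osd_fsid, recorded db_device)} for each block LV."""
    out = {}
    for osd_id, lvs in listing.items():
        for lv in lvs:
            if lv.get("type") != "block":
                continue  # skip db/wal entries; we want the data LV's tags
            tags = lv.get("tags", {})
            out[osd_id] = (tags.get("ceph.osd_fsid"),
                           tags.get("ceph.db_device"))
    return out

if __name__ == "__main__":
    for osd_id, (fsid, db) in sorted(osd_db_map(json.loads(SAMPLE)).items()):
        print(f"osd.{osd_id}  fsid={fsid}  old-db={db}")
```

The resulting osd_id -> fsid table can then drive the later steps (keying the
new LUKS volumes and pointing each OSD at its replacement DB LV) instead of
transcribing UUIDs by hand.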
>>
>> Many thanks for any advice,
>>
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io