Hi Paul,

I'm continuing this thread here, following Anthony's insightful remarks and 
offer to help.

I find it hard to believe that enterprise-grade NVMe drives would fail during a 
power outage unless there's an issue with the NVMe or HBA firmware. I 
recommend opening a support case with Dell, HPE, or whichever manufacturer made 
your server.

Before doing that, try these troubleshooting steps:

- Shut down the server completely
- Disconnect all power cables for at least 10 minutes
- Restart the server (this might resolve temporary discovery issues during boot)

If the drives reappear during startup, you may need to 'import' them in the 
boot process. Watch for a message on the console prompting you to do this.
If these steps don't help, try upgrading all firmware on the server.
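If the drives sit behind a Broadcom/LSI-based controller (PERC on Dell), the 'import' step can also be done from the CLI instead of the console prompt. This is only a sketch, assuming controller index /c0 and the storcli64/perccli64 tools Anthony mentioned below; it is guarded so it does nothing on a machine without the tool:

```shell
# Hedged sketch: check for and import a 'foreign' configuration on a
# Broadcom/PERC controller. /c0 (controller 0) is an assumption; on Dell
# hardware substitute perccli64 for storcli64.
if command -v storcli64 >/dev/null 2>&1; then
    storcli64 /c0/fall show      # list any foreign (importable) configurations
    storcli64 /c0/fall import    # import them so the drives reappear to the OS
else
    echo "storcli64 not found; run these commands on the affected server"
fi
```

Importing here is the CLI equivalent of answering the boot-time console prompt, so it's worth trying before a firmware upgrade.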

I've seen 'dead' Dell SSDs (Toshiba) come back to life after a firmware 
upgrade, even when marked as dead in iDRAC. See [1] for details.
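A quick way to spot the firmware-mismatch scenario is to compare revisions across the surviving drives. A minimal sketch, assuming nvme-cli and smartmontools are installed (the /dev/sda device name is a placeholder):

```shell
# Hedged sketch: compare firmware revisions across drives after a power event.
# Adjust device names to your system; both commands are read-only.
if command -v nvme >/dev/null 2>&1; then
    nvme list                                  # model + firmware rev per NVMe drive
fi
if command -v smartctl >/dev/null 2>&1; then
    smartctl -i /dev/sda | grep -i firmware    # SAS/SATA drives behind an HBA
fi
```

If all failed drives share one revision and the survivors a newer one, that's strong evidence for the firmware-flaw theory and useful data for the support case.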

Ultimately, your best course of action is to open a support case with your 
hardware manufacturer.

Regards,
Frédéric.

[1] https://www.spinics.net/lists/ceph-users/msg78647.html

----- On 23 Apr 25, at 15:37, Anthony D'Atri anthony.da...@gmail.com wrote:

>> The failed SSD disks seem to be quite dead unfortunately, not visible to the 
>> OS
>> and also marked as dead in the node iDRAC BMC.
> 
> I’ve found that iDRAC’s view of SSDs is sometimes … imperfect, but not visible
> to the OS is telling.
> 
> If you could send me
> 
>       storcli64 /c0 show termlog >/var/tmp/termlog.txt      # or perccli64
>       storcli64 /c0 show all
> 
> I’d love to take a look and see if the HBA has any additional information.
> 
> One possible though unlikely scenario is that the lost drives had a firmware
> flaw but surviving drives had a newer revision.
> 
> 
> 
>> We haven't tried moving them to a different node to test though, I can try 
>> that.
>> 
>> In this power event we lost all of the SSD devices on 2 out of 3 OSD nodes in
>> the cluster (it was a small testing cluster) and half of them on the 3rd OSD
>> node.
>> 
>> So the vast majority of OSDs can't start here and the overall cluster state 
>> is
>> extremely degraded.
>> 
>> So if there is state contained within the old, dead DB devices that can't be
>> directly replaced with the instantiation of new replacement DB devices, then
>> it's looking like we've just lost too many DB devices in one fell swoop to 
>> ever
>> recover this Ceph cluster, despite the OSD HDDs all being clean+untouched by
>> the power event.
>> 
>> I had been hoping that the DB state was more ephemeral than it seems to be, 
>> and
>> so instantiation of new DB devices mapped to the correct OSD devices (via 
>> LUKS
>> key) would allow for restarting the down+out OSD devices. But that's
>> increasingly looking to not be possible, from updates on this thread.
>> 
>> *******************
>> Paul Browne
>> Research Computing Platforms
>> University Information Services
>> Roger Needham Building
>> JJ Thompson Avenue
>> University of Cambridge
>> Cambridge
>> United Kingdom
>> E-Mail: pf...@cam.ac.uk<mailto:pf...@cam.ac.uk>
>> Tel: 0044-1223-746548
>> *******************
>> ________________________________
>> From: Frédéric Nass <frederic.n...@univ-lorraine.fr>
>> Sent: 23 April 2025 11:24
>> To: Paul Browne <pf...@cam.ac.uk>
>> Cc: ceph-users <ceph-users@ceph.io>
>> Subject: Re: [ceph-users] Cluster recovery: DC power failure killed OSD node
>> BlueStore block.DB devices
>> 
>> Hi Paul,
>> 
>> Could you provide more details about the 'SSD BlueStore block.DB devices 
>> dead'
>> issue?
>> 
>> Are these devices not seen or seen as defective at the hardware level 
>> (through
>> iLO, iDrac, etc.)? Or are they visible to the operating system but their
>> associated OSDs are failing to start?
>> If you can't bring these RocksDB devices back online, associated OSDs will be
>> permanently dead.
>> 
>> Regards,
>> Frédéric.
>> 
>> ----- On 22 Apr 25, at 23:11, Paul Browne pf...@cam.ac.uk wrote:
>> 
>>> Hi ceph-users,
>>> 
>>> We recently suffered a total power failure at our main DC; fortunately, our
>>> production Ceph cluster emerged unscathed but a smaller Ceph cluster came 
>>> back
>>> with the majority of its dedicated SSD BlueStore block.DB devices dead (but 
>>> its
>>> HDD OSD devices unharmed). This cluster underpinned a small OpenStack 
>>> cloud, so
>>> it would be preferable to recover it rather than writing it off.
>>> 
>>> In terms of deployment tooling, this ailing Ceph cluster is a fairly 
>>> standard
>> Red Hat Ceph Storage 7 (so Reef) cephadm deployed cluster, with the main
>>> wrinkle about it being that both the DB and HDD OSD devices make use of the
>>> cephadm supported LVM->LUKS layering above the BlueStore devices.
>>> 
>>> The dead BlueStore block.DB devices are of course blocking the surviving 
>>> HDD OSD
>>> daemons (in cephadm deployed containers) from coming up cleanly and so the 
>>> Ceph
>>> cluster status is currently very degraded (attached status for the ugly
>>> picture)
>>> 
>>> I've kicked around some ideas of recovering the dead DB devices and 
>>> restarting
>>> down+out OSDs by;
>>> 
>>>   * Manually partitioning replacement SSDs into new DB device partitions+LVs
>>>   * Installing the same LUKS keys on them retrieved from the Ceph config DB,
>>>     matching up against which OSD is on which OSD host.
>>>   * Manually changing over device links for OSDs to their DB device with
>>>     "ceph-bluestore-tool bluefs-bdev-new-db" or similar
>>>   * Trying to restart OSDs with updated links to new LVM->LUKS->block.DB devices
>>> 
>>> This approach seems highly messy and subject to needing to extract a lot of
>>> information error-free from 'ceph-volume lvm list' dumps in order to exactly
>>> match extant OSD UUIDs to newly created DB devicemapper devices.
>>> 
>>> Is there going to be some smarter/better/faster way to non-destructively 
>>> recover
>>> these intact HDD OSDs which have links to dead block.DB devices, using 
>>> native
>>> cephadm tooling rather than getting so low-level as all the above?
>>> 
>>> Many thanks for any advice,
>>> 
>>> *******************
>>> Paul Browne
>>> Research Computing Platforms
>>> University Information Services
>>> Roger Needham Building
>>> JJ Thompson Avenue
>>> University of Cambridge
>>> Cambridge
>>> United Kingdom
>>> E-Mail: pf...@cam.ac.uk<mailto:pf...@cam.ac.uk>
>>> Tel: 0044-1223-746548
>>> *******************
>>> _______________________________________________
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io