In addition: Make sure you are using kernels with the proper fixes.
CephFS is a cooperation between the MDS, the OSDs and the (kernel) clients. If
the clients are outdated they can cause all kinds of trouble.
So make sure you are able to update clients to recent versions.
Although a stock CentOS or Ub
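A quick way to see which client releases are actually connected is the
"ceph features" command (just a sketch; it exists since Luminous and the
exact output grouping varies by release):

ceph features                 # groups mons, OSDs and clients by release and feature bits
ceph tell mds.0 client ls     # on CephFS, per-session details for the clients of rank 0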
We are running a small Ceph cluster with two nodes. Our failureDomain is
set to host so that the data is replicated between the two hosts. The
other night one host crashed hard and three OSDs won't recover with
either
debug 2021-01-13T08:13:17.855+ 7f9bfbd6ef40 -1 osd.23 0 OSD::init() : unable to r
Hello,
I suspect there was unwritten data in RAM which didn't make it to the
disk. This shouldn't happen; that's why the journal is in place.
If you have size=2 in your pool, there is one copy on the other host. To
delete the OSD you could probably do
ceph osd crush remove osd.x
ceph osd rm osd.x
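A slightly fuller sequence (only a sketch, assuming the dead OSD is osd.23
and you have confirmed the surviving copy is healthy) would be something
like:

ceph osd out osd.23                            # stop PGs mapping to it
systemctl stop ceph-osd@23                     # make sure the daemon is really down
ceph osd crush remove osd.23                   # drop it from the CRUSH map
ceph auth del osd.23                           # remove its cephx key
ceph osd rm osd.23                             # remove it from the OSD map
# on Luminous and later, the last three steps can be replaced with:
ceph osd purge osd.23 --yes-i-really-mean-it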
Hey all
We landed in a bad place (tm) with our nvme metadata tier. I'll root-cause how
we got here after it's all back up. I suspect a pool got misconfigured and just
filled it all up.
Short version: the OSDs are all full (or full enough) that I can't get them to
spin back up. They c
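A quick way to confirm how full each OSD and pool actually is (an aside,
and the column names differ a bit between releases):

ceph osd df tree     # per-OSD utilization laid out along the CRUSH tree
ceph df detail       # per-pool usage, to spot which pool filled the tier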
We may have found a way out of the jam. ceph-bluestore-tool's
bluefs-bdev-migrate is successfully getting data moved into another LV and then
we can manually start the OSDs to get the captive PGs out. It is not a fix I
would trust beyond getting out of jail and I completely plan on blowing awa
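For the archives, the migration looks roughly like the sketch below; the OSD
id, data path and target LV are placeholders rather than values from this
cluster, and the OSD has to be stopped first:

# rough sketch: move BlueFS (RocksDB) data off the full device onto a new LV
ceph-bluestore-tool bluefs-bdev-migrate \
    --path /var/lib/ceph/osd/ceph-23 \
    --devs-source /var/lib/ceph/osd/ceph-23/block.db \
    --dev-target /dev/vg_spill/db-23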