This should only happen during the upgrade itself. I can't remember the
exact reason, but an fsck (possibly for stat repair) runs the first time
each OSD starts after the upgrade.
There should be a message about it in the OSD log.
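
If it helps to confirm that, here is a minimal sketch for pulling any
fsck-related lines out of an OSD log. It only matches the word "fsck"
case-insensitively; I am not assuming anything about the exact message
text. The function names are mine, and the log location is whatever
applies in your setup. With the containerized daemons the log may go to
the container's stdout instead, so the script also reads from stdin.

```python
import re
import sys
from pathlib import Path
from typing import Iterable, List

# Case-insensitive match on the bare word; the real message wording is not assumed.
FSCK_PATTERN = re.compile(r"fsck", re.IGNORECASE)


def find_fsck_lines(lines: Iterable[str]) -> List[str]:
    """Return every line that mentions fsck, without its trailing newline."""
    return [line.rstrip("\n") for line in lines if FSCK_PATTERN.search(line)]


def scan_osd_log(path: str) -> List[str]:
    """Scan one log file; undecodable bytes are replaced rather than raising."""
    with Path(path).open("r", errors="replace") as handle:
        return find_fsck_lines(handle)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Paths given on the command line (hypothetical, supply your own).
        for log_path in sys.argv[1:]:
            for hit in scan_osd_log(log_path):
                print(f"{log_path}: {hit}")
    else:
        # No paths given: read log text piped in on stdin.
        for hit in find_fsck_lines(sys.stdin):
            print(hit)
```

Run it with one or more log file paths as arguments, or pipe the
container's log output into it. Comparing the timestamps of the first and
last matching lines should show whether the fsck accounts for the 16-18
minutes.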


On Mon, Feb 14, 2022 at 1:31 PM Trey Palmer wrote:
> Hi all,
> I'm trying to upgrade some clusters from luminous to nautilus 14.2.22 (I
> know, I know!).
> It's taking about 16-18 minutes for each HDD OSD to connect into the
> cluster after the upgrade, but it only takes a minute or two for the SSD
> OSDs to connect.
> The cluster is dockerized using the standard ceph/daemon stable containers,
> and I'm using a simple ansible playbook to start the OSD containers.
> The cluster has 42 OSD nodes and each node has 12 x 14TB disks and 2 x
> 3.8TB SSDs.  Each SSD is partitioned into 6 block.db devices and one OSD,
> and the SSD pool is used for RGW metadata and indexes.
> I have of course upgraded the 5 mon/mgr nodes beforehand.
> The nodes are Debian Stretch, which might be suboptimal but that's what my
> shop uses.
> The cluster is still receiving writes, and with these disks down for 18
> minutes, we end up with so many degraded objects that I have to wait an
> hour or two to do the next node.  The primary RGW data pool is 3+2 EC so I
> expect that recovery is a little slower than it would be in a replicated
> pool.
> Under Luminous they were only taking a few minutes to connect.
> Any ideas what could be happening here?
> Thanks,
> Trey Palmer
> _______________________________________________
> ceph-users mailing list --
> To unsubscribe send an email to