Thanks for the input, Josh. The reason I actually started looking into this is 
that we're adding some SSD OSDs to this cluster, and on their initial boot they 
were basically as slow as HDD OSDs are when the cluster hasn't trimmed OSDmaps 
in a while.
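
For anyone wanting to check how far behind osdmap trimming a cluster is, 
something like the sketch below should do it. This is only a rough sketch on my 
part; it assumes 'ceph report' emits JSON on stdout and includes the 
osdmap_first_committed / osdmap_last_committed fields (which it does on the 
releases I've looked at):

    # Sketch: how many OSDMap epochs are the monitors still holding,
    # i.e. how far behind trimming is the cluster?
    # Assumes 'ceph report' prints JSON on stdout with the
    # osdmap_first_committed / osdmap_last_committed fields.
    import json
    import subprocess

    report = json.loads(subprocess.check_output(["ceph", "report"]))
    first = report["osdmap_first_committed"]
    last = report["osdmap_last_committed"]
    print(f"untrimmed osdmap epochs: {last - first} ({first}..{last})")
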

I'd be interested to know whether other people who are seeing this slow start 
also see a relatively slow (~50 MB/s) OSDmap download rate during that initial 
boot, or whether this is something specific to our cluster. I agree with the 
general guidance of 'don't let clusters get into these states', but it would be 
good to know if there is anything we can do to speed this up.
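
In case it helps anyone compare numbers, here is a rough way one could watch 
the catch-up rate on a booting OSD. Again just a sketch, not something from 
this thread: it assumes you can reach the OSD's admin socket on its host, and 
it reports epochs/s rather than MB/s, but it should give a comparable signal:

    # Sketch: poll a booting OSD's admin socket and estimate how fast it
    # is catching up on OSDMaps (epochs per second).
    # Assumes 'ceph daemon osd.<id> status' reports oldest_map/newest_map
    # and is run on the host where that OSD is running.
    import json
    import subprocess
    import time

    osd = "osd.0"  # hypothetical id; point this at the booting OSD
    prev_epoch, prev_time = None, None
    while True:
        st = json.loads(subprocess.check_output(["ceph", "daemon", osd, "status"]))
        now, newest = time.time(), st["newest_map"]
        if prev_epoch is not None:
            rate = (newest - prev_epoch) / (now - prev_time)
            print(f"{rate:.1f} epochs/s (currently at epoch {newest})")
        prev_epoch, prev_time = newest, now
        time.sleep(5)
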

Thanks,
Tom
________________________________________
From: Joshua Baergen <jbaer...@digitalocean.com>
Sent: Friday, January 10, 2025 16:56
To: Frédéric Nass <frederic.n...@univ-lorraine.fr>
Cc: Byrne, Thomas (STFC,RAL,SC) <tom.by...@stfc.ac.uk>; Wesley Dillingham 
<w...@wesdillingham.com>; ceph-users <ceph-users@ceph.io>
Subject: Re: [ceph-users] Re: Slow initial boot of OSDs in large cluster with 
unclean state
 
> > FWIW, having encountered these long-startup issues many times in the
> > past on both HDD and QLC OSDs, I can pretty confidently say that
> > throwing flash at the problem doesn't make it go away. Fewer issues
> > with DB IOs contending with client IOs, but flapping can still occur
> > during PG cleaning on QLC, at least on Pacific.
>
> Interesting. Were these QLC OSDs collocated too? Or were they running DB on 
> QLCs and data on HDDs?

Collocated in each case (i.e. wal/db was alongside data).

Josh
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
