Re: [ceph-users] How does monitor know OSD is dead?

Robert LeBlanc Fri, 28 Jun 2019 09:13:09 -0700

I'm not sure why the monitor did not mark it down after
mon_osd_report_timeout (900 seconds by default). The 600-second default is
the separate down-to-out timeout (mon_osd_down_out_interval); it is that
long because you don't want to move data around unnecessarily if the OSD is
just being rebooted or restarted. Usually, you will still have min_size OSDs
available for all PGs, which allows IO to continue. Then, when the
down-to-out timeout expires, the OSD is marked out and the cluster starts
backfilling and recovering the PGs that were affected. Double-check that
size != min_size for your pools.
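
To make the timing and the size/min_size check concrete, here is a rough
Python sketch. It is a toy model, not Ceph's actual logic: the function
names are made up for illustration, it ignores failure reports from peer
OSDs, and the pool dictionaries only mimic the shape I would expect from
`ceph osd pool ls detail --format json` (treat the field names as an
assumption). The timeouts are passed in explicitly; the numbers in the
usage lines are just the ones mentioned in this thread, so check your own
cluster's settings.

```python
from enum import Enum


class OsdState(Enum):
    UP_IN = "up+in"
    DOWN_IN = "down+in"
    DOWN_OUT = "down+out"


def osd_state(seconds_silent, report_timeout, down_out_interval):
    """Toy model of how the monitor treats an OSD that stopped reporting.

    Hypothetical helper: assumes nothing else (peer OSDs, an operator)
    marks the OSD down earlier.
    """
    if seconds_silent < report_timeout:
        return OsdState.UP_IN        # still considered alive
    if seconds_silent < report_timeout + down_out_interval:
        return OsdState.DOWN_IN      # marked down, no data movement yet
    return OsdState.DOWN_OUT         # marked out, backfill/recovery starts


def pg_accepts_io(size, min_size, replicas_marked_down):
    """Once dead replicas are marked down, a PG serves IO only while
    at least min_size replicas remain."""
    return size - replicas_marked_down >= min_size


def pools_without_headroom(pools):
    """Return names of pools where size <= min_size, i.e. pools that
    block IO as soon as a single replica is down."""
    return [p["pool_name"] for p in pools if p["size"] <= p["min_size"]]


if __name__ == "__main__":
    # Timeouts in seconds, as quoted in this thread (assumed, not verified).
    for t in (60, 1000, 1600):
        print(t, osd_state(t, report_timeout=900, down_out_interval=600))

    sample_pools = [  # made-up example data
        {"pool_name": "rbd", "size": 3, "min_size": 2},
        {"pool_name": "scratch", "size": 2, "min_size": 2},
    ]
    print(pools_without_headroom(sample_pools))
```

Note that the model only says IO can continue once the dead OSD has
actually been marked down; while it is dead but still reported up, PGs
that map to it can stall, which matches what was seen here.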
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1



On Thu, Jun 27, 2019 at 5:26 PM Bryan Henderson <bry...@giraffe-data.com> wrote:

> What does it take for a monitor to consider an OSD down which has been
> dead as a doornail since the cluster started?
>
> A couple of times, I have seen 'ceph status' report an OSD was up, when
> it was quite dead.  Recently, a couple of OSDs were on machines that
> failed to boot up after a power failure.  The rest of the Ceph cluster
> came up, though, and reported all OSDs up and in.  I/Os stalled,
> probably because they were waiting for the dead OSDs to come back.
>
> I waited 15 minutes, because the manual says if the monitor doesn't
> hear a heartbeat from an OSD in that long (default value of
> mon_osd_report_timeout), it marks it down.  But it didn't.  I did
> "osd down" commands for the dead OSDs and the status changed to down
> and I/O started working.
>
> And wouldn't even 15 minutes of grace be unacceptable if it means I/Os
> have to wait that long before falling back to a redundant OSD?
>
> --
> Bryan Henderson                                   San Jose, California
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

