On Wed, Nov 30, 2016 at 8:31 AM, Manuel Lausch <manuel.lau...@1und1.de> wrote:
> Yes. This parameter is used in the condition described there:
> http://docs.ceph.com/docs/jewel/rados/configuration/mon-osd-interaction/#osds-report-their-status
> and it works. I think the default timeout of 900s is quite large.
>
> The documentation also describes another mechanism that checks the health
> of OSDs and reports them down:
> http://docs.ceph.com/docs/jewel/rados/configuration/mon-osd-interaction/#osds-report-down-osds
>
> As far as I can see in the source code, this documentation is no longer
> valid! I found this commit:
> https://github.com/ceph/ceph/commit/bcb8f362ec6ac47c4908118e7860dec7971d001f#diff-0a5db46a44ae9900e226289a810f10e8
>
> "mon_osd_min_down_reporters" is now the threshold for how many distinct
> subtrees (of the type set by "mon_osd_reporter_subtree_level") have to
> report an OSD down. In Hammer it was the number of other OSDs that had to
> report. Hammer also had the parameter "mon_osd_min_down_reports", which set
> how often another OSD had to report an OSD down. In Jewel that parameter no
> longer exists.
>
> With this "knowledge" I adjusted my configuration and will now test it.
>
>
> BTW:
> While reading the source code I may have found another bug. Can you
> confirm this?
> In the function "OSDMonitor::check_failure" in src/mon/OSDMonitor.cc, the
> code that counts "reporters_by_subtree" is inside the block
> "if (g_conf->mon_osd_adjust_heartbeat_grace) {". So if I disable
> adjust_heartbeat_grace, the reporters_by_subtree functionality will not
> work at all.

Yes, I think you're correct, and that's a (fairly nasty, to somebody someday) bug. Can you create a ticket at tracker.ceph.com? :)
-Greg
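A simplified C++ sketch of the control-flow problem Manuel describes, assuming a heavily reduced model of the monitor's failure check. It is not the actual OSDMonitor::check_failure code: MonConfig, ReporterSubtrees, enough_reporters_buggy and enough_reporters are hypothetical names, the laggy-history grace calculation is omitted, and the value 2 for mon_osd_min_down_reporters is only for illustration. It also reflects the Jewel semantics from the thread: the threshold counts distinct subtrees, not distinct reporting OSDs.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical subset of monitor options; the names mirror the Ceph
// options discussed in the thread, but this struct is illustrative only.
struct MonConfig {
  bool mon_osd_adjust_heartbeat_grace = true;
  std::size_t mon_osd_min_down_reporters = 2;  // illustrative value
};

// Hypothetical model: reporter OSD id -> name of the CRUSH bucket at
// mon_osd_reporter_subtree_level (e.g. the host) that contains the reporter.
using ReporterSubtrees = std::map<int, std::string>;

// Hypothetical helper showing the suspected bug: the distinct-subtree set is
// only filled when heartbeat-grace adjustment is enabled, so with it disabled
// the set stays empty and any threshold of 1 or more is never reached.
bool enough_reporters_buggy(const MonConfig& conf,
                            const ReporterSubtrees& reporters) {
  std::set<std::string> reporters_by_subtree;
  if (conf.mon_osd_adjust_heartbeat_grace) {
    for (const auto& entry : reporters) {
      reporters_by_subtree.insert(entry.second);
    }
    // ... grace adjustment based on laggy history would also go here ...
  }
  return reporters_by_subtree.size() >= conf.mon_osd_min_down_reporters;
}

// Hypothetical helper with the suggested shape: count distinct subtrees
// unconditionally and keep only the grace adjustment behind the option.
bool enough_reporters(const MonConfig& conf,
                      const ReporterSubtrees& reporters) {
  std::set<std::string> reporters_by_subtree;
  for (const auto& entry : reporters) {
    reporters_by_subtree.insert(entry.second);
  }
  if (conf.mon_osd_adjust_heartbeat_grace) {
    // ... grace adjustment based on laggy history would go here ...
  }
  return reporters_by_subtree.size() >= conf.mon_osd_min_down_reporters;
}
```

With reports from OSDs 1 and 2 on host-a and OSD 5 on host-b, both helpers see two subtrees while grace adjustment is enabled; once it is disabled, only the unconditional version still counts them. Two reports from the same host count as a single subtree, which is why they do not meet a threshold of 2.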
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com