The documentation here: http://docs.ceph.com/docs/master/rados/configuration/mon-osd-interaction/
says "Each Ceph OSD Daemon checks the heartbeat of other Ceph OSD Daemons every 6 seconds" and "If a neighboring Ceph OSD Daemon doesn’t show a heartbeat within a 20 second grace period, the Ceph OSD Daemon may consider the neighboring Ceph OSD Daemon down and report it back to a Ceph Monitor."

I've always thought that each OSD heartbeats with *every* other OSD, which of course means that total heartbeat traffic grows roughly quadratically with the number of OSDs. However, in extended testing we've observed that the number of other OSDs with which a given OSD exchanged heartbeats was < N, which has us wondering whether an OSD contacts only the OSDs with which it shares PGs, or some other subset.

I plan to submit a doc fix for mon_osd_min_down_reporters and wanted to resolve this FUD first.

-- aad