At cluster creation I'm seeing that the mons are taking a while to form quorum. It seems like I'm hitting a timeout of 60s somewhere. Am I missing a config setting that would help paxos establish quorum sooner? When initializing with the monmap I would have expected the mons to initialize very quickly.
The scenario is:
* Luminous RC 2
* The mons are initialized with a monmap
* Running in Kubernetes (Rook)

The symptoms are:
* When all three mons start in parallel, they appear to determine their rank immediately. I assume this means they establish communication. A log message is seen such as this in each of the mon logs:
  * 2017-08-08 17:03:16.383599 7f8da7c85f40 0 mon.rook-ceph-mon1@-1(probing) e0 my rank is now 0 (was -1)
* Now paxos enters a loop that times out every two seconds and lasts about 60s, trying to probe the other monitors. During this wait, I am able to curl the mon endpoints successfully.
  * 2017-08-08 17:03:17.345877 7f02b779af40 10 mon.rook-ceph-mon0@1(probing) e0 probing other monitors
  * 2017-08-08 17:03:19.346032 7f02ae568700 4 mon.rook-ceph-mon0@1(probing) e0 probe_timeout 0x55c93678bb00
* After about 60 seconds the probe succeeds and the mons start responding
  * 2017-08-08 17:04:17.356928 7f02ae568700 10 mon.rook-ceph-mon0@1(probing) e0 probing other monitors
  * 2017-08-08 17:04:17.366587 7f02a855c700 10 mon.rook-ceph-mon0@1(probing) e0 ms_verify_authorizer 10.0.0.254:6790/0 mon protocol 2

The relevant settings in the config are:
mon initial members = rook-ceph-mon0 rook-ceph-mon1 rook-ceph-mon2
mon host = 10.0.0.24:6790,10.0.0.163:6790,10.0.0.139:6790
public addr = 10.0.0.24
cluster addr = 172.17.0.5

The full log for this mon at debug log level 20 can be found here:
https://gist.github.com/travisn/2c2641a6b80a7479b3b22accb41a5193

Any ideas?

Thanks,
Travis
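For reference, the length of the probing phase can be read off the timestamps in the mon.rook-ceph-mon0 log lines quoted above. This is a minimal Python sketch, assuming each log line starts with the date and time as its first two whitespace-separated fields; the function names are only illustrative and not part of any Ceph tooling.

```python
from datetime import datetime

# Log lines copied verbatim from the message above (mon.rook-ceph-mon0).
LOG_LINES = [
    "2017-08-08 17:03:17.345877 7f02b779af40 10 mon.rook-ceph-mon0@1(probing) e0 probing other monitors",
    "2017-08-08 17:03:19.346032 7f02ae568700 4 mon.rook-ceph-mon0@1(probing) e0 probe_timeout 0x55c93678bb00",
    "2017-08-08 17:04:17.356928 7f02ae568700 10 mon.rook-ceph-mon0@1(probing) e0 probing other monitors",
    "2017-08-08 17:04:17.366587 7f02a855c700 10 mon.rook-ceph-mon0@1(probing) e0 ms_verify_authorizer 10.0.0.254:6790/0 mon protocol 2",
]


def parse_timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS.ffffff' fields of a log line."""
    date_part, time_part = line.split()[:2]
    return datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S.%f")


def first_match(lines, needle):
    """Timestamp of the first line containing the given substring."""
    return next(parse_timestamp(line) for line in lines if needle in line)


def probing_duration(lines):
    """Seconds from the first probe attempt to the first peer authorizer check."""
    start = first_match(lines, "probing other monitors")
    end = first_match(lines, "ms_verify_authorizer")
    return (end - start).total_seconds()


def first_probe_timeout(lines):
    """Seconds from the first probe attempt to the first probe_timeout."""
    start = first_match(lines, "probing other monitors")
    timeout = first_match(lines, "probe_timeout")
    return (timeout - start).total_seconds()


if __name__ == "__main__":
    print(f"first probe timeout after {first_probe_timeout(LOG_LINES):.2f}s")
    print(f"probing phase lasted {probing_duration(LOG_LINES):.2f}s")
```

Running the same functions over the full debug-20 log from the gist would show whether every retry in the loop hits the same two-second timeout.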
_______________________________________________ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com