What osd/mon/etc config settings do you have that are not default? It might
be worth utilizing nodown to stop osds from marking each other down and
finish the upgrade to be able to set the minimum osd version to mimic. Stop
the osds in a node, manually mark them down, start them back up in mimic.
Depending on how bad things are, setting pause on the cluster to just
finish the upgrade faster might not be a bad idea either.

This should be a simple question, have you confirmed that there are no
networking problems between the MONs while the elections are happening?

On Sat, Sep 8, 2018, 7:52 PM Kevin Hrpcek <kevin.hrp...@ssec.wisc.edu>
wrote:

> Hey Sage,
>
> I've posted the file with my email address for the user. It is with
> debug_mon 20/20, debug_paxos 20/20, and debug ms 1/5. The mons are calling
> for elections about every minute so I let this run for a few elections and
> saw this node become the leader a couple times. Debug logs start around
> 23:27:30. I had managed to get about 850/857 osds up, but it seems that
> within the last 30 min it has all gone bad again due to the OSDs reporting
> each other as failed. We relaxed the osd_heartbeat_interval to 30 and
> osd_heartbeat_grace to 60 in an attempt to slow down how quickly OSDs are
> trying to fail each other. I'll put in the rocksdb_cache_size setting.
>
> Thanks for taking a look.
>
> Kevin
>
> On 09/08/2018 06:04 PM, Sage Weil wrote:
>
> Hi Kevin,
>
> I can't think of any major luminous->mimic changes off the top of my head
> that would impact CPU usage, but it's always possible there is something
> subtle.  Can you ceph-post-file a the full log from one of your mons
> (preferbably the leader)?
>
> You might try adjusting the rocksdb cache size.. try setting
>
>  rocksdb_cache_size = 1342177280   # 10x the default, ~1.3 GB
>
> on the mons and restarting?
>
> Thanks!
> sage
>
> On Sat, 8 Sep 2018, Kevin Hrpcek wrote:
>
>
> Hello,
>
> I've had a Luminous -> Mimic upgrade go very poorly and my cluster is stuck
> with almost all pgs down. One problem is that the mons have started to
> re-elect a new quorum leader almost every minute. This is making it difficult
> to monitor the cluster and even run any commands on it since at least half the
> time a ceph command times out or takes over a minute to return results. I've
> looked at the debug logs and it appears there is some timeout occurring with
> paxos of about a minute. The msg_dispatch thread of the mons is often running
> a core at 100% for about a minute(user time, no iowait). Running strace on it
> shows the process is going through all of the mon db files (about 6gb in
> store.db/*.sst). Does anyone have an idea of what this timeout is or why my
> mons are always reelecting? One theory I have is that the msg_dispatch can't
> process the SST's fast enough and hits some timeout for a health check and the
> mon drops itself from the quorum since it thinks it isn't healthy. I've been
> thinking of introducing a new mon to the cluster on hardware with a better cpu
> to see if that can process the SSTs within this timeout.
>
> My cluster has the mons,mds,mgr and 30/41 osd servers on mimic, and 11/41 osd
> servers on luminous. The original problem started when I restarted the osds on
> one of the hosts. The cluster reacted poorly to them going down and went into
> a frenzy of taking down other osds and remapping. I eventually got that stable
> and the PGs were 90% good with the finish line in sight and then the mons
> started their issue of releecting every minute. Now I can't keep any decent
> amount of PGs up for more than a few hours. This started on Wednesday.....
>
> Any help would be greatly appreciated.
>
> Thanks,
> Kevin
>
> --Debug snippet from a mon at reelection time
> 2018-09-07 20:08:08.655 7f57b92cd700 20 mon.sephmon2@1(leader).mds e14242
> maybe_resize_cluster in 1 max 1
> 2018-09-07 20:08:08.655 7f57b92cd700  4 mon.sephmon2@1(leader).mds e14242
> tick: resetting beacon timeouts due to mon delay (slow election?) of 59.8106s
> seconds
> 2018-09-07 20:08:08.655 7f57b92cd700 10
> mon.sephmon2@1(leader).paxosservice(mdsmap 13504..14242) maybe_trim trim_to
> 13742 would only trim 238 < paxos_service_trim_min 250
> 2018-09-07 20:08:08.655 7f57b92cd700 10 mon.sephmon2@1(leader).auth v120657
> auth
> 2018-09-07 20:08:08.655 7f57b92cd700 10 mon.sephmon2@1(leader).auth v120657
> check_rotate updated rotating
> 2018-09-07 20:08:08.655 7f57b92cd700 10
> mon.sephmon2@1(leader).paxosservice(auth 120594..120657) propose_pending
> 2018-09-07 20:08:08.655 7f57b92cd700 10 mon.sephmon2@1(leader).auth v120657
> encode_pending v 120658
> 2018-09-07 20:08:08.655 7f57b92cd700  5 mon.sephmon2@1(leader).paxos(paxos
> updating c 132917556..132918214) queue_pending_finisher 0x55dce8e5b370
> 2018-09-07 20:08:08.655 7f57b92cd700 10 mon.sephmon2@1(leader).paxos(paxos
> updating c 132917556..132918214) trigger_propose not active, will propose
> later
> 2018-09-07 20:08:08.655 7f57b92cd700  4 mon.sephmon2@1(leader).mgr e2234 tick:
> resetting beacon timeouts due to mon delay (slow election?) of 59.8844s
> seconds
> 2018-09-07 20:08:08.655 7f57b92cd700 10
> mon.sephmon2@1(leader).paxosservice(mgr 1513..2234) maybe_trim trim_to 1734
> would only trim 221 < paxos_service_trim_min 250
> 2018-09-07 20:08:08.655 7f57b92cd700 10 mon.sephmon2@1(leader).health tick
> 2018-09-07 20:08:08.655 7f57b92cd700 20 mon.sephmon2@1(leader).health
> check_member_health
> 2018-09-07 20:08:08.657 7f57bcdd0700  1 -- 10.1.9.202:6789/0 >> -
> conn(0x55dcee55be00 :6789 s=STATE_ACCEPTING pgs=0 cs=0
> l=0)._process_connection sd=447 -
> 2018-09-07 20:08:08.657 7f57bcdd0700 10 mon.sephmon2@1(leader) e17
> ms_verify_authorizer 10.1.9.32:6823/4007 osd protocol 0
> 2018-09-07 20:08:08.662 7f57b92cd700 10 mon.sephmon2@1(leader).health
> check_member_health avail 79% total 40 GiB, used 8.4 GiB, avail 32 GiB
> 2018-09-07 20:08:08.662 7f57b92cd700 20 mon.sephmon2@1(leader).health
> check_leader_health
> 2018-09-07 20:08:08.662 7f57b92cd700 10
> mon.sephmon2@1(leader).paxosservice(health 1534..1720) maybe_trim trim_to 1715
> would only trim 181 < paxos_service_trim_min 250
> 2018-09-07 20:08:08.662 7f57b92cd700 10 mon.sephmon2@1(leader).config tick
> 2018-09-07 20:08:08.662 7f57b92cd700 20 mon.sephmon2@1(leader) e17
> sync_trim_providers
> 2018-09-07 20:08:08.662 7f57b92cd700 -1 mon.sephmon2@1(leader) e17
> get_health_metrics reporting 1940 slow ops, oldest is osd_failure(failed
> timeout osd.72 10.1.9.9:6800/68904 for 317sec e987498 v987498)
> 2018-09-07 20:08:08.662 7f57b92cd700  1 mon.sephmon2@1(leader).paxos(paxos
> updating c 132917556..132918214) accept timeout, calling fresh election
> 2018-09-07 20:08:08.662 7f57b92cd700 10 mon.sephmon2@1(leader) e17 bootstrap
> 2018-09-07 20:08:08.662 7f57b92cd700 10 mon.sephmon2@1(leader) e17
> sync_reset_requester
> 2018-09-07 20:08:08.662 7f57b92cd700 10 mon.sephmon2@1(leader) e17
> unregister_cluster_logger
> 2018-09-07 20:08:08.662 7f57b92cd700 10 mon.sephmon2@1(leader) e17
> cancel_probe_timeout (none scheduled)
> 2018-09-07 20:08:08.662 7f57b92cd700 10 mon.sephmon2@1(probing) e17 _reset
> 2018-09-07 20:08:08.662 7f57b92cd700 10 mon.sephmon2@1(probing) e17
> cancel_probe_timeout (none scheduled)
> 2018-09-07 20:08:08.662 7f57b92cd700 10 mon.sephmon2@1(probing) e17
> timecheck_finish
> 2018-09-07 20:08:08.662 7f57b92cd700 15 mon.sephmon2@1(probing) e17
> health_tick_stop
> 2018-09-07 20:08:08.662 7f57b92cd700 15 mon.sephmon2@1(probing) e17
> health_interval_stop
> 2018-09-07 20:08:08.662 7f57b92cd700 10 mon.sephmon2@1(probing) e17
> scrub_event_cancel
> 2018-09-07 20:08:08.662 7f57b92cd700 10 mon.sephmon2@1(probing) e17
> scrub_reset
> 2018-09-07 20:08:08.662 7f57b92cd700 10 mon.sephmon2@1(probing).paxos(paxos
> updating c 132917556..132918214) restart -- canceling timeouts
> 2018-09-07 20:08:08.662 7f57b92cd700 10
> mon.sephmon2@1(probing).paxosservice(mdsmap 13504..14242) restart
> 2018-09-07 20:08:08.662 7f57b92cd700 10
> mon.sephmon2@1(probing).paxosservice(osdmap 917471..987503) restart
>
>
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Reply via email to