Hi All,

We are also seeing the same issue on one of our platforms, which was upgraded
from 11.0.2 to 11.2.0. The issue occurs on one node only: the CPU hits 100%
and the OSDs of that node get marked down. The issue is not seen on a cluster
that was installed from scratch with 11.2.0.

[r...@cn3.c7.vna ~] # systemctl start ceph-osd@315.service
[r...@cn3.c7.vna ~] # cd /var/log/ceph/
[r...@cn3.c7.vna ceph] # tail -f *osd*315.log
2017-02-13 11:29:46.752897 7f995c79b940  0 <cls> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/11.2.0/rpm/el7/BUILD/ceph-11.2.0/src/cls/hello/cls_hello.cc:296: loading cls_hello
2017-02-13 11:29:46.753065 7f995c79b940  0 _get_class not permitted to load kvs
2017-02-13 11:29:46.757571 7f995c79b940  0 _get_class not permitted to load lua
2017-02-13 11:29:47.058720 7f995c79b940  0 osd.315 44703 crush map has features 288514119978713088, adjusting msgr requires for clients
2017-02-13 11:29:47.058728 7f995c79b940  0 osd.315 44703 crush map has features 288514394856620032 was 8705, adjusting msgr requires for mons
2017-02-13 11:29:47.058732 7f995c79b940  0 osd.315 44703 crush map has features 288531987042664448, adjusting msgr requires for osds
2017-02-13 11:29:48.343979 7f995c79b940  0 osd.315 44703 load_pgs
2017-02-13 11:29:55.913550 7f995c79b940  0 osd.315 44703 load_pgs opened 130 pgs
2017-02-13 11:29:55.913604 7f995c79b940  0 osd.315 44703 using 1 op queue with priority op cut off at 64.
2017-02-13 11:29:55.914102 7f995c79b940 -1 osd.315 44703 log_to_monitors {default=true}
2017-02-13 11:30:19.384897 7f9939bbb700  1 heartbeat_map reset_timeout 'tp_osd thread tp_osd' had timed out after 15
2017-02-13 11:30:31.073336 7f9955a2b700  1 heartbeat_map is_healthy 'tp_osd thread tp_osd' had timed out after 15
2017-02-13 11:30:31.073343 7f9955a2b700  1 heartbeat_map is_healthy 'tp_osd thread tp_osd' had timed out after 15
2017-02-13 11:30:31.073344 7f9955a2b700  1 heartbeat_map is_healthy 'tp_osd thread tp_osd' had timed out after 15
2017-02-13 11:30:31.073345 7f9955a2b700  1 heartbeat_map is_healthy 'tp_osd thread tp_osd' had timed out after 15
2017-02-13 11:30:31.073347 7f9955a2b700  1 heartbeat_map is_healthy 'tp_osd thread tp_osd' had timed out after 15
2017-02-13 11:30:31.073348 7f9955a2b700  1 heartbeat_map is_healthy 'tp_osd thread tp_osd' had timed out after 15
2017-02-13 11:30:54.772516 7f995c79b940  0 osd.315 44703 done with init, starting boot process
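
While the node is in this state, a quick way we have been checking which
threads of the OSD daemon are busy is to look at per-thread CPU usage (just
a sketch, assuming systemd-managed OSDs and using osd.315 as the example):

OSD_PID=$(systemctl show -p MainPID ceph-osd@315.service | cut -d= -f2)
top -H -p "$OSD_PID"    # per-thread view; look for which worker threads sit near 100% CPU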


Thanks,
Muthu

On 13 February 2017 at 10:50, Andreas Gerstmayr <andreas.gerstm...@gmail.com> wrote:

> Hi,
>
> Due to a faulty upgrade from Jewel 10.2.0 to Kraken 11.2.0, our test
> cluster has been unhealthy for about two weeks and can't recover by
> itself anymore (unfortunately I skipped the upgrade to 10.2.5 because I
> missed the ".z" in "All clusters must first be upgraded to Jewel
> 10.2.z").
>
> Immediately after the upgrade I saw the following in the OSD logs:
> s=STATE_ACCEPTING_WAIT_BANNER_ADDR pgs=0 cs=0 l=0).fault with nothing
> to send and in the half  accept state just closed
>
> There are also missed heartbeats in the OSD logs, and the OSDs which
> don't send heartbeats have the following in their logs:
> 2017-02-08 19:44:51.367828 7f9be8c37700  1 heartbeat_map is_healthy
> 'tp_osd thread tp_osd' had timed out after 15
> 2017-02-08 19:44:54.271010 7f9bc4e96700  1 heartbeat_map reset_timeout
> 'tp_osd thread tp_osd' had timed out after 15
>
> While investigating, we found that some OSDs were lagging about
> 100-20,000 OSD map epochs behind. The monitor publishes new epochs
> every few seconds, but the OSD daemons are pretty slow in applying
> them (up to a few minutes for 100 epochs). During recovery of the 24
> OSDs of a storage node, the CPU runs at almost 100% (the nodes
> have 16 physical cores, or 32 with Hyper-Threading).
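>
> For reference, a rough way to see how far an individual OSD is lagging
> is to compare the cluster's current osdmap epoch with what the OSD
> itself reports via its admin socket (exact field names may differ
> slightly between releases):
>
> # on a monitor/admin node: the first line shows the latest osdmap epoch
> ceph osd dump | head -1
> # on the storage node: "newest_map" is the epoch this OSD has applied
> ceph daemon osd.315 status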
>
> At times we had servers where all 24 OSDs were up to date with the
> latest OSD map, but they somehow fell behind again. During recovery
> some OSDs used up to 25 GB of RAM, which led to out-of-memory
> situations and to the OSDs of the affected server lagging even further.
>
> We have already set the nodown, noout, norebalance, nobackfill, norecover,
> noscrub and nodeep-scrub flags to prevent OSD flapping and to avoid
> generating even more new OSD map epochs.
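>
> (For completeness, that was done with plain "ceph osd set" commands,
> something along these lines; "ceph osd unset" reverts each flag once
> the cluster has settled:)
>
> for f in nodown noout norebalance nobackfill norecover noscrub nodeep-scrub; do
>     ceph osd set $f
> done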
>
> Is there anything we can do to let the OSDs recover? It seems that the
> servers don't have enough CPU resources for recovery. I have already
> experimented with the osd map message max setting (when I increased it
> to 1000 to speed up recovery, the OSDs didn't get any updates at all?),
> and with the osd heartbeat grace and osd thread timeout settings (to give
> the overloaded server more time), but without success so far. I've
> seen errors related to the AsyncMessenger in the logs, so I reverted
> to the SimpleMessenger (which was working fine with Jewel).
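>
> As a rough sketch, the knobs I touched correspond to something like the
> following in ceph.conf on the OSD nodes (option names as documented for
> Kraken; the values here are only what I experimented with, not
> recommendations):
>
> [global]
>     # revert from the AsyncMessenger to the SimpleMessenger
>     ms type = simple
>     # cap on the number of osdmap epochs packed into a single message
>     osd map message max = 100
>     # give heavily loaded OSDs more time before peers report them down
>     osd heartbeat grace = 60
>     # op worker thread timeout (the "timed out after 15" default above)
>     osd op thread timeout = 60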
>
>
> Cluster details:
> 6 storage nodes with 2x Intel Xeon E5-2630 v3 (8 cores @ 2.40 GHz each)
> 256GB RAM
> Each storage node has 24 HDDs attached, one OSD per disk, journal on same
> disk
> 3 monitors in total, co-located with the storage nodes
> separate front and back network (10 Gbit)
> OS: CentOS 7.2.1511
> Kernel: 4.9.8-1.el7.elrepo.x86_64 from elrepo.org
>
>
> Thanks,
> Andreas
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>