Guys,

After the update to 0.94.7 (from 0.94.6), every time I replace a broken OSD (1 out of 300) I get flooded by "[WRN] failed to encode map eXXX with expected crc" messages, and the number of blocked requests (> 32 sec) increases drastically, which ends up killing all radosgw sessions.
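To put a number on the flood, the warnings can be tallied per OSD and per map epoch. This is only a minimal sketch: it assumes the cluster log lines have been saved to a file and follow the format of the sample further down (date, time, osd.N, [WRN], message). The function name count_crc_warnings is made up for illustration and is not part of Ceph.

```shell
#!/bin/sh
# Tally "failed to encode map" warnings per OSD and map epoch.
# Reads cluster log lines on stdin, one per line, in the assumed form:
#   2016-06-02 11:22:56.319615 osd.43 [WRN] failed to encode map e134066 with expected crc
# count_crc_warnings is a hypothetical helper name, not a Ceph tool.
count_crc_warnings() {
    awk '
        /\[WRN\] failed to encode map e[0-9]+ with expected crc/ {
            # Field 3 is the reporting daemon (osd.N), field 9 is the epoch (eNNN).
            count[$3 " " $9]++
        }
        END {
            # One output line per (osd, epoch) pair: "<count> <osd> <epoch>"
            for (key in count) print count[key], key
        }
    ' | sort -k1,1nr -k2,2   # highest count first, ties ordered by OSD name
}
```

It could be used as `count_crc_warnings < saved-cluster-log.txt`, where the file name is just a placeholder. Lines that do not match the warning pattern, such as the status output, are ignored.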
Nothing changed in our cluster except the version update. Before that we never had issues like this; the cluster handled disk replacements quite well.

The procedure used for the OSD replacement is the following:

Removing the dead disk:

    ceph osd out <id>
    ceph osd crush remove osd.<id>    --> here the problem starts
    ceph osd rm <id>
    ceph auth del osd.<id>

Adding a new OSD:

    ceph-deploy disk zap <node>:/dev/<disk>
    ceph-deploy --overwrite-conf osd prepare <node>:<disk>:/dev/<journal partition>
    ceph-deploy --overwrite-conf osd activate <node>:<disk>:/dev/<journal partition>

Warning flood messages:

    cluster xxx
     health HEALTH_WARN
            97 pgs backfill
            12 pgs backfilling
            3 pgs peering
            2 pgs stuck inactive
            112 pgs stuck unclean
            242 requests are blocked > 32 sec
            recovery 111320/18148458 objects misplaced (0.613%)
     monmap e1: 3 mons at {mon001=xxx:6789/0,mon002=xxx:6789/0,mon003=xxx:6789/0}
            election epoch 526, quorum 0,1,2 mon001,mon002,mon003
     osdmap e134086: 296 osds: 296 up, 296 in; 108 remapped pgs
      pgmap v12721457: 18368 pgs, 15 pools, 17163 GB data, 5889 kobjects
            55811 GB used, 397 TB / 451 TB avail
            111320/18148458 objects misplaced (0.613%)
               18254 active+clean
                  97 active+remapped+wait_backfill
                  12 active+remapped+backfilling
                   3 peering
                   1 active+clean+scrubbing+deep
                   1 active+clean+scrubbing
    recovery io 42311 kB/s, 15 objects/s
      client io 5205 B/s rd, 6 op/s

    2016-06-02 11:22:56.319615 osd.43 [WRN] failed to encode map e134066 with expected crc
    2016-06-02 11:22:56.320236 osd.21 [WRN] failed to encode map e134066 with expected crc
    2016-06-02 11:22:56.320862 osd.60 [WRN] failed to encode map e134066 with expected crc
    2016-06-02 11:22:56.322256 osd.21 [WRN] failed to encode map e134066 with expected crc
    2016-06-02 11:22:56.322833 osd.60 [WRN] failed to encode map e134066 with expected crc
    2016-06-02 11:22:56.324521 osd.21 [WRN] failed to encode map e134066 with expected crc
    2016-06-02 11:22:56.324533 osd.60 [WRN] failed to encode map e134066 with expected crc
    2016-06-02 11:22:56.326382 osd.21 [WRN] failed to encode map e134066 with expected crc
    2016-06-02 11:22:56.326716 osd.60 [WRN] failed to encode map e134066 with expected crc
    2016-06-02 11:22:56.328460 osd.60 [WRN] failed to encode map e134066 with expected crc
    2016-06-02 11:22:56.328500 osd.21 [WRN] failed to encode map e134066 with expected crc
    2016-06-02 11:22:56.330503 osd.60 [WRN] failed to encode map e134066 with expected crc
    2016-06-02 11:22:56.330517 osd.43 [WRN] failed to encode map e134066 with expected crc
    2016-06-02 11:22:56.330671 osd.21 [WRN] failed to encode map e134066 with expected crc

Kind regards,

Romero Junior
DevOps Infra Engineer
LeaseWeb Global Services B.V.
T: +31 20 316 0230
M: +31 6 2115 9310
E: r.jun...@global.leaseweb.com
W: www.leaseweb.com
Luttenbergweg 8, 1101 EC Amsterdam, Netherlands
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com