Hi all,

We sometimes observe that the acting set of a PG seems to violate the CRUSH rule. For example, we had the following environment:
[root@Ann-per-R7-3 /]# ceph -s
  cluster:
    id:     248ce880-f57b-4a4c-a53a-3fc2b3eb142a
    health: HEALTH_WARN
            34/8019 objects misplaced (0.424%)

  services:
    mon: 3 daemons, quorum Ann-per-R7-3,Ann-per-R7-7,Ann-per-R7-1
    mgr: Ann-per-R7-3(active), standbys: Ann-per-R7-7, Ann-per-R7-1
    mds: cephfs-1/1/1 up {0=qceph-mds-Ann-per-R7-1=up:active}, 2 up:standby
    osd: 7 osds: 7 up, 7 in; 1 remapped pgs

  data:
    pools:   7 pools, 128 pgs
    objects: 2.67 k objects, 10 GiB
    usage:   107 GiB used, 3.1 TiB / 3.2 TiB avail
    pgs:     34/8019 objects misplaced (0.424%)
             127 active+clean
             1   active+clean+remapped

[root@Ann-per-R7-3 /]# ceph pg ls remapped
PG  OBJECTS DEGRADED MISPLACED UNFOUND BYTES     LOG STATE                 STATE_STAMP                VERSION REPORTED UP      ACTING    SCRUB_STAMP                DEEP_SCRUB_STAMP
1.7      34        0        34       0 134217728  42 active+clean+remapped 2019-11-05 10:39:58.639533 144'42  229:407  [6,1]p6 [6,1,2]p6 2019-11-04 10:36:19.519820 2019-11-04 10:36:19.519820

[root@Ann-per-R7-3 /]# ceph osd tree
ID CLASS WEIGHT  TYPE NAME             STATUS REWEIGHT PRI-AFF
-2       0       root perf_osd
-1       3.10864 root default
-7       0.44409     host Ann-per-R7-1
 5   hdd 0.44409         osd.5             up  1.00000 1.00000
-3       1.33228     host Ann-per-R7-3
 0   hdd 0.44409         osd.0             up  1.00000 1.00000
 1   hdd 0.44409         osd.1             up  1.00000 1.00000
 2   hdd 0.44409         osd.2             up  1.00000 1.00000
-9       1.33228     host Ann-per-R7-7
 6   hdd 0.44409         osd.6             up  1.00000 1.00000
 7   hdd 0.44409         osd.7             up  1.00000 1.00000
 8   hdd 0.44409         osd.8             up  1.00000 1.00000

[root@Ann-per-R7-3 /]# ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE VAR  PGS
 5   hdd 0.44409  1.00000 465 GiB  21 GiB 444 GiB 4.49 1.36 127
 0   hdd 0.44409  1.00000 465 GiB  15 GiB 450 GiB 3.16 0.96  44
 1   hdd 0.44409  1.00000 465 GiB  15 GiB 450 GiB 3.14 0.95  52
 2   hdd 0.44409  1.00000 465 GiB  14 GiB 451 GiB 2.98 0.91  33
 6   hdd 0.44409  1.00000 465 GiB  14 GiB 451 GiB 2.97 0.90  43
 7   hdd 0.44409  1.00000 465 GiB  15 GiB 450 GiB 3.19 0.97  41
 8   hdd 0.44409  1.00000 465 GiB  14 GiB 450 GiB 3.09 0.94  44
                    TOTAL 3.2 TiB 107 GiB 3.1 TiB 3.29
MIN/MAX VAR: 0.90/1.36  STDDEV: 0.49

Based on our CRUSH map, the CRUSH rule should select one OSD from each host. However, in the output above the acting set of PG 1.7 is [6,1,2], and osd.1 and osd.2 are on the same host, which seems to violate the CRUSH rule. So my question is: how does this happen? Any enlightenment is much appreciated.

Best,
Cian
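P.S. For anyone who wants to look at the same thing on their own cluster, this is roughly how one can compare what CRUSH itself computes against the up/acting sets shown by "ceph pg ls". It is only a sketch: the rule id 0 and --num-rep 3 below are assumptions, so match them to the pool's actual rule and size.

# Show the current up set and acting set for the PG in question (PG 1.7 above)
ceph pg map 1.7

# Dump the replicated rule to confirm it is supposed to choose one OSD per host
ceph osd crush rule dump

# Export the compiled CRUSH map and ask crushtool what mappings the rule produces
# (rule id and replica count are assumptions here; adjust to your pool)
ceph osd getcrushmap -o crushmap.bin
crushtool -i crushmap.bin --test --rule 0 --num-rep 3 --show-mappings
crushtool -i crushmap.bin --test --rule 0 --num-rep 3 --show-bad-mappings

The crushtool output only reflects the raw CRUSH calculation for the given weights, whereas the acting set in "ceph pg ls" is what the cluster is actually serving I/O from, so the two can differ.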