Hi, Cephers!

I'm currently testing a double-failure scenario on a Ceph cluster, and I found that the affected PGs stay in the stale state forever.

Steps to reproduce:

0. Ceph version: jewel 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
1. Create a pool: exp-volumes (size = 2, min_size = 1)
2. Create an RBD image: testvol01
3. Map the RBD image and create a filesystem on it with mkfs.xfs
4. Mount it and create a file
5. List the RADOS objects
6. Check the OSD map of each object

    # ceph osd map exp-volumes rbd_data.4a41f238e1f29.000000000000017a
    osdmap e199 pool 'exp-volumes' (2) object 'rbd_data.4a41f238e1f29.000000000000017a' -> pg 2.3f04d6e2 (2.62) -> up ([2,6], p2) acting ([2,6], p2)

7. Stop the primary (osd.2) and the secondary (osd.6) of the above object at the same time
8. Check the Ceph status

     health HEALTH_ERR
            16 pgs are stuck inactive for more than 300 seconds
            16 pgs stale
            16 pgs stuck stale
     monmap e11: 3 mons at {10.105.176.85=10.105.176.85:6789/0,10.110.248.154=10.110.248.154:6789/0,10.110.249.153=10.110.249.153:6789/0}
            election epoch 84, quorum 0,1,2 10.105.176.85,10.110.248.154,10.110.249.153
     osdmap e248: 6 osds: 4 up, 4 in; 16 remapped pgs
            flags sortbitwise,require_jewel_osds
      pgmap v112095: 128 pgs, 1 pools, 14659 kB data, 17 objects
            165 MB used, 159 GB / 160 GB avail
                 112 active+clean
                  16 stale+active+clean

    # ceph health detail
    HEALTH_ERR 16 pgs are stuck inactive for more than 300 seconds; 16 pgs stale; 16 pgs stuck stale
    pg 2.67 is stuck stale for 689.171742, current state stale+active+clean, last acting [2,6]
    pg 2.5a is stuck stale for 689.171748, current state stale+active+clean, last acting [6,2]
    pg 2.52 is stuck stale for 689.171753, current state stale+active+clean, last acting [2,6]
    pg 2.4d is stuck stale for 689.171757, current state stale+active+clean, last acting [2,6]
    pg 2.56 is stuck stale for 689.171755, current state stale+active+clean, last acting [6,2]
    pg 2.d is stuck stale for 689.171811, current state stale+active+clean, last acting [6,2]
    pg 2.79 is stuck stale for 689.171808, current state stale+active+clean, last acting [2,6]
    pg 2.1f is stuck stale for 689.171782, current state stale+active+clean, last acting [6,2]
    pg 2.76 is stuck stale for 689.171809, current state stale+active+clean, last acting [6,2]
    pg 2.17 is stuck stale for 689.171794, current state stale+active+clean, last acting [6,2]
    pg 2.63 is stuck stale for 689.171794, current state stale+active+clean, last acting [2,6]
    pg 2.77 is stuck stale for 689.171816, current state stale+active+clean, last acting [2,6]
    pg 2.1b is stuck stale for 689.171793, current state stale+active+clean, last acting [6,2]
    pg 2.62 is stuck stale for 689.171765, current state stale+active+clean, last acting [2,6]
    pg 2.30 is stuck stale for 689.171799, current state stale+active+clean, last acting [2,6]
    pg 2.19 is stuck stale for 689.171798, current state stale+active+clean, last acting [6,2]

    # ceph pg dump_stuck stale
    ok
    pg_stat  state               up     up_primary  acting  acting_primary
    2.67     stale+active+clean  [2,6]  2           [2,6]   2
    2.5a     stale+active+clean  [6,2]  6           [6,2]   6
    2.52     stale+active+clean  [2,6]  2           [2,6]   2
    2.4d     stale+active+clean  [2,6]  2           [2,6]   2
    2.56     stale+active+clean  [6,2]  6           [6,2]   6
    2.d      stale+active+clean  [6,2]  6           [6,2]   6
    2.79     stale+active+clean  [2,6]  2           [2,6]   2
    2.1f     stale+active+clean  [6,2]  6           [6,2]   6
    2.76     stale+active+clean  [6,2]  6           [6,2]   6
    2.17     stale+active+clean  [6,2]  6           [6,2]   6
    2.63     stale+active+clean  [2,6]  2           [2,6]   2
    2.77     stale+active+clean  [2,6]  2           [2,6]   2
    2.1b     stale+active+clean  [6,2]  6           [6,2]   6
    2.62     stale+active+clean  [2,6]  2           [2,6]   2
    2.30     stale+active+clean  [2,6]  2           [2,6]   2
    2.19     stale+active+clean  [6,2]  6           [6,2]   6

    # ceph pg 2.62 query
    Error ENOENT: i don't have pgid 2.62

    # rados ls -p exp-volumes
    rbd_data.4a41f238e1f29.000000000000003f
    ^C    --> hang

I understand that this is a natural result, because the PGs above have lost both their primary and secondary OSDs. But this situation can happen in practice, so I want to recover the Ceph cluster and the RBD images.

First, I want to know how to bring the cluster back to a clean state. I read the documentation and tried to solve this, but nothing helped, including the commands below.

- ceph pg force_create_pg 2.6
- ceph osd lost 2 --yes-i-really-mean-it
- ceph osd lost 6 --yes-i-really-mean-it
- ceph osd crush rm osd.2
- ceph osd crush rm osd.6
- ceph osd rm osd.2
- ceph osd rm osd.6

Is there any command to force-delete the PGs or otherwise make the Ceph cluster clean?

Thank you in advance.
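
P.S. In case it helps anyone repeat the check, here is a minimal Python sketch (not something I ran as part of the test above) that parses the text of `ceph health detail` and lists the stuck PGs whose entire last acting set is on the stopped OSDs. The function names and the three sample lines are only illustrative; the line format is assumed to be exactly the one shown in the output above.

```python
import re
from collections import defaultdict

# Three lines copied from the `ceph health detail` output in this post.
SAMPLE = """\
pg 2.67 is stuck stale for 689.171742, current state stale+active+clean, last acting [2,6]
pg 2.5a is stuck stale for 689.171748, current state stale+active+clean, last acting [6,2]
pg 2.62 is stuck stale for 689.171765, current state stale+active+clean, last acting [2,6]
"""

# Assumed line format: the one printed by jewel 10.2.3 in the output above.
LINE_RE = re.compile(
    r"pg (?P<pgid>\S+) is stuck \w+ for [\d.]+, "
    r"current state (?P<state>[^,\s]+), last acting \[(?P<acting>[\d,]*)\]"
)


def stuck_pgs_by_osd_set(text):
    """Group stuck PG ids by the (unordered) set of OSDs in 'last acting'."""
    groups = defaultdict(list)
    for match in LINE_RE.finditer(text):
        osds = frozenset(int(x) for x in match.group("acting").split(",") if x)
        groups[osds].append(match.group("pgid"))
    return dict(groups)


def pgs_with_no_surviving_replica(text, down_osds):
    """Return the sorted PG ids whose whole last acting set is in down_osds."""
    down = set(down_osds)
    return sorted(
        pgid
        for osds, pgids in stuck_pgs_by_osd_set(text).items()
        if osds and osds <= down
        for pgid in pgids
    )


if __name__ == "__main__":
    # osd.2 and osd.6 are the two OSDs stopped in step 7.
    for pgid in pgs_with_no_surviving_replica(SAMPLE, [2, 6]):
        print(pgid)
```

Fed the full `ceph health detail` text instead of SAMPLE, this should show whether any of the 16 stale PGs still has a replica on an OSD other than osd.2 and osd.6.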

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com