OK, my Ceph cluster died and has been re-created. The main problems were too
many PGs and disabled swap; then one of the nodes had problems with XFS (it even
got stuck on mount) and everything started to die. The last blow was my attempt
to edit the PGs, where I deleted more than needed. But I see some issues.
After a ceph-osd crash (it ran out of RAM), some PGs (the ones being
backfilled?) were broken (and the breakage was even replicated?). Such a PG
waits (locked) forever and also produces "slow requests" (which live forever
too). Some examples from the logs:
2015-11-01 18:06:52.920712 7fc5f7654700 30 osd.3 pg_epoch: 189581 pg[0.123( v
162977'16156 (155381'13156,162977'16156] local-les=189415 n=552 ec=1 les/c
189415/189476 189377/189377/189377) [0,10,3] r=2 lpr=189377 pi=163821-189376/285
luod=0'0 crt=0'0 lcod 0'0 active] lock
2015-11-01 18:06:52.922113 7fc5f7654700 30 osd.3 pg_epoch: 189581 pg[2.121( v
162977'1918456 (160044'1915456,162977'1918456] local-les=189415 n=340 ec=1 les/c
189415/189476 189377/189377/189377) [0,10,3] r=2 lpr=189377 pi=163821-189376/277
luod=0'0 crt=0'0 lcod 0'0 active] lock
2015-11-01 18:06:52.924800 7fc5f7654700 30 osd.3 pg_epoch: 189581 pg[1.124( v
162378'21481 (160046'18440,162378'21481] local-les=189468 n=69 ec=1 les/c
189468/189481 189447/189463/189302) [10,12,3] r=2 lpr=189463 pi=178724-189462/54
luod=0'0 crt=162378'21479 lcod 0'0 active] lock
2015-11-01 18:06:52.925660 7fc5f7654700 30 osd.3 pg_epoch: 189581 pg[0.125( v
162981'15001 (154638'12001,162981'15001] local-les=189477 n=551 ec=1 les/c
189477/189484 189447/189464/189302) [10,12,3] r=2 lpr=189464 pi=178724-189463/66
luod=0'0 crt=0'0 lcod 0'0 active] lock
So, IMHO, crash recovery (especially during backfilling) and PG verification
after restart need to be improved, at least to avoid reporting a broken PG as
"active".
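
To list which PGs show up in "lock" lines like the ones quoted above, something
like the following could be used. It is only a rough sketch: the regular
expression is derived from the few log lines in this mail, not from the Ceph
source, and the names locked_pgs, ENTRY and SAMPLE are made up for illustration.
It assumes every entry in the input ends with "] lock", as in the excerpt.

```python
import re

# Pattern guessed from the quoted log lines only (an assumption, not Ceph's
# documented format): OSD id, epoch, PG id, the bracketed OSD list, this
# OSD's rank (r=), and the state word that precedes "] lock".
ENTRY = re.compile(
    r"osd\.(?P<osd>\d+) pg_epoch: (?P<epoch>\d+) "
    r"pg\[(?P<pgid>\d+\.[0-9a-f]+)\("
    r".*?\) \[(?P<osds>[\d,]+)\] r=(?P<rank>-?\d+)"
    r".*? (?P<state>[a-z+]+)\] lock"
)


def locked_pgs(log_text):
    """Return one dict per 'lock' entry found in log_text."""
    # Entries are wrapped over several lines in the mail, so collapse all
    # whitespace into single spaces before matching.
    flat = " ".join(log_text.split())
    found = []
    for m in ENTRY.finditer(flat):
        found.append({
            "osd": int(m.group("osd")),
            "epoch": int(m.group("epoch")),
            "pgid": m.group("pgid"),
            "osds": [int(x) for x in m.group("osds").split(",")],
            "rank": int(m.group("rank")),
            "state": m.group("state"),
        })
    return found


# Two of the entries quoted above, copied verbatim (including the wrapping).
SAMPLE = """\
2015-11-01 18:06:52.920712 7fc5f7654700 30 osd.3 pg_epoch: 189581 pg[0.123( v
162977'16156 (155381'13156,162977'16156] local-les=189415 n=552 ec=1 les/c
189415/189476 189377/189377/189377) [0,10,3] r=2 lpr=189377 pi=163821-189376/285
luod=0'0 crt=0'0 lcod 0'0 active] lock
2015-11-01 18:06:52.924800 7fc5f7654700 30 osd.3 pg_epoch: 189581 pg[1.124( v
162378'21481 (160046'18440,162378'21481] local-les=189468 n=69 ec=1 les/c
189468/189481 189447/189463/189302) [10,12,3] r=2 lpr=189463 pi=178724-189462/54
luod=0'0 crt=162378'21479 lcod 0'0 active] lock
"""

if __name__ == "__main__":
    for pg in locked_pgs(SAMPLE):
        print(pg["pgid"], pg["osds"], "rank", pg["rank"], pg["state"])
```

Feeding it a larger chunk of the OSD log and counting how often each PG id
appears would show which PGs keep being reported in this state.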
PS: The version is 0.94.5.
PPS: With 4.3.0 the mount does not get stuck, but xfs_repair is still required.
PPPS: Use swap and avoid forced kills.
--
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/