Last night I blew away my previous Ceph configuration (this environment is pre-production) and installed 0.87.1. I've manually edited the crushmap so it now looks like this: https://dpaste.de/OLEa
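In case it matters, the crushmap edit was the standard decompile/recompile round-trip (file names here are just placeholders):

    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    # hand-edit crushmap.txt, then compile and inject it back
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new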
I currently have 144 OSDs on 8 nodes. After increasing pg_num and pgp_num to a more suitable 1024 (due to the high number of OSDs), everything looked happy. So now I'm trying to play with an erasure-coded pool. I did:

    ceph osd erasure-code-profile set ec44profile k=4 m=4 ruleset-failure-domain=rack
    ceph osd pool create ec44pool 8192 8192 erasure ec44profile

After settling for a bit, 'ceph status' gives:

    cluster 196e5eb8-d6a7-4435-907e-ea028e946923
     health HEALTH_WARN 7 pgs degraded; 7 pgs stuck degraded; 7 pgs stuck unclean; 7 pgs stuck undersized; 7 pgs undersized
     monmap e1: 4 mons at {hobbit01=10.5.38.1:6789/0,hobbit02=10.5.38.2:6789/0,hobbit13=10.5.38.13:6789/0,hobbit14=10.5.38.14:6789/0}, election epoch 6, quorum 0,1,2,3 hobbit01,hobbit02,hobbit13,hobbit14
     osdmap e409: 144 osds: 144 up, 144 in
      pgmap v6763: 12288 pgs, 2 pools, 0 bytes data, 0 objects
            90598 MB used, 640 TB / 640 TB avail
                   7 active+undersized+degraded
               12281 active+clean

To troubleshoot the undersized pgs, I issued 'ceph pg dump_stuck':

    ok
    pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
    1.d77   0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:33:57.502849 0'0 408:12 [15,95,58,73,52,31,116,2147483647]  15  [15,95,58,73,52,31,116,2147483647]  15  0'0 2015-03-04 11:33:42.100752 0'0 2015-03-04 11:33:42.100752
    1.10fa  0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:34:29.362554 0'0 408:12 [23,12,99,114,132,53,56,2147483647] 23  [23,12,99,114,132,53,56,2147483647] 23  0'0 2015-03-04 11:33:42.168571 0'0 2015-03-04 11:33:42.168571
    1.1271  0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:33:48.795742 0'0 408:12 [135,112,69,4,22,95,2147483647,83]  135 [135,112,69,4,22,95,2147483647,83]  135 0'0 2015-03-04 11:33:42.139555 0'0 2015-03-04 11:33:42.139555
    1.2b5   0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:34:32.189738 0'0 408:12 [11,115,139,19,76,52,94,2147483647] 11  [11,115,139,19,76,52,94,2147483647] 11  0'0 2015-03-04 11:33:42.079673 0'0 2015-03-04 11:33:42.079673
    1.7ae   0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:34:26.848344 0'0 408:12 [27,5,132,119,94,56,52,2147483647]  27  [27,5,132,119,94,56,52,2147483647]  27  0'0 2015-03-04 11:33:42.109832 0'0 2015-03-04 11:33:42.109832
    1.1a97  0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:34:25.457454 0'0 408:12 [20,53,14,54,102,118,2147483647,72] 20  [20,53,14,54,102,118,2147483647,72] 20  0'0 2015-03-04 11:33:42.833850 0'0 2015-03-04 11:33:42.833850
    1.10a6  0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:34:30.059936 0'0 408:12 [136,22,4,2147483647,72,52,101,55]  136 [136,22,4,2147483647,72,52,101,55]  136 0'0 2015-03-04 11:33:42.125871 0'0 2015-03-04 11:33:42.125871

Every stuck PG has the same odd entry (2147483647) in its up and acting sets, which is way out of line with any OSD id I would expect. Thoughts?
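One observation: 2147483647 is 0x7fffffff (INT32_MAX), which matches CRUSH_ITEM_NONE in src/crush/crush.h, so it reads like "no OSD mapped for this shard" rather than a garbage OSD id. Assuming that's right, my next step was going to be testing whether the ruleset can actually produce 8 distinct mappings with racks as the failure domain (the rule id below is a guess; 'ceph osd crush rule dump' gives the real one):

    ceph osd getcrushmap -o crushmap.bin
    crushtool -i crushmap.bin --test --rule 1 --num-rep 8 --show-bad-mappings
    ceph pg 1.d77 query     # per-PG detail on why the last shard is unmapped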