It wouldn't let me simply change the pg_num, giving

    Error EEXIST: specified pg_num 2048 <= current 8192
But that's not a big deal, I just deleted the pool and recreated it with 'ceph osd pool create ec44pool 2048 2048 erasure ec44profile' ...and the result is quite similar. 'ceph status' is now

    cluster 196e5eb8-d6a7-4435-907e-ea028e946923
     health HEALTH_WARN 4 pgs degraded; 4 pgs stuck unclean; 4 pgs undersized
     monmap e1: 4 mons at {hobbit01=10.5.38.1:6789/0,hobbit02=10.5.38.2:6789/0,hobbit13=10.5.38.13:6789/0,hobbit14=10.5.38.14:6789/0}, election epoch 6, quorum 0,1,2,3 hobbit01,hobbit02,hobbit13,hobbit14
     osdmap e412: 144 osds: 144 up, 144 in
      pgmap v6798: 6144 pgs, 2 pools, 0 bytes data, 0 objects
            90590 MB used, 640 TB / 640 TB avail
                   4 active+undersized+degraded
                6140 active+clean

and 'ceph pg dump_stuck' results in

    ok
    pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
    2.296 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 14:33:26.672224 0'0 412:9 [5,55,91,2147483647,83,135,53,26] 5 [5,55,91,2147483647,83,135,53,26] 5 0'0 2015-03-04 14:33:15.649911 0'0 2015-03-04 14:33:15.649911
    2.69c 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 14:33:24.984802 0'0 412:9 [93,134,1,74,112,28,2147483647,60] 93 [93,134,1,74,112,28,2147483647,60] 93 0'0 2015-03-04 14:33:15.695747 0'0 2015-03-04 14:33:15.695747
    2.36d 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 14:33:21.937620 0'0 412:9 [12,108,136,104,52,18,63,2147483647] 12 [12,108,136,104,52,18,63,2147483647] 12 0'0 2015-03-04 14:33:15.652480 0'0 2015-03-04 14:33:15.652480
    2.5f7 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 14:33:26.169242 0'0 412:9 [94,128,73,22,4,60,2147483647,113] 94 [94,128,73,22,4,60,2147483647,113] 94 0'0 2015-03-04 14:33:15.687695 0'0 2015-03-04 14:33:15.687695

I do have questions for you, even at this point, though:

1) Where did you find the formula (14400/(k+m))?

2) I was really trying to size this for when it goes to production, at which point it may have as many as 384 OSDs. Doesn't that imply I should have even more pgs? (See the short sketch of this arithmetic after the quoted messages below.)

On Wed, Mar 4, 2015 at 2:15 PM, Don Doerner <don.doer...@quantum.com> wrote:

> Oh duh… OK, then given a 4+4 erasure coding scheme, 14400/8 is 1800, so try 2048.
>
> -don-
>
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Don Doerner
> Sent: 04 March, 2015 12:14
> To: Kyle Hutson; Ceph Users
> Subject: Re: [ceph-users] New EC pool undersized
>
> In this case, that number means that there is not an OSD that can be assigned. What's your k, m from your erasure coded pool? You'll need approximately (14400/(k+m)) PGs, rounded up to the next power of 2…
>
> -don-
>
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Kyle Hutson
> Sent: 04 March, 2015 12:06
> To: Ceph Users
> Subject: [ceph-users] New EC pool undersized
>
> Last night I blew away my previous ceph configuration (this environment is pre-production) and have 0.87.1 installed. I've manually edited the crushmap so it now looks like https://dpaste.de/OLEa
>
> I currently have 144 OSDs on 8 nodes.
> After increasing pg_num and pgp_num to a more suitable 1024 (due to the high number of OSDs), everything looked happy.
>
> So, now I'm trying to play with an erasure-coded pool.
>
> I did:
>
> ceph osd erasure-code-profile set ec44profile k=4 m=4 ruleset-failure-domain=rack
> ceph osd pool create ec44pool 8192 8192 erasure ec44profile
>
> After settling for a bit, 'ceph status' gives
>
>     cluster 196e5eb8-d6a7-4435-907e-ea028e946923
>      health HEALTH_WARN 7 pgs degraded; 7 pgs stuck degraded; 7 pgs stuck unclean; 7 pgs stuck undersized; 7 pgs undersized
>      monmap e1: 4 mons at {hobbit01=10.5.38.1:6789/0,hobbit02=10.5.38.2:6789/0,hobbit13=10.5.38.13:6789/0,hobbit14=10.5.38.14:6789/0}, election epoch 6, quorum 0,1,2,3 hobbit01,hobbit02,hobbit13,hobbit14
>      osdmap e409: 144 osds: 144 up, 144 in
>       pgmap v6763: 12288 pgs, 2 pools, 0 bytes data, 0 objects
>             90598 MB used, 640 TB / 640 TB avail
>                    7 active+undersized+degraded
>                12281 active+clean
>
> So to troubleshoot the undersized pgs, I issued 'ceph pg dump_stuck'
>
>     ok
>     pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
>     1.d77 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:33:57.502849 0'0 408:12 [15,95,58,73,52,31,116,2147483647] 15 [15,95,58,73,52,31,116,2147483647] 15 0'0 2015-03-04 11:33:42.100752 0'0 2015-03-04 11:33:42.100752
>     1.10fa 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:34:29.362554 0'0 408:12 [23,12,99,114,132,53,56,2147483647] 23 [23,12,99,114,132,53,56,2147483647] 23 0'0 2015-03-04 11:33:42.168571 0'0 2015-03-04 11:33:42.168571
>     1.1271 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:33:48.795742 0'0 408:12 [135,112,69,4,22,95,2147483647,83] 135 [135,112,69,4,22,95,2147483647,83] 135 0'0 2015-03-04 11:33:42.139555 0'0 2015-03-04 11:33:42.139555
>     1.2b5 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:34:32.189738 0'0 408:12 [11,115,139,19,76,52,94,2147483647] 11 [11,115,139,19,76,52,94,2147483647] 11 0'0 2015-03-04 11:33:42.079673 0'0 2015-03-04 11:33:42.079673
>     1.7ae 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:34:26.848344 0'0 408:12 [27,5,132,119,94,56,52,2147483647] 27 [27,5,132,119,94,56,52,2147483647] 27 0'0 2015-03-04 11:33:42.109832 0'0 2015-03-04 11:33:42.109832
>     1.1a97 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:34:25.457454 0'0 408:12 [20,53,14,54,102,118,2147483647,72] 20 [20,53,14,54,102,118,2147483647,72] 20 0'0 2015-03-04 11:33:42.833850 0'0 2015-03-04 11:33:42.833850
>     1.10a6 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:34:30.059936 0'0 408:12 [136,22,4,2147483647,72,52,101,55] 136 [136,22,4,2147483647,72,52,101,55] 136 0'0 2015-03-04 11:33:42.125871 0'0 2015-03-04 11:33:42.125871
>
> Every one of these has a number (2147483647) in its up and acting sets that is way out of line from what I would expect.
>
> Thoughts?
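A side note on the 2147483647 entries above: as Don says in his reply, that value means CRUSH could not find an OSD to fill that slot of the PG's up/acting set. It is simply the largest signed 32-bit integer, used as a "no OSD mapped here" placeholder. Here is a minimal Python sketch (the names are mine, not Ceph's) that picks those empty slots out of one of the rows above:

    # 2147483647 == 2**31 - 1, the largest signed 32-bit integer.  Ceph
    # prints it in a PG's up/acting set when CRUSH failed to map an OSD
    # for that shard, which is why these PGs show up as undersized.
    PLACEHOLDER = 2**31 - 1
    assert PLACEHOLDER == 2147483647

    # acting set of pg 2.296 from the dump above (k=4, m=4 -> 8 shards)
    acting = [5, 55, 91, 2147483647, 83, 135, 53, 26]
    unmapped = [slot for slot, osd in enumerate(acting) if osd == PLACEHOLDER]
    print(unmapped)  # [3] -> shard 3 has no OSD assigned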
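And on the (14400/(k+m)) formula Kyle asks about: 14400 looks like the common rule of thumb of roughly 100 PGs per OSD multiplied by the 144 OSDs in this cluster, though that reading is an assumption on my part rather than something Don states. A rough Python sketch of the arithmetic, including the planned 384-OSD case:

    import math

    def suggested_pg_num(num_osds, k, m, pgs_per_osd=100):
        # Assumed rule of thumb: target ~100 PGs per OSD, divide by the
        # pool's shard count (k + m for erasure coding), then round up
        # to the next power of two.
        raw = num_osds * pgs_per_osd / (k + m)
        return 2 ** math.ceil(math.log2(raw))

    print(suggested_pg_num(144, 4, 4))  # 14400 / 8 = 1800 -> 2048
    print(suggested_pg_num(384, 4, 4))  # 38400 / 8 = 4800 -> 8192, so yes,
                                        # more OSDs would suggest more PGs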