It wouldn't let me simply change the pg_num, giving

    Error EEXIST: specified pg_num 2048 <= current 8192
But that's not a big deal, I just deleted the pool and recreated it with 'ceph osd pool create ec44pool 2048 2048 erasure ec44profile' ...and the result is quite similar. 'ceph status' is now

    cluster 196e5eb8-d6a7-4435-907e-ea028e946923
     health HEALTH_WARN 4 pgs degraded; 4 pgs stuck unclean; 4 pgs undersized
     monmap e1: 4 mons at {hobbit01=10.5.38.1:6789/0,hobbit02=10.5.38.2:6789/0,hobbit13=10.5.38.13:6789/0,hobbit14=10.5.38.14:6789/0}, election epoch 6, quorum 0,1,2,3 hobbit01,hobbit02,hobbit13,hobbit14
     osdmap e412: 144 osds: 144 up, 144 in
      pgmap v6798: 6144 pgs, 2 pools, 0 bytes data, 0 objects
            90590 MB used, 640 TB / 640 TB avail
                   4 active+undersized+degraded
                6140 active+clean

and 'ceph pg dump_stuck' results in

    ok
    pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
    2.296 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 14:33:26.672224 0'0 412:9 [5,55,91,2147483647,83,135,53,26] 5 [5,55,91,2147483647,83,135,53,26] 5 0'0 2015-03-04 14:33:15.649911 0'0 2015-03-04 14:33:15.649911
    2.69c 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 14:33:24.984802 0'0 412:9 [93,134,1,74,112,28,2147483647,60] 93 [93,134,1,74,112,28,2147483647,60] 93 0'0 2015-03-04 14:33:15.695747 0'0 2015-03-04 14:33:15.695747
    2.36d 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 14:33:21.937620 0'0 412:9 [12,108,136,104,52,18,63,2147483647] 12 [12,108,136,104,52,18,63,2147483647] 12 0'0 2015-03-04 14:33:15.652480 0'0 2015-03-04 14:33:15.652480
    2.5f7 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 14:33:26.169242 0'0 412:9 [94,128,73,22,4,60,2147483647,113] 94 [94,128,73,22,4,60,2147483647,113] 94 0'0 2015-03-04 14:33:15.687695 0'0 2015-03-04 14:33:15.687695

I do have questions for you, even at this point, though:

1) Where did you find the formula (14400/(k+m))?

2) I was really trying to size this for when it goes to production, at which point it may have as many as 384 OSDs. Doesn't that imply I should have even more pgs? (See the short sketch of this arithmetic after the quoted messages below.)

On Wed, Mar 4, 2015 at 2:15 PM, Don Doerner <don.doer...@quantum.com> wrote:

> Oh duh… OK, then given a 4+4 erasure coding scheme, 14400/8 is 1800, so try 2048.
>
> -don-
>
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Don Doerner
> Sent: 04 March, 2015 12:14
> To: Kyle Hutson; Ceph Users
> Subject: Re: [ceph-users] New EC pool undersized
>
> In this case, that number means that there is not an OSD that can be assigned. What's your k, m from your erasure coded pool? You'll need approximately (14400/(k+m)) PGs, rounded up to the next power of 2…
>
> -don-
>
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Kyle Hutson
> Sent: 04 March, 2015 12:06
> To: Ceph Users
> Subject: [ceph-users] New EC pool undersized
>
> Last night I blew away my previous ceph configuration (this environment is pre-production) and have 0.87.1 installed. I've manually edited the crushmap so it now looks like https://dpaste.de/OLEa
>
> I currently have 144 OSDs on 8 nodes.
> After increasing pg_num and pgp_num to a more suitable 1024 (due to the high number of OSDs), everything looked happy.
>
> So, now I'm trying to play with an erasure-coded pool.
>
> I did:
>
> ceph osd erasure-code-profile set ec44profile k=4 m=4 ruleset-failure-domain=rack
> ceph osd pool create ec44pool 8192 8192 erasure ec44profile
>
> After settling for a bit, 'ceph status' gives
>
>     cluster 196e5eb8-d6a7-4435-907e-ea028e946923
>      health HEALTH_WARN 7 pgs degraded; 7 pgs stuck degraded; 7 pgs stuck unclean; 7 pgs stuck undersized; 7 pgs undersized
>      monmap e1: 4 mons at {hobbit01=10.5.38.1:6789/0,hobbit02=10.5.38.2:6789/0,hobbit13=10.5.38.13:6789/0,hobbit14=10.5.38.14:6789/0}, election epoch 6, quorum 0,1,2,3 hobbit01,hobbit02,hobbit13,hobbit14
>      osdmap e409: 144 osds: 144 up, 144 in
>       pgmap v6763: 12288 pgs, 2 pools, 0 bytes data, 0 objects
>             90598 MB used, 640 TB / 640 TB avail
>                    7 active+undersized+degraded
>                12281 active+clean
>
> So to troubleshoot the undersized pgs, I issued 'ceph pg dump_stuck'
>
>     ok
>     pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
>     1.d77 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:33:57.502849 0'0 408:12 [15,95,58,73,52,31,116,2147483647] 15 [15,95,58,73,52,31,116,2147483647] 15 0'0 2015-03-04 11:33:42.100752 0'0 2015-03-04 11:33:42.100752
>     1.10fa 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:34:29.362554 0'0 408:12 [23,12,99,114,132,53,56,2147483647] 23 [23,12,99,114,132,53,56,2147483647] 23 0'0 2015-03-04 11:33:42.168571 0'0 2015-03-04 11:33:42.168571
>     1.1271 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:33:48.795742 0'0 408:12 [135,112,69,4,22,95,2147483647,83] 135 [135,112,69,4,22,95,2147483647,83] 135 0'0 2015-03-04 11:33:42.139555 0'0 2015-03-04 11:33:42.139555
>     1.2b5 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:34:32.189738 0'0 408:12 [11,115,139,19,76,52,94,2147483647] 11 [11,115,139,19,76,52,94,2147483647] 11 0'0 2015-03-04 11:33:42.079673 0'0 2015-03-04 11:33:42.079673
>     1.7ae 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:34:26.848344 0'0 408:12 [27,5,132,119,94,56,52,2147483647] 27 [27,5,132,119,94,56,52,2147483647] 27 0'0 2015-03-04 11:33:42.109832 0'0 2015-03-04 11:33:42.109832
>     1.1a97 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:34:25.457454 0'0 408:12 [20,53,14,54,102,118,2147483647,72] 20 [20,53,14,54,102,118,2147483647,72] 20 0'0 2015-03-04 11:33:42.833850 0'0 2015-03-04 11:33:42.833850
>     1.10a6 0 0 0 0 0 0 0 0 active+undersized+degraded 2015-03-04 11:34:30.059936 0'0 408:12 [136,22,4,2147483647,72,52,101,55] 136 [136,22,4,2147483647,72,52,101,55] 136 0'0 2015-03-04 11:33:42.125871 0'0 2015-03-04 11:33:42.125871
>
> Every one of these has a number (2147483647) in its up and acting sets that is way out of line from what I would expect.
>
> Thoughts?
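A side note on the 2147483647 entries above: as Don says in his reply, that value means CRUSH could not find an OSD to fill that slot of the PG's up/acting set. It is simply the largest signed 32-bit integer, used as a "no OSD mapped here" placeholder. Here is a minimal Python sketch (the names are mine, not Ceph's) that picks those empty slots out of one of the rows above:

    # 2147483647 == 2**31 - 1, the largest signed 32-bit integer.  Ceph
    # prints it in a PG's up/acting set when CRUSH failed to map an OSD
    # for that shard, which is why these PGs show up as undersized.
    PLACEHOLDER = 2**31 - 1
    assert PLACEHOLDER == 2147483647

    # acting set of pg 2.296 from the dump above (k=4, m=4 -> 8 shards)
    acting = [5, 55, 91, 2147483647, 83, 135, 53, 26]
    unmapped = [slot for slot, osd in enumerate(acting) if osd == PLACEHOLDER]
    print(unmapped)  # [3] -> shard 3 has no OSD assigned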
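And on the (14400/(k+m)) formula Kyle asks about: 14400 looks like the common rule of thumb of roughly 100 PGs per OSD multiplied by the 144 OSDs in this cluster, though that reading is an assumption on my part rather than something Don states. A rough Python sketch of the arithmetic, including the planned 384-OSD case:

    import math

    def suggested_pg_num(num_osds, k, m, pgs_per_osd=100):
        # Assumed rule of thumb: target ~100 PGs per OSD, divide by the
        # pool's shard count (k + m for erasure coding), then round up
        # to the next power of two.
        raw = num_osds * pgs_per_osd / (k + m)
        return 2 ** math.ceil(math.log2(raw))

    print(suggested_pg_num(144, 4, 4))  # 14400 / 8 = 1800 -> 2048
    print(suggested_pg_num(384, 4, 4))  # 38400 / 8 = 4800 -> 8192, so yes,
                                        # more OSDs would suggest more PGs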