Check ceph pg query; it will (usually) tell you why something is stuck inactive.
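For example (the PG id below is just a placeholder; take a real one from the health output):

ceph pg dump_stuck inactive    # lists the stuck PGs with their state and acting OSDs
ceph pg 1.2ab query            # placeholder PG id
# the "recovery_state" section near the end of the query output usually
# says what the PG is waiting for (e.g. which OSD it is blocked by)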
Also: never run with min_size 1.


Paul

2018-05-17 15:48 GMT+02:00 Kevin Olbrich <k...@sv01.de>:

> I was able to obtain another NVMe to get the HDDs in node1004 into the cluster.
> The number of disks (all 1TB) is now balanced between racks, but there are still some inactive PGs:
>
>   data:
>     pools:   2 pools, 1536 pgs
>     objects: 639k objects, 2554 GB
>     usage:   5167 GB used, 14133 GB / 19300 GB avail
>     pgs:     1.562% pgs not active
>              1183/1309952 objects degraded (0.090%)
>              199660/1309952 objects misplaced (15.242%)
>              1072 active+clean
>               405 active+remapped+backfill_wait
>                35 active+remapped+backfilling
>                21 activating+remapped
>                 3 activating+undersized+degraded+remapped
>
> ID  CLASS WEIGHT   TYPE NAME                        STATUS REWEIGHT PRI-AFF
> -1        18.85289 root default
> -16       18.85289     datacenter dc01
> -19       18.85289         pod dc01-agg01
> -10        8.98700             rack dc01-rack02
>  -4        4.03899                 host node1001
>   0   hdd  0.90999                     osd.0            up  1.00000 1.00000
>   1   hdd  0.90999                     osd.1            up  1.00000 1.00000
>   5   hdd  0.90999                     osd.5            up  1.00000 1.00000
>   2   ssd  0.43700                     osd.2            up  1.00000 1.00000
>   3   ssd  0.43700                     osd.3            up  1.00000 1.00000
>   4   ssd  0.43700                     osd.4            up  1.00000 1.00000
>  -7        4.94899                 host node1002
>   9   hdd  0.90999                     osd.9            up  1.00000 1.00000
>  10   hdd  0.90999                     osd.10           up  1.00000 1.00000
>  11   hdd  0.90999                     osd.11           up  1.00000 1.00000
>  12   hdd  0.90999                     osd.12           up  1.00000 1.00000
>   6   ssd  0.43700                     osd.6            up  1.00000 1.00000
>   7   ssd  0.43700                     osd.7            up  1.00000 1.00000
>   8   ssd  0.43700                     osd.8            up  1.00000 1.00000
> -11        9.86589             rack dc01-rack03
> -22        5.38794                 host node1003
>  17   hdd  0.90999                     osd.17           up  1.00000 1.00000
>  18   hdd  0.90999                     osd.18           up  1.00000 1.00000
>  24   hdd  0.90999                     osd.24           up  1.00000 1.00000
>  26   hdd  0.90999                     osd.26           up  1.00000 1.00000
>  13   ssd  0.43700                     osd.13           up  1.00000 1.00000
>  14   ssd  0.43700                     osd.14           up  1.00000 1.00000
>  15   ssd  0.43700                     osd.15           up  1.00000 1.00000
>  16   ssd  0.43700                     osd.16           up  1.00000 1.00000
> -25        4.47795                 host node1004
>  23   hdd  0.90999                     osd.23           up  1.00000 1.00000
>  25   hdd  0.90999                     osd.25           up  1.00000 1.00000
>  27   hdd  0.90999                     osd.27           up  1.00000 1.00000
>  19   ssd  0.43700                     osd.19           up  1.00000 1.00000
>  20   ssd  0.43700                     osd.20           up  1.00000 1.00000
>  21   ssd  0.43700                     osd.21           up  1.00000 1.00000
>  22   ssd  0.43700                     osd.22           up  1.00000 1.00000
>
> Pools are size 2, min_size 1 during setup.
>
> The number of PGs stuck in the activating state seems related to the weight of the OSDs, but why do they fail to proceed to active+clean or active+remapped?
>
> Kind regards,
> Kevin
>
> 2018-05-17 14:05 GMT+02:00 Kevin Olbrich <k...@sv01.de>:
>
>> Ok, I just waited some time, but I still have some "activating" issues:
>>
>>   data:
>>     pools:   2 pools, 1536 pgs
>>     objects: 639k objects, 2554 GB
>>     usage:   5194 GB used, 11312 GB / 16506 GB avail
>>     pgs:     7.943% pgs not active
>>              5567/1309948 objects degraded (0.425%)
>>              195386/1309948 objects misplaced (14.916%)
>>              1147 active+clean
>>               235 active+remapped+backfill_wait
>>               107 activating+remapped
>>                32 active+remapped+backfilling
>>                15 activating+undersized+degraded+remapped
>>
>> I set these settings at runtime:
>>   ceph tell 'osd.*' injectargs '--osd-max-backfills 16'
>>   ceph tell 'osd.*' injectargs '--osd-recovery-max-active 4'
>>   ceph tell 'mon.*' injectargs '--mon_max_pg_per_osd 800'
>>   ceph tell 'osd.*' injectargs '--osd_max_pg_per_osd_hard_ratio 32'
>>
>> Sure, mon_max_pg_per_osd is oversized, but this is only temporary. The calculated PG count per OSD is 200.
>>
>> I searched the net and the bug tracker; most posts suggest osd_max_pg_per_osd_hard_ratio = 32 to fix this issue, but this time I got more stuck PGs.
>>
>> Any more hints?
>>
>> Kind regards.
>> Kevin
>>
>> 2018-05-17 13:37 GMT+02:00 Kevin Olbrich <k...@sv01.de>:
>>
>>> PS: The cluster is currently size 2. I used PGCalc on the Ceph website, which by default places 200 PGs on each OSD.
>>> I read about the protection in the docs and later realized that I should have placed only 100 PGs.
>>>
>>> 2018-05-17 13:35 GMT+02:00 Kevin Olbrich <k...@sv01.de>:
>>>
>>>> Hi!
>>>>
>>>> Thanks for your quick reply.
>>>> Before I read your mail, I applied the following config to my OSDs:
>>>>   ceph tell 'osd.*' injectargs '--osd_max_pg_per_osd_hard_ratio 32'
>>>>
>>>> Status is now:
>>>>   data:
>>>>     pools:   2 pools, 1536 pgs
>>>>     objects: 639k objects, 2554 GB
>>>>     usage:   5211 GB used, 11295 GB / 16506 GB avail
>>>>     pgs:     7.943% pgs not active
>>>>              5567/1309948 objects degraded (0.425%)
>>>>              252327/1309948 objects misplaced (19.262%)
>>>>              1030 active+clean
>>>>               351 active+remapped+backfill_wait
>>>>               107 activating+remapped
>>>>                33 active+remapped+backfilling
>>>>                15 activating+undersized+degraded+remapped
>>>>
>>>> A little bit better, but still some non-active PGs.
>>>> I will investigate your other hints!
>>>>
>>>> Thanks
>>>> Kevin
>>>>
>>>> 2018-05-17 13:30 GMT+02:00 Burkhard Linke <burkhard.li...@computational.bio.uni-giessen.de>:
>>>>
>>>>> Hi,
>>>>>
>>>>> On 05/17/2018 01:09 PM, Kevin Olbrich wrote:
>>>>>
>>>>>> Hi!
>>>>>>
>>>>>> Today I added some new OSDs (nearly doubling the count) to my Luminous cluster.
>>>>>> I then changed pg(p)_num from 256 to 1024 for that pool because it was complaining about too few PGs. (I have since noticed that this should have been done in smaller steps.)
>>>>>>
>>>>>> This is the current status:
>>>>>>
>>>>>>   health: HEALTH_ERR
>>>>>>           336568/1307562 objects misplaced (25.740%)
>>>>>>           Reduced data availability: 128 pgs inactive, 3 pgs peering, 1 pg stale
>>>>>>           Degraded data redundancy: 6985/1307562 objects degraded (0.534%), 19 pgs degraded, 19 pgs undersized
>>>>>>           107 slow requests are blocked > 32 sec
>>>>>>           218 stuck requests are blocked > 4096 sec
>>>>>>
>>>>>>   data:
>>>>>>     pools:   2 pools, 1536 pgs
>>>>>>     objects: 638k objects, 2549 GB
>>>>>>     usage:   5210 GB used, 11295 GB / 16506 GB avail
>>>>>>     pgs:     0.195% pgs unknown
>>>>>>              8.138% pgs not active
>>>>>>              6985/1307562 objects degraded (0.534%)
>>>>>>              336568/1307562 objects misplaced (25.740%)
>>>>>>              855 active+clean
>>>>>>              517 active+remapped+backfill_wait
>>>>>>              107 activating+remapped
>>>>>>               31 active+remapped+backfilling
>>>>>>               15 activating+undersized+degraded+remapped
>>>>>>                4 active+undersized+degraded+remapped+backfilling
>>>>>>                3 unknown
>>>>>>                3 peering
>>>>>>                1 stale+active+clean
>>>>>
>>>>> You need to resolve the unknown/peering/activating PGs first. You have 1536 PGs; assuming replication size 3, that makes 4608 PG copies. Given 25 OSDs and the heterogeneous host sizes, I assume that some OSDs hold more than 200 PGs. There is a threshold for the number of PGs per OSD; reaching this threshold keeps the OSDs from accepting new PGs.
>>>>>
>>>>> Try to increase the threshold (mon_max_pg_per_osd / max_pg_per_osd_hard_ratio / osd_max_pg_per_osd_hard_ratio, not sure about the exact one, consult the documentation) to allow more PGs on the OSDs. If this is the cause of the problem, the peering and activating states should resolve within a short time.
>>>>>
>>>>> You can also check the number of PGs per OSD with 'ceph osd df'; the last column is the current number of PGs.
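>>>>> For example, roughly (the option names are the ones above; the values are only placeholders, pick something that matches your target PG count):
>>>>>
>>>>>   ceph osd df                           # PGS column (last) = PG copies currently on each OSD
>>>>>   ceph pg dump pgs_brief | grep activating   # which PGs are stuck, and which OSDs they map to
>>>>>   ceph tell 'mon.*' injectargs '--mon_max_pg_per_osd 300'
>>>>>   ceph tell 'osd.*' injectargs '--osd_max_pg_per_osd_hard_ratio 4'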
>>>>>> OSD tree:
>>>>>>
>>>>>> ID  CLASS WEIGHT   TYPE NAME                        STATUS REWEIGHT PRI-AFF
>>>>>> -1        16.12177 root default
>>>>>> -16       16.12177     datacenter dc01
>>>>>> -19       16.12177         pod dc01-agg01
>>>>>> -10        8.98700             rack dc01-rack02
>>>>>>  -4        4.03899                 host node1001
>>>>>>   0   hdd  0.90999                     osd.0            up  1.00000 1.00000
>>>>>>   1   hdd  0.90999                     osd.1            up  1.00000 1.00000
>>>>>>   5   hdd  0.90999                     osd.5            up  1.00000 1.00000
>>>>>>   2   ssd  0.43700                     osd.2            up  1.00000 1.00000
>>>>>>   3   ssd  0.43700                     osd.3            up  1.00000 1.00000
>>>>>>   4   ssd  0.43700                     osd.4            up  1.00000 1.00000
>>>>>>  -7        4.94899                 host node1002
>>>>>>   9   hdd  0.90999                     osd.9            up  1.00000 1.00000
>>>>>>  10   hdd  0.90999                     osd.10           up  1.00000 1.00000
>>>>>>  11   hdd  0.90999                     osd.11           up  1.00000 1.00000
>>>>>>  12   hdd  0.90999                     osd.12           up  1.00000 1.00000
>>>>>>   6   ssd  0.43700                     osd.6            up  1.00000 1.00000
>>>>>>   7   ssd  0.43700                     osd.7            up  1.00000 1.00000
>>>>>>   8   ssd  0.43700                     osd.8            up  1.00000 1.00000
>>>>>> -11        7.13477             rack dc01-rack03
>>>>>> -22        5.38678                 host node1003
>>>>>>  17   hdd  0.90970                     osd.17           up  1.00000 1.00000
>>>>>>  18   hdd  0.90970                     osd.18           up  1.00000 1.00000
>>>>>>  24   hdd  0.90970                     osd.24           up  1.00000 1.00000
>>>>>>  26   hdd  0.90970                     osd.26           up  1.00000 1.00000
>>>>>>  13   ssd  0.43700                     osd.13           up  1.00000 1.00000
>>>>>>  14   ssd  0.43700                     osd.14           up  1.00000 1.00000
>>>>>>  15   ssd  0.43700                     osd.15           up  1.00000 1.00000
>>>>>>  16   ssd  0.43700                     osd.16           up  1.00000 1.00000
>>>>>> -25        1.74799                 host node1004
>>>>>>  19   ssd  0.43700                     osd.19           up  1.00000 1.00000
>>>>>>  20   ssd  0.43700                     osd.20           up  1.00000 1.00000
>>>>>>  21   ssd  0.43700                     osd.21           up  1.00000 1.00000
>>>>>>  22   ssd  0.43700                     osd.22           up  1.00000 1.00000
>>>>>>
>>>>>> The crush rule is set to chooseleaf rack and (temporarily!) to size 2.
>>>>>> Why are PGs stuck in peering and activating?
>>>>>> "ceph df" shows that only 1.5 TB are used on the pool, residing on the HDDs - which would perfectly fit the crush rule... (?)
>>>>>
>>>>> Size 2 within the crush rule, or size 2 for the two pools?
>>>>>
>>>>> Regards,
>>>>> Burkhard

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
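For reference, which "size 2" applies where can be checked directly; "hdd-pool" below is only a placeholder pool name:

ceph osd pool ls detail                 # shows size, min_size and crush_rule per pool
ceph osd crush rule dump                # shows the chooseleaf step of each rule
ceph osd pool set hdd-pool min_size 2   # get off min_size 1 once the cluster is healthy
ceph osd pool set hdd-pool size 3       # and ideally back to 3 copies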