Check 'ceph pg query' on one of the affected PGs; it will (usually) tell you
why it is stuck inactive.
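
For example (the PG ID below is a placeholder; take a real one from "ceph
health detail"):

ceph health detail | grep -i activating
ceph pg 1.2ab query | less

The "recovery_state" section near the end of the query output usually names
what the PG is waiting for.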

Also: never use min_size 1; it allows writes to be acknowledged with only a
single copy available, so losing that one OSD can mean lost or inconsistent data.
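
If the pools were created with min_size 1, it can be raised later with
something like (pool name is a placeholder):

ceph osd pool set mypool min_size 2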


Paul


2018-05-17 15:48 GMT+02:00 Kevin Olbrich <k...@sv01.de>:

> I was able to obtain another NVMe to get the HDDs in node1004 into the
> cluster.
> The number of disks (all 1 TB) is now balanced between the racks, but there
> are still some inactive PGs:
>
>   data:
>     pools:   2 pools, 1536 pgs
>     objects: 639k objects, 2554 GB
>     usage:   5167 GB used, 14133 GB / 19300 GB avail
>     pgs:     1.562% pgs not active
>              1183/1309952 objects degraded (0.090%)
>              199660/1309952 objects misplaced (15.242%)
>              1072 active+clean
>              405  active+remapped+backfill_wait
>              35   active+remapped+backfilling
>              21   activating+remapped
>              3    activating+undersized+degraded+remapped
>
>
>
> ID  CLASS WEIGHT   TYPE NAME                     STATUS REWEIGHT PRI-AFF
>  -1       18.85289 root default
> -16       18.85289     datacenter dc01
> -19       18.85289         pod dc01-agg01
> -10        8.98700             rack dc01-rack02
>  -4        4.03899                 host node1001
>   0   hdd  0.90999                     osd.0         up  1.00000 1.00000
>   1   hdd  0.90999                     osd.1         up  1.00000 1.00000
>   5   hdd  0.90999                     osd.5         up  1.00000 1.00000
>   2   ssd  0.43700                     osd.2         up  1.00000 1.00000
>   3   ssd  0.43700                     osd.3         up  1.00000 1.00000
>   4   ssd  0.43700                     osd.4         up  1.00000 1.00000
>  -7        4.94899                 host node1002
>   9   hdd  0.90999                     osd.9         up  1.00000 1.00000
>  10   hdd  0.90999                     osd.10        up  1.00000 1.00000
>  11   hdd  0.90999                     osd.11        up  1.00000 1.00000
>  12   hdd  0.90999                     osd.12        up  1.00000 1.00000
>   6   ssd  0.43700                     osd.6         up  1.00000 1.00000
>   7   ssd  0.43700                     osd.7         up  1.00000 1.00000
>   8   ssd  0.43700                     osd.8         up  1.00000 1.00000
> -11        9.86589             rack dc01-rack03
> -22        5.38794                 host node1003
>  17   hdd  0.90999                     osd.17        up  1.00000 1.00000
>  18   hdd  0.90999                     osd.18        up  1.00000 1.00000
>  24   hdd  0.90999                     osd.24        up  1.00000 1.00000
>  26   hdd  0.90999                     osd.26        up  1.00000 1.00000
>  13   ssd  0.43700                     osd.13        up  1.00000 1.00000
>  14   ssd  0.43700                     osd.14        up  1.00000 1.00000
>  15   ssd  0.43700                     osd.15        up  1.00000 1.00000
>  16   ssd  0.43700                     osd.16        up  1.00000 1.00000
> -25        4.47795                 host node1004
>  23   hdd  0.90999                     osd.23        up  1.00000 1.00000
>  25   hdd  0.90999                     osd.25        up  1.00000 1.00000
>  27   hdd  0.90999                     osd.27        up  1.00000 1.00000
>  19   ssd  0.43700                     osd.19        up  1.00000 1.00000
>  20   ssd  0.43700                     osd.20        up  1.00000 1.00000
>  21   ssd  0.43700                     osd.21        up  1.00000 1.00000
>  22   ssd  0.43700                     osd.22        up  1.00000 1.00000
>
>
> Pools are size 2, min_size 1 during setup.
>
> The number of PGs in the activating state seems related to the weight of the
> OSDs, but why do they fail to proceed to active+clean or active+remapped?
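>
> (For reference, the affected PGs can be listed with e.g. "ceph pg dump_stuck
> inactive" or picked out of "ceph health detail".)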
>
> Kind regards,
> Kevin
>
> 2018-05-17 14:05 GMT+02:00 Kevin Olbrich <k...@sv01.de>:
>
>> OK, I just waited some time, but I still have some "activating" issues:
>>
>>   data:
>>     pools:   2 pools, 1536 pgs
>>     objects: 639k objects, 2554 GB
>>     usage:   5194 GB used, 11312 GB / 16506 GB avail
>>     pgs:     7.943% pgs not active
>>              5567/1309948 objects degraded (0.425%)
>>              195386/1309948 objects misplaced (14.916%)
>>              1147 active+clean
>>              235  active+remapped+backfill_wait
>>              107  activating+remapped
>>              32   active+remapped+backfilling
>>              15   activating+undersized+degraded+remapped
>>
>> I set these settings during runtime:
>> ceph tell 'osd.*' injectargs '--osd-max-backfills 16'
>> ceph tell 'osd.*' injectargs '--osd-recovery-max-active 4'
>> ceph tell 'mon.*' injectargs '--mon_max_pg_per_osd 800'
>> ceph tell 'osd.*' injectargs '--osd_max_pg_per_osd_hard_ratio 32'
>>
>> Sure, mon_max_pg_per_osd is oversized, but this is just temporary.
>> The calculated number of PGs per OSD is 200.
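>>
>> Note that injectargs changes are not persistent across daemon restarts; to
>> keep them, something like the following would go into ceph.conf (assuming the
>> same values are still wanted):
>>
>> [global]
>> mon_max_pg_per_osd = 800
>> osd_max_pg_per_osd_hard_ratio = 32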
>>
>> I searched the net and the bug tracker; most posts suggest
>> osd_max_pg_per_osd_hard_ratio = 32 to fix this issue, but this time I
>> got even more stuck PGs.
>>
>> Any more hints?
>>
>> Kind regards,
>> Kevin
>>
>> 2018-05-17 13:37 GMT+02:00 Kevin Olbrich <k...@sv01.de>:
>>
>>> PS: The cluster is currently size 2. I used PGCalc on the Ceph website,
>>> which, by default, targets 200 PGs per OSD.
>>> I read about the protection in the docs and later realized that I should
>>> have aimed for only 100 PGs per OSD.
>>>
>>>
>>> 2018-05-17 13:35 GMT+02:00 Kevin Olbrich <k...@sv01.de>:
>>>
>>>> Hi!
>>>>
>>>> Thanks for your quick reply.
>>>> Before I read your mail, I applied the following setting to my OSDs:
>>>> ceph tell 'osd.*' injectargs '--osd_max_pg_per_osd_hard_ratio 32'
>>>>
>>>> Status is now:
>>>>   data:
>>>>     pools:   2 pools, 1536 pgs
>>>>     objects: 639k objects, 2554 GB
>>>>     usage:   5211 GB used, 11295 GB / 16506 GB avail
>>>>     pgs:     7.943% pgs not active
>>>>              5567/1309948 objects degraded (0.425%)
>>>>              252327/1309948 objects misplaced (19.262%)
>>>>              1030 active+clean
>>>>              351  active+remapped+backfill_wait
>>>>              107  activating+remapped
>>>>              33   active+remapped+backfilling
>>>>              15   activating+undersized+degraded+remapped
>>>>
>>>> A little bit better but still some non-active PGs.
>>>> I will investigate your other hints!
>>>>
>>>> Thanks
>>>> Kevin
>>>>
>>>> 2018-05-17 13:30 GMT+02:00 Burkhard Linke <burkhard.li...@computational.bio.uni-giessen.de>:
>>>>
>>>>> Hi,
>>>>>
>>>>>
>>>>>
>>>>> On 05/17/2018 01:09 PM, Kevin Olbrich wrote:
>>>>>
>>>>>> Hi!
>>>>>>
>>>>>> Today I added some new OSDs (nearly doubling their number) to my Luminous
>>>>>> cluster.
>>>>>> I then changed pg(p)_num from 256 to 1024 for that pool because it was
>>>>>> complaining about too few PGs. (I have since noticed that this should have
>>>>>> been done in smaller steps.)
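>>>>>>
>>>>>> For reference, a more gradual approach would have been something like the
>>>>>> following, repeated in steps and letting the cluster settle in between
>>>>>> (the pool name is a placeholder):
>>>>>>
>>>>>> ceph osd pool set mypool pg_num 512
>>>>>> ceph osd pool set mypool pgp_num 512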
>>>>>>
>>>>>> This is the current status:
>>>>>>
>>>>>>      health: HEALTH_ERR
>>>>>>              336568/1307562 objects misplaced (25.740%)
>>>>>>              Reduced data availability: 128 pgs inactive, 3 pgs peering, 1 pg stale
>>>>>>              Degraded data redundancy: 6985/1307562 objects degraded (0.534%), 19 pgs degraded, 19 pgs undersized
>>>>>>              107 slow requests are blocked > 32 sec
>>>>>>              218 stuck requests are blocked > 4096 sec
>>>>>>
>>>>>>    data:
>>>>>>      pools:   2 pools, 1536 pgs
>>>>>>      objects: 638k objects, 2549 GB
>>>>>>      usage:   5210 GB used, 11295 GB / 16506 GB avail
>>>>>>      pgs:     0.195% pgs unknown
>>>>>>               8.138% pgs not active
>>>>>>               6985/1307562 objects degraded (0.534%)
>>>>>>               336568/1307562 objects misplaced (25.740%)
>>>>>>               855 active+clean
>>>>>>               517 active+remapped+backfill_wait
>>>>>>               107 activating+remapped
>>>>>>               31  active+remapped+backfilling
>>>>>>               15  activating+undersized+degraded+remapped
>>>>>>               4   active+undersized+degraded+remapped+backfilling
>>>>>>               3   unknown
>>>>>>               3   peering
>>>>>>               1   stale+active+clean
>>>>>>
>>>>>
>>>>> You need to resolve the unknown/peering/activating PGs first. You have
>>>>> 1536 PGs; assuming replication size 3, this makes 4608 PG copies. Given 25
>>>>> OSDs and the heterogeneous host sizes, I assume that some OSDs hold more
>>>>> than 200 PGs. There is a threshold for the number of PGs per OSD; reaching
>>>>> this threshold keeps the OSDs from accepting new PGs.
>>>>>
>>>>> Try to increase the threshold (mon_max_pg_per_osd /
>>>>> max_pg_per_osd_hard_ratio / osd_max_pg_per_osd_hard_ratio, not sure about
>>>>> the exact one, consult the documentation) to allow more PGs on the OSDs. If
>>>>> this is the cause of the problem, the peering and activating states should
>>>>> be resolved within a short time.
>>>>>
>>>>> You can also check the number of PGs per OSD with 'ceph osd df'; the
>>>>> last column is the current number of PGs.
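>>>>>
>>>>> A quick way to eyeball it (just a sketch; in luminous the PG count is the
>>>>> last column of the output):
>>>>>
>>>>> ceph osd df | awk '{print $1, $NF}'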
>>>>>
>>>>>
>>>>>>
>>>>>> OSD tree:
>>>>>>
>>>>>> ID  CLASS WEIGHT   TYPE NAME                     STATUS REWEIGHT PRI-AFF
>>>>>>  -1       16.12177 root default
>>>>>> -16       16.12177     datacenter dc01
>>>>>> -19       16.12177         pod dc01-agg01
>>>>>> -10        8.98700             rack dc01-rack02
>>>>>>  -4        4.03899                 host node1001
>>>>>>   0   hdd  0.90999                     osd.0         up  1.00000 1.00000
>>>>>>   1   hdd  0.90999                     osd.1         up  1.00000 1.00000
>>>>>>   5   hdd  0.90999                     osd.5         up  1.00000 1.00000
>>>>>>   2   ssd  0.43700                     osd.2         up  1.00000 1.00000
>>>>>>   3   ssd  0.43700                     osd.3         up  1.00000 1.00000
>>>>>>   4   ssd  0.43700                     osd.4         up  1.00000 1.00000
>>>>>>  -7        4.94899                 host node1002
>>>>>>   9   hdd  0.90999                     osd.9         up  1.00000 1.00000
>>>>>>  10   hdd  0.90999                     osd.10        up  1.00000 1.00000
>>>>>>  11   hdd  0.90999                     osd.11        up  1.00000 1.00000
>>>>>>  12   hdd  0.90999                     osd.12        up  1.00000 1.00000
>>>>>>   6   ssd  0.43700                     osd.6         up  1.00000 1.00000
>>>>>>   7   ssd  0.43700                     osd.7         up  1.00000 1.00000
>>>>>>   8   ssd  0.43700                     osd.8         up  1.00000 1.00000
>>>>>> -11        7.13477             rack dc01-rack03
>>>>>> -22        5.38678                 host node1003
>>>>>>  17   hdd  0.90970                     osd.17        up  1.00000 1.00000
>>>>>>  18   hdd  0.90970                     osd.18        up  1.00000 1.00000
>>>>>>  24   hdd  0.90970                     osd.24        up  1.00000 1.00000
>>>>>>  26   hdd  0.90970                     osd.26        up  1.00000 1.00000
>>>>>>  13   ssd  0.43700                     osd.13        up  1.00000 1.00000
>>>>>>  14   ssd  0.43700                     osd.14        up  1.00000 1.00000
>>>>>>  15   ssd  0.43700                     osd.15        up  1.00000 1.00000
>>>>>>  16   ssd  0.43700                     osd.16        up  1.00000 1.00000
>>>>>> -25        1.74799                 host node1004
>>>>>>  19   ssd  0.43700                     osd.19        up  1.00000 1.00000
>>>>>>  20   ssd  0.43700                     osd.20        up  1.00000 1.00000
>>>>>>  21   ssd  0.43700                     osd.21        up  1.00000 1.00000
>>>>>>  22   ssd  0.43700                     osd.22        up  1.00000 1.00000
>>>>>>
>>>>>>
>>>>>> The crush rule is set to chooseleaf rack, and size is (temporarily!) set to 2.
>>>>>> Why are PGs stuck in peering and activating?
>>>>>> "ceph df" shows that only 1.5 TB are used on the pool, residing on the HDDs,
>>>>>> which would perfectly fit the crush rule... (?)
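>>>>>>
>>>>>> (For reference, the effective values can be checked with, e.g.,
>>>>>> "ceph osd pool get <poolname> size", "ceph osd pool get <poolname> crush_rule"
>>>>>> and "ceph osd crush rule dump"; <poolname> is a placeholder.)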
>>>>>>
>>>>>
>>>>> Size 2 within the crush rule or size 2 for the two pools?
>>>>>
>>>>> Regards,
>>>>> Burkhard
>>>>>
>>>>
>>>>
>>>
>>
>
>
>


--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
