(accidentally forgot to reply to the list)
> Thank you! Setting min_size to 4 allowed I/O again, and the 39 incomplete
> PGs are now:
>
> 39 active+undersized+degraded+remapped+backfilling
>
> Once backfilling is done, I'll increase min_size to 5 again.
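>
> Concretely, something like this (a sketch; "media" is the pool name from
> the health output below, and 5 is k+1 for the 4+1 profile):
>
> ---
> # watch the backfill drain
> ceph -s
>
> # once everything is active+clean, restore min_size
> ceph osd pool set media min_size 5
> ---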
>
> Am I likely to encounter this issue whenever I lose an OSD (I/O freezes and
> manually reducing min_size is required), and is there anything I should be
> doing differently?
>
> Thanks again!
> D
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Wednesday, December 12, 2018 3:31 PM, Ashley Merrick
> <singap...@amerrick.co.uk> wrote:
>
>> With EC, min_size is set to K + 1.
>>
>> Generally EC is used with an M of 2 or more. The reason min_size is set to
>> K + 1 is that you are now in a state where a further OSD loss would leave
>> some PGs without at least K chunks available, since you only have 1 extra M.
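>>
>> To make the arithmetic concrete (an illustration of the rule above, not
>> output from this cluster):
>>
>> ---
>> k = 4, m = 1                 # 4 data chunks + 1 coding chunk per object
>> chunks per PG   = k + m = 5
>> min_size        = k + 1 = 5  # I/O allowed only while >= 5 chunks are up
>>
>> lose 1 OSD  -> 4 chunks remain: exactly k, but below min_size -> I/O pauses
>> lose 2 OSDs -> 3 chunks remain: below k -> data unreadable until recovery
>> ---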
>>
>> As per the error message, you can get your pool back online by setting
>> min_size to 4.
>>
>> However, this would only be a temporary fix while you get the OSD back
>> online / rebuilt so you can go back to your 4 + 1 state.
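>>
>> i.e. (assuming the pool name "media" from the health output):
>>
>> ---
>> ceph osd pool set media min_size 4
>> ---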
>>
>> ,Ash
>>
>> On Wed, 12 Dec 2018 at 10:27 AM, David Young <funkypeng...@protonmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> I have a small 2-node cluster with 40 OSDs, using erasure coding 4+1
>>>
>>> I lost osd38, and now I have 39 incomplete PGs.
>>>
>>> ---
>>> PG_AVAILABILITY Reduced data availability: 39 pgs inactive, 39 pgs incomplete
>>>     pg 22.2 is incomplete, acting [19,33,10,8,29] (reducing pool media min_size from 5 may help; search ceph.com/docs for 'incomplete')
>>>     pg 22.f is incomplete, acting [17,9,23,14,15] (reducing pool media min_size from 5 may help; search ceph.com/docs for 'incomplete')
>>>     pg 22.12 is incomplete, acting [7,33,10,31,29] (reducing pool media min_size from 5 may help; search ceph.com/docs for 'incomplete')
>>>     pg 22.13 is incomplete, acting [23,0,15,33,13] (reducing pool media min_size from 5 may help; search ceph.com/docs for 'incomplete')
>>>     pg 22.23 is incomplete, acting [29,17,18,15,12] (reducing pool media min_size from 5 may help; search ceph.com/docs for 'incomplete')
>>> <snip>
>>> ---
>>>
>>> My EC profile is below:
>>>
>>> ---
>>> root@prod1:~# ceph osd erasure-code-profile get ec-41-profile
>>> crush-device-class=
>>> crush-failure-domain=osd
>>> crush-root=default
>>> jerasure-per-chunk-alignment=false
>>> k=4
>>> m=1
>>> plugin=jerasure
>>> technique=reed_sol_van
>>> w=8
>>> ---
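>>>
>>> (For context, a profile and pool like this would have been created roughly
>>> as follows; the pg counts here are illustrative, not taken from my cluster:)
>>>
>>> ---
>>> ceph osd erasure-code-profile set ec-41-profile \
>>>     k=4 m=1 crush-failure-domain=osd
>>> ceph osd pool create media 256 256 erasure ec-41-profile
>>> ---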
>>>
>>> When I query one of the incomplete PGs, I see this:
>>>
>>> ---
>>> "recovery_state": [
>>> {
>>> "name": "Started/Primary/Peering/Incomplete",
>>> "enter_time": "2018-12-11 20:46:11.645796",
>>> "comment": "not enough complete instances of this PG"
>>> },
>>> ---
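>>>
>>> (That's from pg query, e.g. for the first PG listed above:)
>>>
>>> ---
>>> ceph pg 22.2 query
>>> ---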
>>>
>>> And this:
>>>
>>> ---
>>> "probing_osds": [
>>> "0(4)",
>>> "7(2)",
>>> "9(1)",
>>> "11(4)",
>>> "22(3)",
>>> "29(2)",
>>> "36(0)"
>>> ],
>>> "down_osds_we_would_probe": [
>>> 38
>>> ],
>>> "peering_blocked_by": []
>>> },
>>> ---
>>>
>>> I have set this in /etc/ceph/ceph.conf to no effect:
>>> osd_find_best_info_ignore_history_les = true
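>>>
>>> (Note: a ceph.conf change only takes effect when the OSDs restart; to
>>> apply it at runtime without a restart, something like this should work:)
>>>
>>> ---
>>> ceph tell osd.* injectargs '--osd_find_best_info_ignore_history_les=true'
>>> ---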
>>>
>>> As a result of the incomplete PGs, I/O is currently frozen to at least
>>> part of my cephfs.
>>>
>>> I expected to be able to tolerate the loss of an OSD without issue. Is
>>> there anything I can do to restore these incomplete PGs?
>>>
>>> When I bring back a new osd38, I see:
>>> ---
>>> "probing_osds": [
>>> "4(2)",
>>> "11(3)",
>>> "22(1)",
>>> "24(1)",
>>> "26(2)",
>>> "36(4)",
>>> "38(1)",
>>> "39(0)"
>>> ],
>>> "down_osds_we_would_probe": [],
>>> "peering_blocked_by": []
>>> },
>>> {
>>> "name": "Started",
>>> "enter_time": "2018-12-11 21:06:35.307379"
>>> }
>>> ---
>>>
>>> But my recovery state is still:
>>>
>>> ---
>>> "recovery_state": [
>>> {
>>> "name": "Started/Primary/Peering/Incomplete",
>>> "enter_time": "2018-12-11 21:06:35.320292",
>>> "comment": "not enough complete instances of this PG"
>>> },
>>> ---
>>>
>>> Any ideas?
>>>
>>> Thanks!
>>> D
>>>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com