Thanks Dan, that looks like a really neat method & script for a few use cases.
We've actually used several of the scripts in that repo over the years, so
many thanks for sharing.
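
In case it's useful to anyone else reading along, my (possibly wrong) mental
model of the approach is roughly the sketch below. This is untested and is not
Dan's actual script: it only prints the commands rather than running them, the
JSON layout of `ceph pg ls` varies a bit between releases, and the up/acting
pairing is a naive simplification.

#!/usr/bin/env python3
# Untested sketch, not Dan's upmap-remapped.py: for each PG that is
# currently remapped, print a pg-upmap-items command that pins it onto
# the OSDs it is already on (the acting set), so no data has to move
# while the balancer catches up. Assumes `ceph pg ls remapped -f json`
# returns a list of PG dicts with pgid/up/acting (layout differs a
# little between releases). Review the output before piping it anywhere.

import json
import subprocess

def ceph(*args):
    out = subprocess.check_output(("ceph",) + args + ("--format", "json"))
    return json.loads(out)

def main():
    for pg in ceph("pg", "ls", "remapped"):
        up, acting = pg["up"], pg["acting"]
        if up == acting:
            continue
        # Map each CRUSH-chosen (up) OSD back to the OSD that currently
        # holds the data (acting). Naive positional pairing; the real
        # script is more careful, especially for EC pools.
        pairs = []
        for u, a in zip(up, acting):
            if u != a:
                pairs += [str(u), str(a)]
        if pairs:
            print("ceph osd pg-upmap-items", pg["pgid"], " ".join(pairs))

if __name__ == "__main__":
    main()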

That method will definitely help in the scenario where a set of unnecessary pg 
remaps has been triggered and can be caught early and reverted. I'm still a 
little concerned about the possibility of, say, a brief network glitch 
occurring at night and then waking up to a full, unbalanced cluster. That's 
especially true for NVMe clusters, which can remap and rebalance very quickly 
(and for which we also have a greater incentive to squeeze out as much usable 
capacity as possible with upmap, given the cost per TB). It's just a risk I 
hadn't previously considered, and I was wondering whether others have run into 
it or felt any need to plan around it.
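
For the overnight case, the kind of guard I've been toying with is a small
watchdog that pauses rebalancing if the misplaced ratio suddenly jumps, so
nothing moves until someone has had a chance to restore the upmaps. Very rough,
untested sketch below: the 5% threshold and 60-second interval are made-up
numbers, and I'm assuming a misplaced_ratio field in the `ceph status` JSON,
which may not be present on every release.

#!/usr/bin/env python3
# Untested sketch of an overnight guard: if the misplaced-object ratio
# jumps above a threshold (e.g. after a network blip marks OSDs out),
# set the norebalance flag so the cluster doesn't spend the night
# moving PGs back to their bare-CRUSH locations. A human unsets the
# flag after restoring the upmap entries. Threshold and interval are
# arbitrary; misplaced_ratio may be absent when nothing is misplaced,
# hence the 0.0 default.

import json
import subprocess
import time

MISPLACED_THRESHOLD = 0.05   # pause rebalancing above 5% misplaced
CHECK_INTERVAL = 60          # seconds between checks

def misplaced_ratio():
    out = subprocess.check_output(["ceph", "status", "--format", "json"])
    pgmap = json.loads(out).get("pgmap", {})
    return float(pgmap.get("misplaced_ratio", 0.0))

def main():
    while True:
        ratio = misplaced_ratio()
        if ratio > MISPLACED_THRESHOLD:
            print("misplaced ratio %.2f%% over threshold, setting norebalance"
                  % (ratio * 100))
            subprocess.check_call(["ceph", "osd", "set", "norebalance"])
            break
        time.sleep(CHECK_INTERVAL)

if __name__ == "__main__":
    main()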

Cheers,
Dylan


>From: Dan van der Ster <d...@vanderster.com>
>Sent: Friday, 1 May 2020 5:53 PM
>To: Dylan McCulloch <d...@unimelb.edu.au>
>Cc: ceph-users <ceph-users@ceph.io>
>
>Subject: Re: [ceph-users] upmap balancer and consequences of osds briefly 
>marked out
>
>Hi,
>
>You're correct that all the relevant upmap entries are removed when an
>OSD is marked out.
>You can try to use this script which will recreate them and get the
>cluster back to HEALTH_OK quickly:
>https://github.com/cernceph/ceph-scripts/blob/master/tools/upmap/upmap-remapped.py
>
>Cheers, Dan
>
>
>On Fri, May 1, 2020 at 9:36 AM Dylan McCulloch <d...@unimelb.edu.au> wrote:
>>
>> Hi all,
>>
>> We're using upmap balancer which has made a huge improvement in evenly 
>> distributing data on our osds and has provided a substantial increase in 
>> usable capacity.
>>
>> Currently on ceph version: 12.2.13 luminous
>>
>> We ran into a firewall issue recently which led to a large number of osds 
>> being briefly marked 'down' & 'out'. The osds came back 'up' & 'in' after 
>> about 25 mins and the cluster was fine but had to perform a significant 
>> amount of backfilling/recovery despite
>> there being no end-user client I/O during that period.
>>
>> Presumably the large number of remapped pgs and backfills were due to 
>> pg_upmap_items being removed from the osdmap when osds were marked out and 
>> subsequently those pgs were redistributed using the default crush algorithm.
>> As a result of the brief outage our cluster became significantly imbalanced 
>> again with several osds very close to full.
>> Is there any reasonable mitigation for that scenario?
>>
>> The auto-balancer will not perform optimizations while there are degraded 
>> pgs, so it would only start reapplying pg upmap exceptions after initial 
>> recovery is complete (at which point capacity may be dangerously reduced).
>> Similarly, as admins, we normally only apply changes when the cluster is in 
>> a healthy state, but if the same issue were to occur again would it be 
>> advisable to manually apply balancer plans while initial recovery is still 
>> taking place?
>>
>> I guess my concern from this experience is that making use of the capacity 
>> gained by using upmap balancer appears to carry some risk. i.e. it's 
>> possible for a brief outage to remove those space efficiencies relatively 
>> quickly and potentially result in full
>> osds/cluster before the automatic balancer is able to resume and redistribute 
>> pgs using upmap.
>>
>> Curious whether others have any thoughts or experience regarding this.
>>
>> Cheers,
>> Dylan
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io