Thanks Dan, that looks like a really neat method & script for a few use-cases. We've actually used several of the scripts in that repo over the years, so, many thanks for sharing.
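(As an aside for the archives: the existing upmap entries are easy enough to snapshot so they can be re-applied by hand if they're ever dropped again. Below is a rough sketch of what I mean; it assumes the pg_upmap_items layout in "ceph osd dump --format json" and only replays previously captured entries, rather than recreating them from the current remapped state the way Dan's script does.)

#!/usr/bin/env python
# Rough sketch only: dump the current pg_upmap_items from the osdmap and
# print the CLI commands that would re-apply them later. Assumes the JSON
# layout of "ceph osd dump --format json" (pg_upmap_items is a list of
# {"pgid": ..., "mappings": [{"from": osd, "to": osd}, ...]}).
import json
import subprocess

osd_dump = json.loads(
    subprocess.check_output(['ceph', 'osd', 'dump', '--format', 'json']))

for item in osd_dump.get('pg_upmap_items', []):
    pairs = []
    for m in item['mappings']:
        pairs += [str(m['from']), str(m['to'])]
    # Each printed line can be replayed later (e.g. piped through "sh")
    # to restore that upmap entry.
    print('ceph osd pg-upmap-items %s %s' % (item['pgid'], ' '.join(pairs)))

That's obviously much cruder than Dan's script, but having a snapshot to diff against after an incident like ours seems like cheap insurance.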
That method will definitely help in the scenario in which a set of unnecessary pg remaps has been triggered and can be caught early and reverted.

I'm still a little concerned about the possibility of, for example, a brief network glitch occurring at night and then waking up to a full, unbalanced cluster. That's especially a concern with NVMe clusters, which can remap and rebalance rapidly (and for which, given the cost per TB, we also have a greater impetus to squeeze out as much usable capacity as possible with upmap). It's just a risk I hadn't previously considered, and I was wondering whether others have run into it or felt any need to plan around it.

Cheers,
Dylan

>From: Dan van der Ster <d...@vanderster.com>
>Sent: Friday, 1 May 2020 5:53 PM
>To: Dylan McCulloch <d...@unimelb.edu.au>
>Cc: ceph-users <ceph-users@ceph.io>
>
>Subject: Re: [ceph-users] upmap balancer and consequences of osds briefly marked out
>
>Hi,
>
>You're correct that all the relevant upmap entries are removed when an OSD is marked out.
>You can try to use this script, which will recreate them and get the cluster back to HEALTH_OK quickly:
>https://github.com/cernceph/ceph-scripts/blob/master/tools/upmap/upmap-remapped.py
>
>Cheers, Dan
>
>
>On Fri, May 1, 2020 at 9:36 AM Dylan McCulloch <d...@unimelb.edu.au> wrote:
>>
>> Hi all,
>>
>> We're using the upmap balancer, which has made a huge improvement in evenly distributing data on our osds and has provided a substantial increase in usable capacity.
>>
>> Currently on ceph version: 12.2.13 luminous
>>
>> We ran into a firewall issue recently which led to a large number of osds being briefly marked 'down' & 'out'. The osds came back 'up' & 'in' after about 25 mins and the cluster was fine, but it had to perform a significant amount of backfilling/recovery despite there being no end-user client I/O during that period.
>>
>> Presumably the large number of remapped pgs and backfills were due to pg_upmap_items being removed from the osdmap when osds were marked out, and subsequently those pgs were redistributed using the default crush algorithm.
>> As a result of the brief outage our cluster became significantly imbalanced again, with several osds very close to full.
>> Is there any reasonable mitigation for that scenario?
>>
>> The auto-balancer will not perform optimizations while there are degraded pgs, so it would only start reapplying pg upmap exceptions after initial recovery is complete (at which point capacity may be dangerously reduced).
>> Similarly, as admins, we normally only apply changes when the cluster is in a healthy state, but if the same issue were to occur again, would it be advisable to manually apply balancer plans while initial recovery is still taking place?
>>
>> I guess my concern from this experience is that making use of the capacity gained by the upmap balancer appears to carry some risk. i.e. it's possible for a brief outage to remove those space efficiencies relatively quickly and potentially result in full osds/cluster before the automatic balancer is able to resume and redistribute pgs using upmap.
>>
>> Curious whether others have any thoughts or experience regarding this.
>>
>> Cheers,
>> Dylan