Right, it would freeze the PGs in place at the time upmap-remapped is run. You need to keep running the upmap balancer afterwards to restore the optimized state.
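For what it's worth, the whole workflow is roughly the following. This is only a rough sketch (the assumption that upmap-remapped.py prints plain "ceph osd pg-upmap-items ..." / "ceph osd rm-pg-upmap-items ..." commands to stdout, suitable for piping to a shell, is based on the CERN ceph-scripts repo):

#!/usr/bin/env python3
# Rough sketch, not a drop-in tool.
import subprocess

# 1. Apply upmap entries that pin every currently-remapped PG to the OSDs
#    it already sits on, so those PGs go active+clean immediately.
#    (upmap-remapped.py is assumed to print ceph CLI commands to stdout.)
subprocess.run("./upmap-remapped.py | sh", shell=True, check=True)

# 2. Make sure the upmap balancer is enabled so it can gradually move the
#    cluster back to the optimized layout afterwards.
subprocess.run(["ceph", "balancer", "mode", "upmap"], check=True)
subprocess.run(["ceph", "balancer", "on"], check=True)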
I don't quite understand your question about a failed / replaced osd, but yes, it is relevant here.

Suppose you have osds 0, 1, 2, and 3, and osd.1 fails: a hypothetical pg_upmap_items entry which mapped 0 to 1 *and* 2 to 3 would be removed when osd.1 is marked out. This would result in that PG being remapped and data moved from 3 to 2. [1]

So if you run upmap-remapped just afterwards, it would create a new pg_upmap_items entry mapping 2 to 3, making that PG active+clean again immediately. And then later, when you recreate osd.1, crush would recalculate, and after some iterations of the upmap balancer the original pg_upmap_items entry would be created again.

-- Dan

[1] This hints at an optimization for the "clean upmaps" functionality in OSDMap.cc -- if an osd is marked out, it could modify the relevant pg_upmap_items accordingly rather than remove them completely. (A rough sketch of the idea is at the end of this mail.)

On Sun, May 3, 2020 at 10:27 PM Anthony D'Atri <a...@dreamsnake.net> wrote:
>
> Do I misunderstand this script, or does it not _quite_ do what’s desired here?
>
> I fully get the scenario of applying a full-cluster map to allow incremental topology changes.
>
> To be clear, if this is run to effectively freeze backfill during / following a traumatic event, it will freeze that adapted state, not strictly return to the pre-event state? And thus the pg-upmap balancer would still need to be run to revert to the prior state? And this would also hold true for a failed/replaced OSD?
>
> > On May 1, 2020, at 7:37 AM, Dylan McCulloch <d...@unimelb.edu.au> wrote:
> >
> > Thanks Dan, that looks like a really neat method & script for a few use-cases. We've actually used several of the scripts in that repo over the years, so many thanks for sharing.
> >
> > That method will definitely help in the scenario in which a set of unnecessary pg remaps has been triggered and can be caught early and reverted. I'm still a little concerned about the possibility of, for example, a brief network glitch occurring at night and then waking up to a full, unbalanced cluster, especially with NVMe clusters that can rapidly remap and rebalance (and for which we also have a greater impetus to squeeze out as much available capacity as possible with upmap, due to cost per TB). It's just a risk I hadn't previously considered, and I was wondering if others have either run into it or felt any need to plan around it.
> >
> > Cheers,
> > Dylan
> >
> >> From: Dan van der Ster <d...@vanderster.com>
> >> Sent: Friday, 1 May 2020 5:53 PM
> >> To: Dylan McCulloch <d...@unimelb.edu.au>
> >> Cc: ceph-users <ceph-users@ceph.io>
> >> Subject: Re: [ceph-users] upmap balancer and consequences of osds briefly marked out
> >>
> >> Hi,
> >>
> >> You're correct that all the relevant upmap entries are removed when an OSD is marked out.
> >> You can try to use this script, which will recreate them and get the cluster back to HEALTH_OK quickly:
> >> https://github.com/cernceph/ceph-scripts/blob/master/tools/upmap/upmap-remapped.py
> >>
> >> Cheers, Dan
> >>
> >> On Fri, May 1, 2020 at 9:36 AM Dylan McCulloch <d...@unimelb.edu.au> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> We're using the upmap balancer, which has made a huge improvement in evenly distributing data on our osds and has provided a substantial increase in usable capacity.
> >>>
> >>> Currently on ceph version: 12.2.13 luminous
> >>>
> >>> We ran into a firewall issue recently which led to a large number of osds being briefly marked 'down' & 'out'.
> >>> The osds came back 'up' & 'in' after about 25 mins and the cluster was fine, but it had to perform a significant amount of backfilling/recovery despite there being no end-user client I/O during that period.
> >>>
> >>> Presumably the large number of remapped pgs and backfills were due to pg_upmap_items being removed from the osdmap when osds were marked out, with those pgs subsequently redistributed using the default crush algorithm.
> >>> As a result of the brief outage our cluster became significantly imbalanced again, with several osds very close to full.
> >>> Is there any reasonable mitigation for that scenario?
> >>>
> >>> The auto-balancer will not perform optimizations while there are degraded pgs, so it would only start reapplying pg upmap exceptions after initial recovery is complete (at which point capacity may be dangerously reduced).
> >>> Similarly, as admins, we normally only apply changes when the cluster is in a healthy state, but if the same issue were to occur again, would it be advisable to manually apply balancer plans while initial recovery is still taking place?
> >>>
> >>> I guess my concern from this experience is that making use of the capacity gained by the upmap balancer appears to carry some risk, i.e. it's possible for a brief outage to remove those space efficiencies relatively quickly and potentially result in full osds/cluster before the automatic balancer is able to resume and redistribute pgs using upmap.
> >>>
> >>> Curious whether others have any thoughts or experience regarding this.
> >>>
> >>> Cheers,
> >>> Dylan
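P.S. Re [1]: a very rough sketch of the idea, as Python pseudocode rather than the actual OSDMap.cc logic -- when an osd is marked out, drop only the mapping pairs that reference that osd, instead of removing the whole pg_upmap_items entry:

def prune_pg_upmap_items(pg_upmap_items, out_osd):
    # Keep the (source, target) pairs that don't involve the out osd,
    # rather than removing the PG's whole entry.
    pruned = {}
    for pgid, pairs in pg_upmap_items.items():
        kept = [(src, dst) for (src, dst) in pairs
                if out_osd not in (src, dst)]
        if kept:
            pruned[pgid] = kept
    return pruned

# With the example above (pairs 0->1 and 2->3, osd.1 marked out; the pg id
# here is made up): today the whole entry is removed, whereas this sketch
# keeps the 2->3 pair, so no data would move from 3 back to 2.
print(prune_pg_upmap_items({"1.0": [(0, 1), (2, 3)]}, out_osd=1))
# -> {'1.0': [(2, 3)]}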