When you delete a snapshot, Ceph places the removed snapshot into a list in
the OSD map and places the objects in the snapshot into a snap_trim_q.
Once those two things are done, the RBD command returns and you move on to
the next snapshot.  The snap_trim_q is an n^2 operation (like all deletes
in Ceph), which means that if the queue has 100 objects on it and takes
5 minutes to complete, then having 200 objects in the queue will take
20 minutes (time frames exaggerated to show the math).  The same scaling
shows up when deleting an RBD that has 200,000 objects versus one with
100,000: it takes roughly four times as long rather than twice.  Note that
the object map mitigates this greatly by skipping any object that was never
created, so the comparison is easiest to reproduce with the object map
disabled on the test RBDs.
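To make the scaling claim concrete, here is a small Python sketch that takes
the quadratic model stated above at face value and compares it with linear
scaling.  The function names are hypothetical and the timings are the same
made-up numbers as the example; this is arithmetic, not a measurement of a
real cluster.  Under the quadratic model, doubling the queue from 100 to 200
objects quadruples the time, so 5 minutes becomes 20.

```python
def quadratic_trim_estimate(base_objects: int, base_minutes: float, objects: int) -> float:
    """Scale a measured trim time to a new queue length, ASSUMING time grows with n**2."""
    if base_objects <= 0:
        raise ValueError("base_objects must be positive")
    ratio = objects / base_objects
    return base_minutes * ratio ** 2


def linear_estimate(base_objects: int, base_minutes: float, objects: int) -> float:
    """The same scaling under a linear model, for comparison."""
    if base_objects <= 0:
        raise ValueError("base_objects must be positive")
    return base_minutes * objects / base_objects


if __name__ == "__main__":
    # Illustrative numbers only: 100 objects assumed to take 5 minutes.
    for n in (100, 200, 400):
        print(n, quadratic_trim_estimate(100, 5.0, n), linear_estimate(100, 5.0, n))
```

The gap between the two columns is why one very large snapshot hurts more
than several small ones holding the same total data.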

So paying attention to snapshot sizes as you clean them up matters more
than how many snapshots you clean up.  Since you're on Jewel, you don't
really want to use osd_snap_trim_sleep, as it literally puts a sleep on the
OSD's main op threads.  In Hammer this setting was much more useful (if
super hacky), and in Luminous the entire process was revamped and
(hopefully) fixed.  Jewel is pretty much not viable for large quantities of
snapshots, but there are ways to get through them.
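One way to act on the advice about snapshot sizes is to batch deletions by
how much data each snapshot holds rather than by count, and let the cluster
finish trimming between batches.  The Python sketch below is hypothetical:
plan_batches is not part of any Ceph tool, the per-snapshot sizes are
assumed to come from somewhere like the USED column of `rbd du`, and the
10 GiB budget is an arbitrary placeholder you would tune for your cluster.

```python
from typing import List, Tuple

# (snapshot name, bytes of data the snapshot holds). ASSUMPTION: the size
# comes from an external source such as `rbd du`; nothing here queries Ceph.
Snapshot = Tuple[str, int]


def plan_batches(snapshots: List[Snapshot], budget_bytes: int) -> List[List[str]]:
    """Group snapshot deletions so each batch queues at most budget_bytes of data.

    A snapshot larger than the budget is placed in a batch by itself.
    Input order is preserved.
    """
    if budget_bytes <= 0:
        raise ValueError("budget_bytes must be positive")
    batches: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for name, size in snapshots:
        # Close the current batch if adding this snapshot would exceed the budget.
        if current and current_bytes + size > budget_bytes:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    GiB = 1024 ** 3
    # Hypothetical snapshot names and sizes.
    snaps = [("daily-01", 2 * GiB), ("daily-02", 1 * GiB),
             ("weekly-01", 50 * GiB), ("daily-03", 3 * GiB)]
    for i, batch in enumerate(plan_batches(snaps, 10 * GiB), 1):
        # Hypothetical workflow: run `rbd snap rm` for each name in the batch,
        # then wait for snap trimming to settle before starting the next batch.
        print(i, batch)
```

The large weekly snapshot ends up alone in its own batch, which is the
point: it gets the cluster's full attention instead of piling onto the
trim queue alongside everything else.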

The following thread on the mailing list is one of the most informative on
this problem in Jewel.  The second link is where the thread resumed months
later, after the fix was scheduled for backport into 10.2.8.

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-January/015675.html
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-April/017697.html

On Fri, Jun 30, 2017 at 4:02 PM Kenneth Van Alstyne <
kvanalst...@knightpoint.com> wrote:

> Hey folks:
>         I was wondering if the community can provide any advice — over
> time and due to some external issues, we have managed to accumulate
> thousands of snapshots of RBD images, which are now in need of cleaning
> up.  I have recently attempted to roll through a “for” loop to perform a
> “rbd snap rm” on each snapshot, sequentially, waiting until the rbd command
> finishes before moving onto the next one, of course.  I noticed that
> shortly after starting this, I started seeing thousands of slow ops and a
> few of our guest VMs became unresponsive, naturally.
>
> My questions are:
>         - Is this expected behavior?
>         - Is the background cleanup asynchronous from the “rbd snap rm”
> command?
>                 - If so, are there any OSD parameters I can set to reduce
> the impact on production?
>         - Would “rbd snap purge” be any different?  I expect not, since
> fundamentally, rbd is performing the same action that I do via the loop.
>
> Relevant details are as follows, though I’m not sure cluster size *really*
> has any effect here:
>         - Ceph: version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
>         - 5 storage nodes, each with:
>                 - 10x 2TB 7200 RPM SATA Spindles (for a total of 50 OSDs)
>                 - 2x Samsung MZ7LM240 SSDs (used as journal for the OSDs)
>                 - 64GB RAM
>                 - 2x Intel(R) Xeon(R) CPU E5-2609 v3 @ 1.90GHz
>                 - 20GBit LACP Port Channel via Intel X520 Dual Port 10GbE
> NIC
>
> Let me know if I’ve missed something fundamental.
>
> Thanks,
>
> --
> Kenneth Van Alstyne
> Systems Architect
> Knight Point Systems, LLC
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>