I could probably put together a wip branch if you have a test cluster you could try it out on. -Sam
On Thu, Jan 19, 2017 at 2:27 PM, David Turner <david.tur...@storagecraft.com> wrote:
> To be clear, we are willing to change to a snap_trim_sleep of 0 and try to manage it with the other available settings... but it is sounding like that won't really work for us since our main op thread(s) will just be saturated with snap trimming almost all day. We currently only have ~6 hours/day where our snap trim queues are empty.
>
> From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of David Turner [david.tur...@storagecraft.com]
> Sent: Thursday, January 19, 2017 3:25 PM
> To: Samuel Just; Nick Fisk
> Cc: ceph-users
> Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?
>
> We are a couple of weeks away from upgrading to Jewel in our production clusters (after months of testing in our QA environments), but this might prevent us from making the migration from Hammer. We delete ~8,000 snapshots/day between 3 clusters and our snap_trim_q gets up to about 60 million in each of those clusters. We have to use an osd_snap_trim_sleep of 0.25 to prevent our clusters from falling on their faces during our big load, and 0.1 the rest of the day to catch up on the snap trim queue.
>
> Is our setup possible to use on Jewel?
>
> ________________________________________
> From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Samuel Just [sj...@redhat.com]
> Sent: Thursday, January 19, 2017 2:45 PM
> To: Nick Fisk
> Cc: ceph-users
> Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?
>
> Yeah, I think you're probably right. The answer is probably to add an explicit rate-limiting element to the way the snaptrim events are scheduled.
> -Sam
>
> On Thu, Jan 19, 2017 at 1:34 PM, Nick Fisk <n...@fisk.me.uk> wrote:
> > I will give those both a go and report back, but the more I think about this the less I'm convinced that it's going to help.
> >
> > I think the problem is a general IO imbalance: there is probably something like 100+ times more trimming IO than client IO, and so even if client IO gets promoted to the front of the queue by Ceph, once it hits the Linux IO layer it's fighting for itself.
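
For reference, the two settings Nick says he will give a go here are the ones Sam suggests further down the thread (osd_snap_trim_cost = 16777216 and osd_pg_max_concurrent_snap_trims = 1). A minimal sketch of applying them at runtime, assuming a Jewel-era cluster -- osd.0 below is only an example id, and the values are simply the ones discussed in this thread, not a recommendation:

    # push the new values to every OSD without restarting anything
    ceph tell osd.* injectargs '--osd_snap_trim_cost 16777216 --osd_pg_max_concurrent_snap_trims 1'

    # spot-check on one OSD that the change took effect
    ceph daemon osd.0 config get osd_snap_trim_cost
    ceph daemon osd.0 config get osd_pg_max_concurrent_snap_trims

The same options can be made persistent under the [osd] section of ceph.conf so they survive OSD restarts.
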
> > I guess this approach works with scrubbing as each read IO has to wait to be read before the next one is submitted, so the queue can be managed on the OSD. With trimming, writes can buffer up below what the OSD controls.
> >
> > I don't know if the snap trimming goes nuts because the journals are acking each request and the spinning disks can't keep up, or if it's something else. Does WBThrottle get involved with snap trimming?
> >
> > But from an underlying disk perspective, there is definitely more than 2 snaps per OSD at a time going on, even if the OSD itself is not processing more than 2 at a time. I think there either needs to be another knob so that Ceph can throttle back snaps, not just de-prioritise them, or there needs to be a whole new kernel interface where an application can priority-tag individual IOs for CFQ to handle, instead of the current limitation of priority per thread. I realise this is probably very, very hard or impossible, but it would allow Ceph to control IO queues right down to the disk.
> >
> >> -----Original Message-----
> >> From: Samuel Just [mailto:sj...@redhat.com]
> >> Sent: 19 January 2017 18:58
> >> To: Nick Fisk <n...@fisk.me.uk>
> >> Cc: Dan van der Ster <d...@vanderster.com>; ceph-users <ceph-users@lists.ceph.com>
> >> Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?
> >>
> >> Have you also tried setting osd_snap_trim_cost to be 16777216 (16x the default value, equal to a 16MB IO) and osd_pg_max_concurrent_snap_trims to 1 (from 2)?
> >> -Sam
> >>
> >> On Thu, Jan 19, 2017 at 7:57 AM, Nick Fisk <n...@fisk.me.uk> wrote:
> >> > Hi Sam,
> >> >
> >> > Thanks for the confirmation on both which thread the trimming happens in and for confirming my suspicion that sleeping is now a bad idea.
> >> >
> >> > The problem I see is that even with setting the priority for trimming down low, it still seems to completely swamp the cluster. The trims seem to get submitted asynchronously, which seems to leave all my disks sitting at queue depths of 50+ for several minutes until the snapshot is removed, often also causing several OSDs to get marked out and start flapping. I'm using WPQ but haven't changed the cutoff variable yet, as I know you are working on fixing a bug with that.
> >> >
> >> > Nick
> >> >
> >> >> -----Original Message-----
> >> >> From: Samuel Just [mailto:sj...@redhat.com]
> >> >> Sent: 19 January 2017 15:47
> >> >> To: Dan van der Ster <d...@vanderster.com>
> >> >> Cc: Nick Fisk <n...@fisk.me.uk>; ceph-users <ceph-users@lists.ceph.com>
> >> >> Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?
> >> >>
> >> >> Snaptrimming is now in the main op threadpool along with scrub, recovery, and client IO. I don't think it's a good idea to use any of the _sleep configs anymore -- the intention is that by setting the priority low, they won't actually be scheduled much.
> >> >> -Sam
> >> >>
> >> >> On Thu, Jan 19, 2017 at 5:40 AM, Dan van der Ster <d...@vanderster.com> wrote:
> >> >> > On Thu, Jan 19, 2017 at 1:28 PM, Nick Fisk <n...@fisk.me.uk> wrote:
> >> >> >> Hi Dan,
> >> >> >>
> >> >> >> I carried out some more testing after doubling the op threads; it may have had a small benefit as potentially some threads are available, but latency still sits more or less around the configured snap sleep time.
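
As an aside on the queue and threadpool knobs mentioned above (WPQ, the cutoff variable, op threads): the values an OSD is actually running with can be read straight off its admin socket. A sketch using the Jewel-era option names, with osd.0 purely as an example id -- double-check the names against your release:

    # which op queue implementation and cutoff the OSD is using
    ceph daemon osd.0 config get osd_op_queue
    ceph daemon osd.0 config get osd_op_queue_cut_off

    # op threadpool sizing (total op threads = shards x threads per shard)
    ceph daemon osd.0 config get osd_op_num_shards
    ceph daemon osd.0 config get osd_op_num_threads_per_shard
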
> >> >> >> Even more threads might help, but I suspect you are just lowering the chance of IOs getting stuck behind the sleep, rather than actually solving the problem.
> >> >> >>
> >> >> >> I'm guessing that when the snap trimming was in the disk thread you wouldn't have noticed these sleeps, but now that it's in the op thread it will just sit there holding up all IO and be a lot more noticeable. It might be that this option shouldn't be used with Jewel+?
> >> >> >
> >> >> > That's a good thought -- so we need confirmation which thread is doing the snap trimming. I honestly can't figure it out from the code -- hopefully a dev could explain how it works.
> >> >> >
> >> >> > Otherwise, I don't have much practical experience with snap trimming in jewel yet -- our RBD cluster is still running 0.94.9.
> >> >> >
> >> >> > Cheers, Dan
> >> >> >
> >> >> >>> -----Original Message-----
> >> >> >>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Nick Fisk
> >> >> >>> Sent: 13 January 2017 20:38
> >> >> >>> To: 'Dan van der Ster' <d...@vanderster.com>
> >> >> >>> Cc: 'ceph-users' <ceph-users@lists.ceph.com>
> >> >> >>> Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?
> >> >> >>>
> >> >> >>> We're on Jewel and you're right, I'm pretty sure the snap stuff is also now handled in the op thread.
> >> >> >>>
> >> >> >>> The dump historic ops socket command showed a 10s delay at the "Reached PG" stage; from Greg's response [1], it would suggest that it isn't the OSD itself that is blocking, but the PG that is currently sleeping whilst trimming. I think in the former case it would have a high time on the "Started" part of the op? Anyway, I will carry out some more testing with higher osd op threads and see if that makes any difference. Thanks for the suggestion.
> >> >> >>>
> >> >> >>> Nick
> >> >> >>>
> >> >> >>> [1] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-March/008652.html
> >> >> >>>
> >> >> >>> > -----Original Message-----
> >> >> >>> > From: Dan van der Ster [mailto:d...@vanderster.com]
> >> >> >>> > Sent: 13 January 2017 10:28
> >> >> >>> > To: Nick Fisk <n...@fisk.me.uk>
> >> >> >>> > Cc: ceph-users <ceph-users@lists.ceph.com>
> >> >> >>> > Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?
> >> >> >>> >
> >> >> >>> > Hammer or jewel? I've forgotten which thread pool is handling the snap trim nowadays -- is it the op thread yet? If so, perhaps all the op threads are stuck sleeping? Just a wild guess. (Maybe increasing # op threads would help?)
> >> >> >>> >
> >> >> >>> > -- Dan
> >> >> >>> >
> >> >> >>> > On Thu, Jan 12, 2017 at 3:11 PM, Nick Fisk <n...@fisk.me.uk> wrote:
> >> >> >>> > > Hi,
> >> >> >>> > >
> >> >> >>> > > I had been testing some higher values with the osd_snap_trim_sleep variable to try and reduce the impact of removing RBD snapshots on our cluster, and I have come across what I believe to be a possible unintended consequence. The value of the sleep seems to keep the lock on the PG open so that no other IO can use the PG whilst the snap removal operation is sleeping.
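
The "dump historic ops" check Nick refers to above goes through the OSD admin socket. A rough sketch of reproducing it (osd.0 is only an example id, and the exact event names in the output differ slightly between releases, so look for a large gap in the per-op event timeline around the reached_pg and started events):

    # slowest recent ops on this OSD, with a timestamped event history for each
    ceph daemon osd.0 dump_historic_ops

    # the in-flight view is also useful while a large snapshot is being trimmed
    ceph daemon osd.0 dump_ops_in_flight
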
> >> >> >>> > > > >> >> >>> > > I had set the variable to 10s to completely minimise the > >> >> >>> > > impact as I had some multi TB snapshots to remove and noticed > >> >> >>> > > that suddenly all IO to the cluster had a latency of roughly > >> >> >>> > > 10s as well, all the > >> >> >>> > dumped ops show waiting on PG for 10s as well. > >> >> >>> > > > >> >> >>> > > Is the osd_snap_trim_sleep variable only ever meant to be > >> >> >>> > > used up to say a max of 0.1s and this is a known side effect, > >> >> >>> > > or should the lock on the PG be removed so that normal IO can > >> >> >>> > > continue during the > >> >> >>> > sleeps? > >> >> >>> > > > >> >> >>> > > Nick > >> >> >>> > > > >> >> >>> > > _______________________________________________ > >> >> >>> > > ceph-users mailing list > >> >> >>> > > ceph-users@lists.ceph.com > >> >> >>> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > >> >> >>> > >> >> >>> _______________________________________________ > >> >> >>> ceph-users mailing list > >> >> >>> ceph-users@lists.ceph.com > >> >> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > >> >> >> > >> >> > _______________________________________________ > >> >> > ceph-users mailing list > >> >> > ceph-users@lists.ceph.com > >> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > >> > > > > _______________________________________________ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > >
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com