Thanks, I tried that earlier, but so far I am still getting slow requests.
However, I also found that writeback was not enabled on my hardware
controller. After enabling it and setting the max bytes, things are a bit
more stable, with fewer slow requests popping up. The fact that enabling
writeback seemed to help makes me think the disks are causing the issues
after all. I will see if I can diagnose this a bit further.
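
In case it helps anyone else: this is roughly the line I ended up with in
the [osd] section of ceph.conf (the value is just the one I tried with
injectargs earlier, not a recommendation), plus the restart Nick mentioned
is needed for it to take effect (assuming systemd-managed OSDs; on older
init scripts it would be something like "service ceph restart osd.N"):

[osd]
osd_tier_promote_max_bytes_sec = 5242880

# then, on each OSD host:
systemctl restart ceph-osd.target
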
Plotting some commit_latency_ms graphs and filtering for the slowest OSDs
does consistently return the same disks. I might try removing those OSDs
from the cluster to see what happens.
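
For reference, a rough sketch of the check and of what removing an OSD
would look like (osd.12 is just a placeholder id; the latency columns come
from "ceph osd perf" on Jewel):

# per-OSD commit/apply latency snapshot, slowest OSDs end up last
ceph osd perf | sort -n -k2 | tail -5

# mark a suspect OSD out and let the cluster rebalance off it
ceph osd out 12

# once backfill finishes, remove it completely
systemctl stop ceph-osd@12
ceph osd crush remove osd.12
ceph auth del osd.12
ceph osd rm 12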

Thanks for all your help so far!

On Wed, May 11, 2016 at 9:07 AM, Nick Fisk <n...@fisk.me.uk> wrote:

> Hi Peter, yes just restart the OSD for the setting to take effect.
>
>
>
> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
> Of *Peter Kerdisle
> *Sent:* 10 May 2016 19:06
> *To:* Nick Fisk <n...@fisk.me.uk>
>
> *Cc:* ceph-users@lists.ceph.com
> *Subject:* Re: [ceph-users] Erasure pool performance expectations
>
>
>
> Thanks Nick. I added it to my ceph.conf. I'm guessing this is an OSD
> setting and therefore I should restart my OSDs, is that correct?
>
>
>
> On Tue, May 10, 2016 at 3:48 PM, Nick Fisk <n...@fisk.me.uk> wrote:
>
>
>
> > -----Original Message-----
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> > Peter Kerdisle
> > Sent: 10 May 2016 14:37
> > Cc: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] Erasure pool performance expectations
> >
> > To answer my own question, it seems that you can change settings on the
> > fly using
> >
> > ceph tell osd.* injectargs '--osd_tier_promote_max_bytes_sec 5242880'
> > osd.0: osd_tier_promote_max_bytes_sec = '5242880' (unchangeable)
> >
> > However, the response seems to imply I can't change this setting. Is
> > there another way to change these settings?
>
> Sorry Peter, I missed your last email. You can also specify that setting
> in ceph.conf; for example, I have in mine:
>
> osd_tier_promote_max_bytes_sec = 4000000
>
>
>
>
> >
> >
> > On Sun, May 8, 2016 at 2:37 PM, Peter Kerdisle <peter.kerdi...@gmail.com>
> > wrote:
> > Hey guys,
> >
> > I noticed the merge request that fixes the switch around here
> > https://github.com/ceph/ceph/pull/8912
> >
> > I had two questions:
> >
> > • Does this affect my performance in any way? Could it explain the slow
> > requests I keep having?
> > • Can I modify these settings manually myself on my cluster?
> > Thanks,
> >
> > Peter
> >
> >
> > On Fri, May 6, 2016 at 9:58 AM, Peter Kerdisle <peter.kerdi...@gmail.com>
> > wrote:
> > Hey Mark,
> >
> > Sorry I missed your message as I'm only subscribed to daily digests.
> >
> > Date: Tue, 3 May 2016 09:05:02 -0500
> > From: Mark Nelson <mnel...@redhat.com>
> > To: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] Erasure pool performance expectations
> > In addition to what Nick said, it's really valuable to watch your cache
> > tier write behavior during heavy IO.  One thing I noticed is you said
> > you have 2 SSDs for journals and 7 SSDs for data.
> >
> > I thought the hardware recommendation was 1 journal disk per 3 or 4 data
> > disks, but I think I might have misunderstood it. Looking at my journal
> > read/writes they seem to be OK though:
> > https://www.dropbox.com/s/er7bei4idd56g4d/Screenshot%202016-05-06%2009.55.30.png?dl=0
> >
> > However, I started running into a lot of slow requests (I made a separate
> > thread for those: "Diagnosing slow requests") and now I'm hoping these
> > could be related to my journaling setup.
> >
> > If they are all of
> > the same type, you're likely bottlenecked by the journal SSDs for
> > writes, which compounded with the heavy promotions is going to really
> > hold you back.
> > What you really want:
> > 1) (assuming filestore) equal large write throughput between the
> > journals and data disks.
> > How would one achieve that?
> >
> > 2) promotions to be limited by some reasonable fraction of the cache
> > tier and/or network throughput (say 70%).  This is why the
> > user-configurable promotion throttles were added in jewel.
> > Are these already in the docs somewhere?
> >
> > 3) The cache tier to fill up quickly when empty but change slowly once
> > it's full (ie limiting promotions and evictions).  No real way to do
> > this yet.
> > Mark
> >
> > Thanks for your thoughts.
> >
> > Peter
> >
> >
>
>
>
>