On Mon, May 16, 2016 at 11:58 AM, Nick Fisk <n...@fisk.me.uk> wrote:

> > -----Original Message-----
> > From: Peter Kerdisle [mailto:peter.kerdi...@gmail.com]
> > Sent: 16 May 2016 10:39
> > To: n...@fisk.me.uk
> > Cc: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] Erasure pool performance expectations
> >
> > I'm forcing a flush by setting the cache_target_dirty_ratio to a lower
> > value. This forces writes to the EC pool; these are the operations I'm
> > trying to throttle a bit. Am I understanding you correctly that this
> > throttling only works the other way around, i.e. promoting cold objects
> > into the hot cache?
>
> Yes, that's correct. You want to throttle the flushes, which is done by a
> couple of other settings.
>
> First, set something like this in your ceph.conf:
> osd_agent_max_low_ops = 1
> osd_agent_max_ops = 4
>
I did not know about this, that's great, I will play around with these.
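
For anyone else reading along: if I understand correctly, these can also be
injected at runtime without restarting the OSDs, along the lines of

ceph tell osd.* injectargs '--osd_agent_max_low_ops 1 --osd_agent_max_ops 4'

and then persisted under [osd] in ceph.conf so they survive a restart.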


>
> These control how many parallel operations the tiering agent will use. You
> can bump them up later if needed.
>
> Next, set these two settings on your cache pool, and try to keep them about
> 0.2 apart; something like 0.4 and 0.6 is a good starting point:
> cache_target_dirty_ratio
> cache_target_dirty_high_ratio
>
Here is actually the heart of the matter. Ideally I would love to run it at
0.0, if that makes sense: I want no dirty objects in my hot cache at all.
Has anybody ever tried this? Right now I'm just pushing
cache_target_dirty_ratio down during low-activity moments by setting it to
0.2, then bringing it back up to 0.6 when flushing is done or activity
starts up again.
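
In case it's useful to anyone, the commands I run to do that look roughly
like this ("cache-pool" standing in for my actual cache pool name):

ceph osd pool set cache-pool cache_target_dirty_ratio 0.2
# ...wait for the flush to finish, then raise it again:
ceph osd pool set cache-pool cache_target_dirty_ratio 0.6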


>
> And let me know if that helps.
>
>
>
> >
> > The measurement is a problem for me at the moment. I'm trying to get the
> > perf dumps into collectd/graphite, but it seems I need to hand-roll a
> > solution since the plugins I found are no longer working. What I'm doing
> > now is just summing the bandwidth statistics from my nodes to get an
> > approximate number. I hope to make some time this week to write a
> > collectd plugin to fetch the actual stats from the perf dumps.
>
> I've used Diamond to collect the stats and it worked really well. I can
> share my graphite query to sum the promote/flush rates as well if it helps?
>
I will check out Diamond. Are you using this collector specifically?
https://github.com/BrightcoveOS/Diamond/wiki/collectors-CephCollector

It would be great if you could share your graphite queries :)
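
In the meantime, here is a rough sketch of what I have in mind; the metric
path below is a guess and depends entirely on how the collector is
configured, but the tier_promote counter itself comes from the OSD perf dump:

# raw counters straight off an OSD admin socket:
ceph daemon osd.0 perf dump
# then, in graphite, summing the per-OSD promote rate might look like:
sumSeries(perSecond(servers.*.ceph.osd.*.tier_promote))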

>
> >
> > I confirmed the settings are indeed correctly picked up across the nodes
> > in the cluster.
>
> Good, glad we got that sorted
>
> >
> > I tried switching my pool to readforward, since for my needs the EC pool
> > is fast enough for reads, but I got scared by the warning about data
> > corruption. How safe is readforward really at this point? I noticed the
> > option was removed from the latest docs while still living on in the
> > Google-cached version:
> > http://webcache.googleusercontent.com/search?q=cache:http://docs.ceph.com/docs/master/rados/operations/cache-tiering/
>
> Not too sure about the safety, but I'm of the view that those extra modes
> probably aren't needed; I'm pretty sure the same effect can be achieved
> via the recency settings (someone correct me please). The higher the
> recency settings, the less likely an object is to be promoted into the
> cache tier. If you set the min recency for reads higher than the
> hit_set_count (recency can never exceed the number of hit sets), then in
> theory no reads will ever cause an object to be promoted.
>
You are right; your earlier help led me to do exactly that, and things have
been working better since.
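
For the archives, the pool settings I ended up with look something like this
("cache-pool" again standing in for my pool name; the values are
illustrative, the point being that min_read_recency_for_promote exceeds
hit_set_count):

ceph osd pool set cache-pool hit_set_count 4
ceph osd pool set cache-pool min_read_recency_for_promote 5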

Thanks!