Hi Greg,

That's great, thanks for your response; I completely understand what is going on now. I wasn't thinking about capacity in a per-PG sense.

I have exported a pg dump of the cache pool and calculated some percentages, and I can see that the data varies by up to around 5% among the PGs, so this probably ties in with the isolated read bursts on single OSDs.
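In case it is useful to anyone else, this is roughly the calculation, sketched in Python. It assumes the pg_stats/stat_sum layout that "ceph pg dump -f json" produces (the key names vary a little between releases) and a made-up pool id:

import json
import subprocess

POOL_ID = "6"  # hypothetical cache pool id; substitute your own

# Grab the PG stats as JSON rather than scraping the plain-text dump.
dump = json.loads(subprocess.check_output(["ceph", "pg", "dump", "-f", "json"]))

# Keep only the PGs belonging to the cache pool (a pgid looks like "6.2f").
pgs = [p for p in dump["pg_stats"] if p["pgid"].split(".")[0] == POOL_ID]
total = sum(p["stat_sum"]["num_bytes"] for p in pgs) or 1

# Print each PG's share of the pool's data, biggest first.
for p in sorted(pgs, key=lambda p: p["stat_sum"]["num_bytes"], reverse=True):
    print("%-8s %6.2f%%" % (p["pgid"], 100.0 * p["stat_sum"]["num_bytes"] / total))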
I've knocked the cache_target_full_ratio down by 10% and will see if that helps; with a ~5% imbalance, the busiest PG must be hitting its per-PG share well before the pool-wide figures suggest it should. The back-of-envelope numbers below show why.
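Roughly, with made-up figures (a 100TB pool-wide target across 1024 PGs and an assumed 0.8 full ratio):

TiB = 1024 ** 4
target_max_bytes = 100 * TiB   # hypothetical pool-wide hard limit
pg_num = 1024
full_ratio = 0.8               # assumed cache_target_full_ratio
skew = 1.05                    # busiest PG holds ~5% more than average

# The OSDs enforce the limit per PG, not pool-wide.
per_pg_cap = target_max_bytes / pg_num
print("per-PG hard cap: %.1f GiB" % (per_pg_cap / 1024.0 ** 3))

# A PG 5% above average reaches the full ratio while the pool as a
# whole is still below it, so flushing kicks in "early" on that OSD.
print("busiest PG is at %.0f%% when the pool average is only %.0f%%"
      % (full_ratio * 100, full_ratio / skew * 100))

For those numbers that is a hard cap of about 100GiB per PG, with the busiest PG hitting the 80% ratio while the pool as a whole is only around 76% full.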
FYI, regarding my second point about having high and low ratios for cache eviction/flushing: I have been speaking to Li Wang and he is potentially interested in developing a prototype. The behaviour I have in mind is sketched below.
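As a rough sketch (the option names are entirely made up; nothing like this exists in Ceph today):

# Hypothetical two-watermark flush policy: idle below the low ratio,
# flush gently between the watermarks, flush aggressively above the
# high one.
target_dirty_ratio_low = 0.3
target_dirty_ratio_high = 0.4

def flush_priority(dirty_bytes, target_max_bytes):
    dirty_ratio = float(dirty_bytes) / target_max_bytes
    if dirty_ratio < target_dirty_ratio_low:
        return "none"        # leave bursts of writes alone
    if dirty_ratio < target_dirty_ratio_high:
        return "background"  # low-priority flush, minimal client impact
    return "aggressive"      # catch up before the hard limit blocks IO

This would let bursts of write activity complete at cache speed before flushing starts to impact performance.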
Thanks again,
Nick

> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Gregory Farnum
> Sent: 27 May 2015 22:02
> To: Nick Fisk
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Cache Pool Flush/Eviction Limits - Hard or Soft?
>
> The max target limit is a hard limit: the OSDs won't let more than that
> amount of data in the cache tier. They will start flushing and evicting
> based on the percentage ratios you can set (I don't remember the exact
> parameter names), and you may need to set these more aggressively for
> your given workload.
>
> The tricky bit is that the OSDs don't have global knowledge of how much
> total data is in the cache; when you set a 100TB cache that has 1024 PGs,
> the OSDs actually apply those limits on a per-PG basis, not letting any
> given PG use more than 100/1024 TB. This is probably the heavy read
> activity you're seeing on one OSD at a time, when it happens to reach
> the hard limit. :/
>
> The specific blocked ops you're seeing are in various stages and are
> probably just indicative of the OSD doing a bunch of flushing, which
> blocks other accesses.
> -Greg
>
> On Tue, May 19, 2015 at 12:03 PM, Nick Fisk <n...@fisk.me.uk> wrote:
> > I've been doing some more digging. I'm getting messages like these in
> > the OSD logs; I don't know if they are normal or a clue to something
> > not being right:
> >
> > 2015-05-19 18:36:27.664698 7f58b91dd700 0 log_channel(cluster) log [WRN] :
> > slow request 30.346117 seconds old, received at 2015-05-19 18:35:57.318208:
> > osd_repop(client.1205463.0:7612211 6.2f
> > ec3d412f/rb.0.6e7a9.74b0dc51.0000000be050/head//6 v 2674'1102892)
> > currently commit_sent
> >
> > 2015-05-19 17:50:29.700766 7ff1503db700 0 log_channel(cluster) log [WRN] :
> > slow request 32.548750 seconds old, received at 2015-05-19 17:49:57.151935:
> > osd_repop_reply(osd.46.0:2088048 6.64 ondisk, result = 0) currently no
> > flag points reached
> >
> > 2015-05-19 17:47:26.903122 7f296b6fc700 0 log_channel(cluster) log [WRN] :
> > slow request 30.620519 seconds old, received at 2015-05-19 17:46:56.282504:
> > osd_op(client.1205463.0:7261972 rb.0.6e7a9.74b0dc51.0000000b7ff9
> > [set-alloc-hint object_size 1048576 write_size 1048576,write
> > 258048~131072] 6.882797bc ack+ondisk+write+known_if_redirected e2674)
> > currently commit_sent
> >
> >> -----Original Message-----
> >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> >> Of Nick Fisk
> >> Sent: 18 May 2015 17:25
> >> To: ceph-users@lists.ceph.com
> >> Subject: Re: [ceph-users] Cache Pool Flush/Eviction Limits - Hard or Soft?
> >>
> >> Just to update on this: I've been watching iostat across my Ceph nodes
> >> and I can see something slightly puzzling happening, which is most
> >> likely the cause of the slow (>32s) requests I am getting.
> >>
> >> During a client write-only IO stream, I see reads and writes to the
> >> cache tier, which is normal as blocks are being promoted/demoted.
> >> Latency does suffer, but not excessively, and is acceptable for data
> >> that has fallen out of cache.
> >>
> >> However, every now and again one of the OSDs suddenly starts reading
> >> aggressively and appears to block any IO until that read has finished.
> >> Example below, where /dev/sdd is a 10K disk in the cache tier. All
> >> other nodes have their /dev/sdd devices completely idle during this
> >> period. The disks on the base tier seem to be doing writes during this
> >> period, so it looks related to some sort of flushing.
> >>
> >> Device rrqm/s wrqm/s    r/s  w/s   rkB/s wkB/s rq-sz qu-sz await r_wait w_wait svctm  util
> >> sdd      0.00   0.00 471.50 0.00 2680.00  0.00 11.37  0.96  2.03   2.03   0.00  1.90 89.80
> >>
> >> Most of the times I observed this while watching iostat, the read only
> >> lasted around 5-10s, but I suspect that it sometimes goes on for
> >> longer and is the cause of the "requests are blocked" errors. I have
> >> also noticed that this appears to happen more often when there is a
> >> greater number of blocks to be promoted/demoted. Other pools are not
> >> affected during these hangs.
> >>
> >> From the look of the iostat stats, a 10k disk must be doing a
> >> sequential read to achieve that number of IOs.
> >>
> >> Does anybody have any clue what might be going on?
> >>
> >> Nick
> >>
> >> > -----Original Message-----
> >> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On
> >> > Behalf Of Nick Fisk
> >> > Sent: 30 April 2015 12:53
> >> > To: ceph-users@lists.ceph.com
> >> > Subject: [ceph-users] Cache Pool Flush/Eviction Limits - Hard or Soft?
> >> >
> >> > Does anyone know if the flush and eviction limits are hard limits,
> >> > i.e. writes block as soon as they are exceeded, or does the pool
> >> > only block when it reaches target_max_bytes?
> >> >
> >> > I'm seeing really poor performance and frequent "requests are
> >> > blocked" messages once data starts having to be evicted/flushed,
> >> > and I was just wondering if the above is true.
> >> >
> >> > If the limits are soft, I would imagine that having high and low
> >> > target limits would help:
> >> >
> >> > Target_dirty_bytes_low = 0.3
> >> > Target_dirty_bytes_high = 0.4
> >> >
> >> > Once the amount of dirty bytes passes the low limit, a very low
> >> > priority flush occurs; if the high limit is reached, data is
> >> > flushed much more aggressively. The same could also exist for
> >> > eviction. This would allow bursts of write activity to occur
> >> > before flushing starts heavily impacting performance.
> >> >
> >> > Nick

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com