Without knowing the cluster architecture it's hard to know exactly what may be happening.

What does the cluster hardware look like? Where are the journals, and are they on SSDs? How busy are the disks (% time busy)? What is the pool size? Are these replicated or EC pools? Have you tried tuning the deep-scrub processes? Have you tried stopping them altogether?

My first feeling is that the cluster may be hitting its limits (you also have at least one OSD getting full)...
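If you haven't looked at these yet, something along the following lines is usually a reasonable starting point. This is just a sketch: osd.210 is only one of the OSDs named in your slow-request log, the "ceph daemon" commands have to be run on the node actually hosting that OSD, and the injectargs values are examples rather than recommendations.

    # which OSD is near full, and how balanced the others are
    ceph health detail
    ceph osd df

    # disk utilisation on an OSD node (look at %util and await)
    iostat -x 5

    # what a slow OSD is actually waiting on (run on the host carrying osd.210)
    ceph daemon osd.210 dump_ops_in_flight
    ceph daemon osd.210 dump_historic_ops

    # temporarily stop new (deep-)scrubs to see if the blocked requests go away;
    # scrubs already running will still finish
    ceph osd set noscrub
    ceph osd set nodeep-scrub
    # ...and re-enable them afterwards
    ceph osd unset noscrub
    ceph osd unset nodeep-scrub

    # or throttle scrubbing instead of disabling it
    ceph tell osd.* injectargs '--osd_max_scrubs 1 --osd_scrub_sleep 0.1'

If the blocked requests disappear while scrubbing is disabled, that points at the disks rather than the network.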
On Mon, Nov 14, 2016 at 3:16 PM, Thomas Danan <thomas.da...@mycom-osi.com> wrote:
> Hi All,
>
> We have a cluster in production which is suffering from intermittent blocked
> requests (25 requests are blocked > 32 sec). The blocked request occurrences
> are frequent and global to all OSDs.
>
> From the OSD daemon logs, I can see related messages:
>
> 2016-11-11 18:25:29.917518 7fd28b989700 0 log_channel(cluster) log [WRN] :
> slow request 30.429723 seconds old, received at 2016-11-11 18:24:59.487570:
> osd_op(client.2406272.1:336025615 rbd_data.66e952ae8944a.0000000000350167
> [set-alloc-hint object_size 4194304 write_size 4194304,write 0~524288]
> 0.8d3c9da5 snapc 248=[248,216] ondisk+write e201514) currently waiting for
> subops from 210,499,821
>
> So I guess the issue is related to the replication process when writing new
> data on the cluster. Again, it is never the same secondary OSDs that are
> displayed in the OSD daemon logs.
>
> As a result we are experiencing very high IO write latency on the Ceph
> client side (it can be up to 1 hour!!!).
>
> We have checked network health as well as disk health but we were not able
> to find any issue.
>
> I wanted to know if this issue has already been observed, or if you have
> ideas to investigate / work around it.
>
> Many thanks...
>
> Thomas
>
> The cluster is composed of 37 DNs, 851 OSDs and 5 MONs.
> The Ceph clients are accessing the cluster with RBD.
> The cluster is running Hammer 0.94.5.
>
>     cluster 1a26e029-3734-4b0e-b86e-ca2778d0c990
>      health HEALTH_WARN
>             25 requests are blocked > 32 sec
>             1 near full osd(s)
>             noout flag(s) set
>      monmap e3: 5 mons at {NVMBD1CGK190D00=10.137.81.13:6789/0,nvmbd1cgy050d00=10.137.78.226:6789/0,nvmbd1cgy070d00=10.137.78.232:6789/0,nvmbd1cgy090d00=10.137.78.228:6789/0,nvmbd1cgy130d00=10.137.78.218:6789/0}
>             election epoch 664, quorum 0,1,2,3,4 nvmbd1cgy130d00,nvmbd1cgy050d00,nvmbd1cgy090d00,nvmbd1cgy070d00,NVMBD1CGK190D00
>      osdmap e205632: 851 osds: 850 up, 850 in
>             flags noout
>       pgmap v25919096: 10240 pgs, 1 pools, 197 TB data, 50664 kobjects
>             597 TB used, 233 TB / 831 TB avail
>                10208 active+clean
>                   32 active+clean+scrubbing+deep
>   client io 97822 kB/s rd, 205 MB/s wr, 2402 op/s
>
> Thank you
>
> Thomas Danan
> Director of Product Development
>
> Office +33 1 49 03 77 53
> Mobile +33 7 76 35 76 43
> Skype thomas.danan
> www.mycom-osi.com