Without knowing the cluster architecture it's hard to know exactly what may
be happening.

What does the cluster hardware look like? Where are the journals? How busy
are the disks (% time busy)? What is the pool size? Are these replicated or
EC pools?
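
For the disk business, something like iostat on each OSD node will give you
the "% time busy" figure per device (just a rough sketch, which devices to
watch is up to you):

# extended device statistics every 5 seconds; the %util column is the
# "% time busy" figure for each disk
iostat -x 5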

Have you tried tuning the deep-scrub settings? Have you tried stopping the
deep-scrubs altogether? Are the journals on SSDs? My first impression is that
the cluster may be hitting its limits (you also have at least one OSD getting
full)...

On Mon, Nov 14, 2016 at 3:16 PM, Thomas Danan <thomas.da...@mycom-osi.com>
wrote:

> Hi All,
>
>
>
> We have a cluster in production which is suffering from intermittent blocked
> requests (25 requests are blocked > 32 sec). The blocked request occurrences
> are frequent and affect all OSDs.
>
> From the OSD daemon logs, I can see related messages:
>
>
>
> 16-11-11 18:25:29.917518 7fd28b989700 0 log_channel(cluster) log [WRN] :
> slow request 30.429723 seconds old, received at 2016-11-11 18:24:59.487570:
> osd_op(client.2406272.1:336025615 rbd_data.66e952ae8944a.0000000000350167
> [set-alloc-hint object_size 4194304 write_size 4194304,write 0~524288]
> 0.8d3c9da5 snapc 248=[248,216] ondisk+write e201514) currently waiting for
> subops from 210,499,821
>
>
>
> So I guess the issue is related to the replication process when writing new
> data to the cluster. Again, it is never the same secondary OSDs that are
> reported in the OSD daemon logs.
>
> As a result we are experiencing very high IO write latency on the Ceph
> client side (it can be up to 1 hour!!!).
>
> We have checked network health as well as disk health but we were not able
> to find any issue.
>
>
>
> I wanted to know if this issue has already been observed, or if you have any
> ideas on how to investigate or work around it.
>
> Many thanks...
>
>
>
> Thomas
>
>
>
> The cluster is composed of 37 DNs, 851 OSDs and 5 MONs.
>
> The Ceph clients are accessing the cluster with RBD.
>
> The cluster is running Hammer (0.94.5).
>
>
>
> cluster 1a26e029-3734-4b0e-b86e-ca2778d0c990
>
> health HEALTH_WARN
>
> 25 requests are blocked > 32 sec
>
> 1 near full osd(s)
>
> noout flag(s) set
>
> monmap e3: 5 mons at {NVMBD1CGK190D00=10.137.81.13:6789/0,nvmbd1cgy050d00=10.137.78.226:6789/0,nvmbd1cgy070d00=10.137.78.232:6789/0,nvmbd1cgy090d00=10.137.78.228:6789/0,nvmbd1cgy130d00=10.137.78.218:6789/0}
>
> election epoch 664, quorum 0,1,2,3,4 nvmbd1cgy130d00,nvmbd1cgy050d00,nvmbd1cgy090d00,nvmbd1cgy070d00,NVMBD1CGK190D00
>
> osdmap e205632: 851 osds: 850 up, 850 in
>
> flags noout
>
> pgmap v25919096: 10240 pgs, 1 pools, 197 TB data, 50664 kobjects
>
> 597 TB used, 233 TB / 831 TB avail
>
> 10208 active+clean
>
> 32 active+clean+scrubbing+deep
>
> client io 97822 kB/s rd, 205 MB/s wr, 2402 op/s
>
>
>
>
>
>
>