When executing ceph -w I see the following warning:

2013-04-09 22:38:07.288948 osd.2 [WRN] slow request 30.180683 seconds old, received at 2013-04-09 22:37:37.108178: osd_op(client.4107.1:9678 10000000002.000001df [write 0~4194304 [6@0]] 0.4e208174 snapc 1=[]) currently waiting for subops from [0]

So what could be causing this?
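
(In case it helps narrow things down, the in-flight ops on the OSDs named in that line can be dumped through the admin socket; this is only a sketch, and the socket path below assumes a default install.)

# on the node hosting osd.0, the replica the subop is waiting on
ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_ops_in_flight
# cluster-wide detail on current warnings, including slow requests
ceph health detail

Running the same dump against the ceph-osd.2.asok socket should show the primary's side of the same op.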

On Tue, Apr 9, 2013 at 12:54 PM, Ziemowit Pierzycki <ziemo...@pierzycki.com> wrote:
> Neither made a difference. I also have a GlusterFS cluster with two nodes
> in replicating mode residing on 1TB drives:
>
> [root@triton speed]# dd conv=fdatasync if=/dev/zero of=/mnt/speed/test.out bs=512k count=10000
> 10000+0 records in
> 10000+0 records out
> 5242880000 bytes (5.2 GB) copied, 43.573 s, 120 MB/s
>
> ... and Ceph:
>
> [root@triton temp]# dd conv=fdatasync if=/dev/zero of=/mnt/temp/test.out bs=512k count=10000
> 10000+0 records in
> 10000+0 records out
> 5242880000 bytes (5.2 GB) copied, 366.911 s, 14.3 MB/s
>
>
> On Mon, Apr 8, 2013 at 4:29 PM, Mark Nelson <mark.nel...@inktank.com> wrote:
>> On 04/08/2013 04:12 PM, Ziemowit Pierzycki wrote:
>>> There is one SSD in each node. IPoIB performance is about 7 Gbps
>>> between each host. CephFS is mounted via the kernel client. The Ceph version
>>> is ceph-0.56.3-1. I have a 1GB journal on the same drive as the OSD but
>>> on a separate file system split via LVM.
>>>
>>> Here is the output of another test with fdatasync:
>>>
>>> [root@triton temp]# dd conv=fdatasync if=/dev/zero of=/mnt/temp/test.out bs=512k count=10000
>>> 10000+0 records in
>>> 10000+0 records out
>>> 5242880000 bytes (5.2 GB) copied, 359.307 s, 14.6 MB/s
>>> [root@triton temp]# dd if=/mnt/temp/test.out of=/dev/null bs=512k count=10000
>>> 10000+0 records in
>>> 10000+0 records out
>>> 5242880000 bytes (5.2 GB) copied, 14.0521 s, 373 MB/s
>>
>> Definitely seems off! How many SSDs are involved and how fast is each
>> one? The MTU idea might have merit, but I honestly don't know enough
>> about how well IPoIB handles giant MTUs like that. One thing I have
>> noticed on other IPoIB setups is that TCP autotuning can cause a ton of
>> problems. You may want to try disabling it on all of the hosts involved:
>>
>> echo 0 | tee /proc/sys/net/ipv4/tcp_moderate_rcvbuf
>>
>> If that doesn't work, maybe try setting the MTU to 9000 or 1500 if possible.
>>
>> Mark
>>
>>> The network traffic appears to match the transfer speeds shown here too.
>>> Writing is very slow.
>>>
>>> On Mon, Apr 8, 2013 at 3:04 PM, Mark Nelson <mark.nel...@inktank.com> wrote:
>>>
>>> Hi,
>>>
>>> How many drives? Have you tested your IPoIB performance with iperf?
>>> Is this CephFS with the kernel client? What version of Ceph? How
>>> are your journals configured? Etc. It's tough to make any
>>> recommendations without knowing more about what you are doing.
>>>
>>> Also, please use conv=fdatasync when doing buffered IO writes with dd.
>>>
>>> Thanks,
>>> Mark
>>>
>>> On 04/08/2013 03:00 PM, Ziemowit Pierzycki wrote:
>>>
>>> Hi,
>>>
>>> The first test was writing a 500 MB file and was clocked at 1.2 GB/s. The
>>> second test was writing a 5000 MB file at 17 MB/s. The third test was
>>> reading the file at ~400 MB/s.
>>>
>>> On Mon, Apr 8, 2013 at 2:56 PM, Gregory Farnum <g...@inktank.com> wrote:
>>>
>>> More details, please. You ran the same test twice and performance went
>>> up from 17.5 MB/s to 394 MB/s? How many drives in each node, and of what
>>> kind?
>>> -Greg
>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>
>>> On Mon, Apr 8, 2013 at 12:38 PM, Ziemowit Pierzycki <ziemo...@pierzycki.com> wrote:
>>> > Hi,
>>> >
>>> > I have a 3-node SSD-backed cluster connected over InfiniBand (16K MTU) and
>>> > here is the performance I am seeing:
>>> >
>>> > [root@triton temp]# !dd
>>> > dd if=/dev/zero of=/mnt/temp/test.out bs=512k count=1000
>>> > 1000+0 records in
>>> > 1000+0 records out
>>> > 524288000 bytes (524 MB) copied, 0.436249 s, 1.2 GB/s
>>> > [root@triton temp]# dd if=/dev/zero of=/mnt/temp/test.out bs=512k count=10000
>>> > 10000+0 records in
>>> > 10000+0 records out
>>> > 5242880000 bytes (5.2 GB) copied, 299.077 s, 17.5 MB/s
>>> > [root@triton temp]# dd if=/mnt/temp/test.out of=/dev/null bs=512k count=10000
>>> > 10000+0 records in
>>> > 10000+0 records out
>>> > 5242880000 bytes (5.2 GB) copied, 13.3015 s, 394 MB/s
>>> >
>>> > Does that look right? How do I check that this is not a network problem,
>>> > because I remember seeing a kernel issue related to large MTUs.
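
(For reference, a rough way to separate the IPoIB network from the SSDs, as discussed above; the IPoIB address and the OSD file system path below are placeholders, not taken from the thread.)

# raw IPoIB throughput: run the server on one OSD node...
iperf -s
# ...and the client on another node, against the first node's IPoIB address (placeholder)
iperf -c 192.168.100.1 -t 30

# raw write speed of the SSD file system backing the OSD, measured locally;
# the path is a placeholder for wherever the OSD data actually lives
dd conv=fdatasync if=/dev/zero of=/path/to/osd-ssd/ddtest bs=512k count=2000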

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com