On 07/23/2014 03:54 AM, Andrei Mikhailovsky wrote:
Riccardo,
I thought I'd share my testing results.
I've been using IPoIB with ceph for quite some time now. I've got QDR
osd/mon/client servers serving rbd images to KVM hypervisors. I've done
some performance testing using both rados and guest vm benchmarks while
running the last three stable versions of ceph.
My conclusion was that ceph itself needs to mature and/or be optimised
in order to utilise the capabilities of the infiniband link. In my
experience, I was not able to reach the limits of the network speeds
reported to me by the network performance monitoring tools. I was
struggling to push data throughput beyond 1.5GB/s with anywhere between
2 and 64 concurrent tests. This was the case even when the benchmark
reused the same data over and over, so it was cached on the osd servers
and served straight from their RAM without ever touching the osds
themselves.
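For what it's worth, the concurrency sweep was driven by rados bench; a
rough sketch of that kind of run is below (the pool name, run length and
object size are placeholders rather than my exact invocation):

    # Rough sketch of the kind of concurrency sweep I ran with rados bench.
    # The pool name, run length and object size below are placeholders.
    import subprocess

    POOL = "bench"          # hypothetical test pool
    SECONDS = "60"          # length of each run
    OBJ_BYTES = "4194304"   # 4MB objects (the rados bench default)

    for threads in (2, 4, 8, 16, 32, 64):
        print("=== %d concurrent ops ===" % threads)
        subprocess.check_call(["rados", "bench", "-p", POOL, SECONDS, "write",
                               "-t", str(threads), "-b", OBJ_BYTES,
                               "--no-cleanup"])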
My IPoIB network performance tests showed 2.5-3GB/s on average, with
peaks reaching 3.3GB/s. It would be nice to see how ceph performs over
RDMA ))).
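The raw IPoIB numbers came from the usual network benchmarking tools; a
sketch of how such a measurement can be reproduced with iperf is below
(the peer address and stream counts are placeholders):

    # Sketch of a raw IPoIB throughput check with iperf; start "iperf -s"
    # on the other node first. Peer address and stream counts are
    # placeholders.
    import subprocess

    IPOIB_PEER = "192.168.100.2"   # hypothetical IPoIB address of the peer

    for streams in (1, 2, 4, 8):
        print("--- %d parallel streams ---" % streams)
        subprocess.check_call(["iperf", "-c", IPOIB_PEER,
                               "-P", str(streams), "-t", "30"])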
Having said this, perhaps my test gear is somewhat limited or my ceph
optimisation was not done correctly. I had 2 osd servers with 8 osds
each and three clients running guest vms and rados benchmarks. None of
the benchmarks were able to fully utilise the server resources; my osd
servers were running at about 50% utilisation during the tests.
So, I had to conclude that unless you are running a large cluster, or
have specific data sets that can take advantage of many concurrent
threads, you will probably not need an infiniband link. Single-thread
performance on cold data will be limited to about half the speed of a
single osd device. So, if your osds do 150MB/s, do not expect a single
thread to go faster than 70-80MB/s.
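As a quick back-of-the-envelope check of that rule of thumb (the numbers
are just the ones from above):

    # Back-of-the-envelope version of the rule of thumb above.
    osd_device_mbs = 150.0                  # raw speed of one osd disk, MB/s
    single_thread_mbs = osd_device_mbs / 2  # cold data, single client thread
    print("expect roughly %.0f MB/s per thread at best" % single_thread_mbs)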
On the other hand, if you use high performance gear, like cache cards
capable of multiple gigabytes per second, an infiniband link might be of
use. I am not sure the ceph-osd process is capable of "spitting" out
that amount of data though; you might hit a CPU bottleneck instead.
FWIW, when we were testing Ceph with QDR IB at ORNL, we topped out at
around 2GB/s per server node with IPoIB. That was a rather
unconventional setup though, with a DDN SFA10K and RAID5 LUNs with lots
of disks per OSD. On my (more conventional) high performance test box, I
can hit 2GB/s with 24 disks, 8 SSDs, and 4 SAS2308 controllers, at least
when streaming 4MB objects in and out of rados. I suspect that for most
people 10GbE will be fast enough for many workloads (though QDR IB might
be cheaper if you know how to implement it!).
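By "streaming 4MB objects in and out of rados" I just mean rados bench
style write and sequential read passes; a sketch of that kind of
two-phase run (pool name, run length and thread count are placeholders):

    # Sketch of a write-then-sequential-read pass with the default 4MB
    # objects. Pool name, run length and thread count are placeholders.
    import subprocess

    POOL, SECS, THREADS = "bench", "120", "16"

    # write phase; --no-cleanup keeps the objects for the read pass
    subprocess.check_call(["rados", "bench", "-p", POOL, SECS, "write",
                           "-t", THREADS, "--no-cleanup"])
    # sequential read of the objects written above
    subprocess.check_call(["rados", "bench", "-p", POOL, SECS, "seq",
                           "-t", THREADS])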
Andrei
------------------------------------------------------------------------
From: "Sage Weil" <sw...@redhat.com>
To: "Riccardo Murri" <riccardo.mu...@uzh.ch>
Cc: ceph-users@lists.ceph.com
Sent: Tuesday, 22 July, 2014 9:42:56 PM
Subject: Re: [ceph-users] Ceph and Infiniband
On Tue, 22 Jul 2014, Riccardo Murri wrote:
> Hello,
>
> a few questions on Ceph's current support for Infiniband
>
> (A) Can Ceph use Infiniband's native protocol stack, or must it use
> IP-over-IB? Google finds a couple of entries in the Ceph wiki related
> to native IB support (see [1], [2]), but none of them seems finished
> and there is no timeline.
>
> [1]: https://wiki.ceph.com/Planning/Blueprints/Emperor/msgr%3A_implement_infiniband_support_via_rsockets
> [2]: http://wiki.ceph.com/Planning/Blueprints/Giant/Accelio_RDMA_Messenger
This is work in progress. We hope to get basic support into the tree
in the next couple of months.
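In the meantime Ceph is happy to run over IPoIB like any other IP
network; you just point the public (and optionally cluster) network at
the IPoIB subnet in ceph.conf. A minimal sketch, with a made-up subnet
standing in for whatever your ib0 interfaces carry:

    [global]
    # hypothetical IPoIB subnet; substitute your own
    public network  = 10.10.0.0/24
    cluster network = 10.10.0.0/24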
> (B) Can we connect to the same Ceph cluster from Infiniband *and*
> Ethernet? Some clients do only have Ethernet and will not be
> upgraded, some others would have QDR Infiniband -- we would like both
> sets to access the same storage cluster.
This is further out. Very early refactoring to make this work is in
wip-addr.
> (C) I found this old thread about Ceph's performance on 10GbE and
> Infiniband: are the issues reported there still current?
>
> http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/6816
No idea! :)
sage
>
>
> Thanks for any hint!
>
> Riccardo
>
> --
> Riccardo Murri
> http://www.s3it.uzh.ch/about/team/
>
> S3IT: Services and Support for Science IT
> University of Zurich
> Winterthurerstrasse 190, CH-8057 Zürich (Switzerland)
> Tel: +41 44 635 4222
> Fax: +41 44 635 6888
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com