Also, isn't Jewel supposed to deliver more 'performance', since it uses BlueStore to store metadata? Or do I need to specify BlueStore during the install?

Thanks,

*German*
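As background on the BlueStore question: in Jewel (10.1.x) BlueStore is still flagged experimental, FileStore remains the default, and BlueStore is a full data+metadata backend rather than a metadata-only store, so nothing uses it unless it is enabled explicitly. A minimal sketch of that opt-in, as remembered from the Jewel docs (verify the option name and the ceph-disk flag against the release notes before trying this on a production cluster):

[global]
# required while BlueStore is experimental in Jewel
enable experimental unrecoverable data corrupting features = bluestore rocksdb

and the OSDs then have to be prepared with the BlueStore backend instead of FileStore, e.g.:

# ceph-disk prepare --bluestore /dev/sdc

Existing FileStore OSDs are not converted in place; they would need to be re-provisioned.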
2016-04-07 16:55 GMT-03:00 Robert LeBlanc <rob...@leblancnet.us>:
> Ceph is not able to use native Infiniband protocols yet, so it is only
> leveraging IPoIB at the moment. The most likely reason you are only
> getting ~10 Gb performance is that IPoIB heavily leverages multicast in
> Infiniband (if you do some research in this area you will understand why
> unicast IP still uses multicast on an Infiniband network). To remain
> compatible with all adapters, the subnet manager will set the multicast
> speed to 10 Gb/s so that SDR adapters can be used without dropping
> packets. If you know that you will never have adapters below a certain
> speed, you can configure the subnet manager to use a higher rate [a
> partitions.conf sketch follows after the quoted message below]. This does
> not change IPoIB networks that are already configured (I had to bring all
> the IPoIB adapters down at the same time and back up to upgrade the
> speed). Even after that, performance still wasn't on par with native
> Infiniband, but I got at least a 2x improvement (along with setting the
> MTU to 64K) on the FDR adapters. There is still a ton of overhead in
> IPoIB, so it is not an ideal transport for getting performance out of
> Infiniband; I think of it as a compatibility feature. Hopefully that
> gives you enough information to do the research. If you search the OFED
> mailing list, you will see some posts from me 2-3 years ago on this very
> topic.
>
> Good luck and keep holding out for Ceph with XIO.
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
>
>
> On Thu, Apr 7, 2016 at 1:43 PM, German Anders <gand...@despegar.com> wrote:
> > Hi Cephers,
> >
> > I've set up a production Ceph cluster with the Jewel release
> > (10.1.0 (96ae8bd25f31862dbd5302f304ebf8bf1166aba6)) consisting of 3 MON
> > servers and 6 OSD servers:
> >
> > 3x MON servers:
> >   2x Intel Xeon E5-2630v3 @ 2.40GHz
> >   384GB RAM
> >   2x 200G Intel DC3700 in RAID-1 for OS
> >   1x InfiniBand ConnectX-3 ADPT DP
> >
> > 6x OSD servers:
> >   2x Intel Xeon E5-2650v2 @ 2.60GHz
> >   128GB RAM
> >   2x 200G Intel DC3700 in RAID-1 for OS
> >   12x 800G Intel DC3510 (osd & journal on the same device)
> >   1x InfiniBand ConnectX-3 ADPT DP (one port on the PUB network and the
> >     other on the CLUS network)
> >
> > ceph.conf file is:
> >
> > [global]
> > fsid = xxxxxxxxxxxxxxxxxxxxxxxxxxx
> > mon_initial_members = cibm01, cibm02, cibm03
> > mon_host = xx.xx.xx.1,xx.xx.xx.2,xx.xx.xx.3
> > auth_cluster_required = cephx
> > auth_service_required = cephx
> > auth_client_required = cephx
> > filestore_xattr_use_omap = true
> > public_network = xx.xx.16.0/20
> > cluster_network = xx.xx.32.0/20
> >
> > [mon]
> >
> > [mon.cibm01]
> > host = cibm01
> > mon_addr = xx.xx.xx.1:6789
> >
> > [mon.cibm02]
> > host = cibm02
> > mon_addr = xx.xx.xx.2:6789
> >
> > [mon.cibm03]
> > host = cibm03
> > mon_addr = xx.xx.xx.3:6789
> >
> > [osd]
> > osd_pool_default_size = 2
> > osd_pool_default_min_size = 1
> >
> > ## OSD Configuration ##
> > [osd.0]
> > host = cibn01
> > public_addr = xx.xx.17.1
> > cluster_addr = xx.xx.32.1
> >
> > [osd.1]
> > host = cibn01
> > public_addr = xx.xx.17.1
> > cluster_addr = xx.xx.32.1
> >
> > ...
> >
> > They are all running Ubuntu 14.04.4 LTS. Journals are 5GB partitions on
> > each disk, since all the OSDs are on SSDs (Intel DC3510 800G). For example:
> >
> > sdc      8:32   0 745.2G  0 disk
> > |-sdc1   8:33   0 740.2G  0 part  /var/lib/ceph/osd/ceph-0
> > `-sdc2   8:34   0     5G  0 part
> >
> > The purpose of this cluster is to serve as backend storage for Cinder
> > volumes (RBD) and Glance images in an OpenStack cloud; most of the
> > clusters on OpenStack will be non-relational databases like Cassandra,
> > with many instances each.
> >
> > All of the nodes in the cluster run InfiniBand FDR 56Gb/s with Mellanox
> > Technologies MT27500 Family [ConnectX-3] adapters.
> >
> > So I assumed performance would be really nice, right?... but I'm getting
> > numbers that I think should be significantly better.
> >
> > # rados --pool rbd bench 10 write -t 16
> >
> > Total writes made:      1964
> > Write size:             4194304
> > Object size:            4194304
> > Bandwidth (MB/sec):     755.435
> >
> > Stddev Bandwidth:       90.3288
> > Max bandwidth (MB/sec): 884
> > Min bandwidth (MB/sec): 612
> > Average IOPS:           188
> > Stddev IOPS:            22
> > Max IOPS:               221
> > Min IOPS:               153
> > Average Latency(s):     0.0836802
> > Stddev Latency(s):      0.147561
> > Max latency(s):         1.50925
> > Min latency(s):         0.0192736
> >
> > Then I connect to another server (this one is on QDR, so I would expect
> > something between 2-3GB/s), map an RBD on the host, create an ext4 fs and
> > mount it, and finally run a fio test:
> >
> > # fio --rw=randwrite --bs=4M --numjobs=8 --iodepth=32 --runtime=22 \
> >     --time_based --size=10G --loops=1 --ioengine=libaio --direct=1 \
> >     --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap \
> >     --group_reporting --exitall --name cephV1 --filename=/mnt/host01v1/test1
> >
> > fio-2.1.3
> > Starting 8 processes
> > cephIBV1: Laying out IO file(s) (1 file(s) / 10240MB)
> > Jobs: 7 (f=7): [wwwwww_w] [100.0% done] [0KB/431.6MB/0KB /s] [0/107/0 iops]
> > [eta 00m:00s]
> > cephIBV1: (groupid=0, jobs=8): err= 0: pid=6203: Thu Apr 7 15:24:12 2016
> >   write: io=15284MB, bw=676412KB/s, iops=165, runt= 23138msec
> >     slat (msec): min=1, max=480, avg=46.15, stdev=63.68
> >     clat (msec): min=64, max=8966, avg=1459.91, stdev=1252.64
> >      lat (msec): min=87, max=8969, avg=1506.06, stdev=1253.63
> >     clat percentiles (msec):
> >      |  1.00th=[  235],  5.00th=[  478], 10.00th=[  611], 20.00th=[  766],
> >      | 30.00th=[  889], 40.00th=[  988], 50.00th=[ 1106], 60.00th=[ 1237],
> >      | 70.00th=[ 1434], 80.00th=[ 1680], 90.00th=[ 2474], 95.00th=[ 4555],
> >      | 99.00th=[ 6915], 99.50th=[ 7439], 99.90th=[ 8291], 99.95th=[ 8586],
> >      | 99.99th=[ 8979]
> >     bw (KB /s): min= 3091, max=209877, per=12.31%, avg=83280.51,
> >                 stdev=35226.98
> >     lat (msec) : 100=0.16%, 250=0.97%, 500=4.61%, 750=12.93%, 1000=22.61%
> >     lat (msec) : 2000=45.04%, >=2000=13.69%
> >   cpu          : usr=0.87%, sys=4.77%, ctx=6803, majf=0, minf=16337
> >   IO depths    : 1=0.2%, 2=0.4%, 4=0.8%, 8=1.7%, 16=3.3%, 32=93.5%, >=64=0.0%
> >      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> >      complete  : 0=0.0%, 4=99.8%, 8=0.0%, 16=0.0%, 32=0.2%, 64=0.0%, >=64=0.0%
> >      issued    : total=r=0/w=3821/d=0, short=r=0/w=0/d=0
> >
> > Run status group 0 (all jobs):
> >   WRITE: io=15284MB, aggrb=676411KB/s, minb=676411KB/s, maxb=676411KB/s,
> >          mint=23138msec, maxt=23138msec
> >
> > Disk stats (read/write):
> >   rbd1: ios=0/4189, merge=0/26613, ticks=0/2852032, in_queue=2857996,
> >         util=99.08%
> >
> > Does it look acceptable? For an InfiniBand network I would expect the
> > throughput to be better. How much more can I expect to achieve by tuning
> > the servers? The MTU on the OSD servers is:
> >
> > MTU: 65520
> > No dropped packets found
> > txqueuelen: 256
> >
> > Also, I've set in the openib.conf file:
> > ...
> > SET_IPOIB_CM=yes
> > IPOIB_MTU=65520
> > ...
> >
> > And in the mlnx.conf file:
> > ...
> > options mlx4_core enable_sys_tune=1
> > options mlx4_core log_num_mgm_entry_size=-7
> >
> > Can anyone here with experience on InfiniBand setups give me a hint on
> > how to 'improve' performance? I'm getting similar numbers with another
> > cluster on a 10GbE network :S
> >
> > Thanks,
> >
> > German
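On Robert's point above about the subnet manager capping the IPoIB multicast group at 10 Gb/s: with OpenSM the usual place to raise that is the partition configuration. A minimal sketch, assuming the common /etc/opensm/partitions.conf path and the rate/mtu codes as remembered from the opensm partition-config man page (rate=3 is the 10 Gb/s default, rate=12 should correspond to 56 Gb/s FDR, mtu=5 to 4096 bytes; double-check the tables for your opensm version):

# /etc/opensm/partitions.conf (path may differ by distribution)
Default=0x7fff, ipoib, mtu=5, rate=12, defmember=full : ALL=full;

After changing it, restart opensm so the multicast group is re-created with the new rate, and, as Robert described, take all the IPoIB interfaces down and bring them back up so they rejoin the group at the higher speed.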
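Since the SET_IPOIB_CM and IPOIB_MTU settings above only help if the driver actually applied them, it is also worth confirming connected mode, the MTU and the negotiated link rate on each node. A quick check, assuming the interface is called ib0 and infiniband-diags is installed:

# cat /sys/class/net/ib0/mode     (should print "connected", not "datagram")
# ip link show ib0                (should show mtu 65520)
# ibstat | grep -i rate           (should show 56 on the FDR links)

If the mode comes back as "datagram", the 64K MTU will not apply and IPoIB throughput drops considerably.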
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com