As a jumbo frame test, can you try the following?

ping -M do -s 8972 -c 4 IP_of_other_node_within_cluster_network

(8972 bytes of ICMP payload plus 20 bytes of IP header and 8 bytes of ICMP header add up to a full 9000-byte jumbo frame, and -M do forbids fragmentation.)

If you get "ping: sendto: Message too long", jumbo frames are not enabled.
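If they are not, a minimal sketch for checking and raising the MTU; the interface name eth0 is a placeholder for your cluster-facing NIC, and the switch ports must of course allow an MTU of at least 9000 as well:

# Show the current MTU of the cluster-facing interface
ip link show eth0

# Raise it to 9000 (takes effect immediately but is not persistent;
# on Proxmox/Debian, also set "mtu 9000" in /etc/network/interfaces)
ip link set dev eth0 mtu 9000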
Cordialement / Best regards,

Sébastien VIGNERON
CRIANN, Ingénieur / Engineer
Technopôle du Madrillet
745, avenue de l'Université
76800 Saint-Etienne du Rouvray - France
tél. +33 2 32 91 42 91
fax. +33 2 32 91 42 92
http://www.criann.fr
sebastien.vigne...@criann.fr
support: supp...@criann.fr

> On 20 Nov 2017, at 13:02, Rudi Ahlers <rudiahl...@gmail.com> wrote:
>
> We're planning on installing 12x virtual machines with some heavy loads.
>
> The SSD drives are INTEL SSDSC2BA400G4.
>
> The SATA drives are ST8000NM0055-1RM112.
>
> Please explain your comment, "b) will find a lot of people here who don't approve of it."
>
> I don't have access to the switches right now, but they're new, so whatever default config ships from the factory would be active. That said, iperf shows 10.5 GBytes transferred at 9.02 Gbits/sec.
>
> What speeds would you expect?
> "Though with your setup I would have expected something faster, but NOT the theoretical ~600MB/s that 4 HDDs will do in sequential writes."
>
> On this, "If an OSD has no fast WAL/DB, it will drag the overall speed down. Verify, and if so, fix this and re-test.": how?
>
> On Mon, Nov 20, 2017 at 1:44 PM, Christian Balzer <ch...@gol.com> wrote:
>
> On Mon, 20 Nov 2017 12:38:55 +0200 Rudi Ahlers wrote:
>
> > Hi,
> >
> > Can someone please help me: how do I improve performance on our Ceph cluster?
> >
> > The hardware in use is as follows:
> > 3x SuperMicro servers, each with:
> > 12-core dual Xeon 2.2GHz
>
> Faster cores are better for Ceph, IMNSHO.
> Though with main storage on HDDs, this will do.
>
> > 128GB RAM
>
> Overkill for Ceph, but I see something else below...
>
> > 2x 400GB Intel DC SSD drives
>
> Exact model please.
>
> > 4x 8TB Seagate 7200rpm 6Gbps SATA HDDs
>
> One hopes that's a non-SMR model.
> Model please.
>
> > 1x SuperMicro DOM for the Proxmox / Debian OS
>
> Ah, Proxmox.
> I'm personally not averse to converged, high-density, multi-role clusters myself, but you:
> a) need to know what you're doing and
> b) will find a lot of people here who don't approve of it.
>
> I've avoided DOMs so far (a non-hotswappable SPOF), even though the SM ones look good on paper with regard to endurance and IOPS.
> The latter being rather important for your monitors.
>
> > 4-port 10GbE NIC
> > Cisco 10GbE switch.
>
> The configuration of those would be nice to know, LACP?
>
> > root@virt2:~# rados bench -p Data 10 write --no-cleanup
> > hints = 1
> > Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
>
> rados bench is a limited tool, and measuring bandwidth is pointless in nearly all use cases.
> Latency is where it is at, and testing from inside a VM is more relevant than synthetic tests of the storage.
> But it is a start.
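As a concrete sketch of such a latency test, run from inside one of the guest VMs (the file name, size and runtime are arbitrary placeholders; synchronous 4k random writes at queue depth 1 expose per-operation latency rather than bandwidth):

fio --name=latency-test --filename=/root/fio-test.bin --size=1G \
    --rw=randwrite --bs=4k --iodepth=1 --fsync=1 \
    --ioengine=libaio --direct=1 --runtime=60 --time_based

The completion latency ("clat") figures in fio's output are the numbers to watch, not the throughput line.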
> > Object prefix: benchmark_data_virt2_39099
> >   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
> >     0       0         0         0         0         0            -           0
> >     1      16        85        69   275.979       276     0.185576    0.204146
> >     2      16       171       155   309.966       344    0.0625409    0.193558
> >     3      16       243       227   302.633       288    0.0547129     0.19835
> >     4      16       330       314   313.965       348    0.0959492    0.199825
> >     5      16       413       397   317.565       332     0.124908    0.196191
> >     6      16       494       478   318.633       324       0.1556    0.197014
> >     7      15       591       576   329.109       392     0.136305    0.192192
> >     8      16       670       654   326.965       312    0.0703808    0.190643
> >     9      16       757       741   329.297       348     0.165211    0.192183
> >    10      16       828       812   324.764       284    0.0935803    0.194041
> > Total time run:         10.120215
> > Total writes made:      829
> > Write size:             4194304
> > Object size:            4194304
> > Bandwidth (MB/sec):     327.661
>
> What part of this surprises you?
>
> With a replication of 3, you effectively have the bandwidth of your 2 SSDs (for small writes, not the case here) and the bandwidth of your 4 HDDs available.
> Given overhead, other inefficiencies and the fact that this is not a sequential write from the HDD perspective, 320MB/s isn't all that bad.
> Though with your setup I would have expected something faster, but NOT the theoretical ~600MB/s that 4 HDDs will do in sequential writes.
>
> > Stddev Bandwidth:       35.8664
> > Max bandwidth (MB/sec): 392
> > Min bandwidth (MB/sec): 276
> > Average IOPS:           81
> > Stddev IOPS:            8
> > Max IOPS:               98
> > Min IOPS:               69
> > Average Latency(s):     0.195191
> > Stddev Latency(s):      0.0830062
> > Max latency(s):         0.481448
> > Min latency(s):         0.0414858
> >
> > root@virt2:~# hdparm -I /dev/sda
> >
> > root@virt2:~# ceph osd tree
> > ID CLASS WEIGHT   TYPE NAME      STATUS REWEIGHT PRI-AFF
> > -1       72.78290 root default
> > -3       29.11316     host virt1
> >  1   hdd  7.27829         osd.1      up  1.00000 1.00000
> >  2   hdd  7.27829         osd.2      up  1.00000 1.00000
> >  3   hdd  7.27829         osd.3      up  1.00000 1.00000
> >  4   hdd  7.27829         osd.4      up  1.00000 1.00000
> > -5       21.83487     host virt2
> >  5   hdd  7.27829         osd.5      up  1.00000 1.00000
> >  6   hdd  7.27829         osd.6      up  1.00000 1.00000
> >  7   hdd  7.27829         osd.7      up  1.00000 1.00000
> > -7       21.83487     host virt3
> >  8   hdd  7.27829         osd.8      up  1.00000 1.00000
> >  9   hdd  7.27829         osd.9      up  1.00000 1.00000
> > 10   hdd  7.27829         osd.10     up  1.00000 1.00000
> >  0        0            osd.0      down        0 1.00000
> >
> > root@virt2:~# ceph -s
> >   cluster:
> >     id:     278a2e9c-0578-428f-bd5b-3bb348923c27
> >     health: HEALTH_OK
> >
> >   services:
> >     mon: 3 daemons, quorum virt1,virt2,virt3
> >     mgr: virt1(active)
> >     osd: 11 osds: 10 up, 10 in
> >
> >   data:
> >     pools:   1 pools, 512 pgs
> >     objects: 6084 objects, 24105 MB
> >     usage:   92822 MB used, 74438 GB / 74529 GB avail
> >     pgs:     512 active+clean
> >
> > root@virt2:~# ceph -w
> >   cluster:
> >     id:     278a2e9c-0578-428f-bd5b-3bb348923c27
> >     health: HEALTH_OK
> >
> >   services:
> >     mon: 3 daemons, quorum virt1,virt2,virt3
> >     mgr: virt1(active)
> >     osd: 11 osds: 10 up, 10 in
> >
> >   data:
> >     pools:   1 pools, 512 pgs
> >     objects: 6084 objects, 24105 MB
> >     usage:   92822 MB used, 74438 GB / 74529 GB avail
> >     pgs:     512 active+clean
> >
> > 2017-11-20 12:32:08.199450 mon.virt1 [INF] mon.1 10.10.10.82:6789/0
> >
> > The SSD drives are used as journal drives:
>
> Bluestore has no journals, don't confuse it and the people you're asking for help.
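To verify where each BlueStore OSD actually keeps its DB/WAL, one sketch (the metadata key names below are my assumption for a Luminous-era cluster; osd.0 is the suspect one here):

ceph osd metadata 0 | grep -E 'bluefs_(db|wal)_partition_path'

# or look at the symlinks in the OSD's data directory on its host:
ls -l /var/lib/ceph/osd/ceph-0/block*

If there is no block.db pointing at an SSD partition, that OSD is running with its DB/WAL on the HDD.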
> > root@virt3:~# ceph-disk list | grep /dev/sde | grep osd
> > /dev/sdb1 ceph data, active, cluster ceph, osd.8, block /dev/sdb2, block.db /dev/sde1
> > root@virt3:~# ceph-disk list | grep /dev/sdf | grep osd
> > /dev/sdc1 ceph data, active, cluster ceph, osd.9, block /dev/sdc2, block.db /dev/sdf1
> > /dev/sdd1 ceph data, active, cluster ceph, osd.10, block /dev/sdd2, block.db /dev/sdf2
> >
> > I see now /dev/sda doesn't have a journal, though it should have. Not sure why.
>
> If an OSD has no fast WAL/DB, it will drag the overall speed down.
> Verify, and if so, fix this and re-test.
>
> Christian
>
> > This is the command I used to create it:
> >
> > pveceph createosd /dev/sda -bluestore 1 -journal_dev /dev/sde
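If osd.0 really did come up without its block.db on the SSD, one hedged way to redo it is to destroy and recreate the OSD. This assumes the cluster is otherwise healthy, the disk holds no data you need (osd.0 is down and out anyway), and the device names from the quoted commands above still apply:

# remove the dead osd.0 from the cluster (Luminous and later)
ceph osd purge 0 --yes-i-really-mean-it

# wipe the old data disk, then recreate the OSD with its DB on the SSD
ceph-disk zap /dev/sda
pveceph createosd /dev/sda -bluestore 1 -journal_dev /dev/sde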
> --
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com           Rakuten Communications
>
> --
> Kind Regards
> Rudi Ahlers
> Website: http://www.rudiahlers.co.za

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com