On 31/08/14 17:55, Mark Kirkwood wrote:
On 29/08/14 22:17, Sebastien Han wrote:

@Mark thanks trying this :)
Unfortunately using nobarrier and another dedicated SSD for the journal
(plus your ceph settings) didn't bring much; now I can reach 3.5K IOPS.
By any chance, would it be possible for you to test with a single OSD
SSD?


Funny you should bring this up - I have just updated my home system with
a pair of Crucial m550s, so I figured I'd try a run with 2x ssd (1 for
journal, 1 for data) and 1x ssd (journal + data).


The results were the opposite of what I expected (see below), with 2x
ssd getting about 6K IOPS and 1x ssd getting 8K IOPS (wtf):

I'm running this on Ubuntu 14.04 + ceph git master from a few days ago:

$ ceph --version
ceph version 0.84-562-g8d40600 (8d406001d9b84d9809d181077c61ad9181934752)

The data partition was created with:

$ sudo mkfs.xfs -f -l lazy-count=1 /dev/sdd4

and mounted via:

$ sudo mount -o nobarrier,allocsize=4096 /dev/sdd4 /ceph2


I've attached my ceph.conf and the fio template FWIW.
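
In case the attachments get stripped by the archive, the fio template is
roughly along the following lines (a sketch from memory - the pool and
image names are just placeholders, not necessarily what I actually used;
rbdname needs to point at a pre-created test image):

[global]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=fio_test
invalidate=0
rw=randwrite
bs=4k
size=1024m

[rbd_thread]
iodepth=64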

2x Crucial m550 (1x journal, 1x data)

rbd_thread: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=64
fio-2.1.11-20-g9a44
Starting 1 process
rbd_thread: (groupid=0, jobs=1): err= 0: pid=5511: Sun Aug 31 17:33:40 2014
   write: io=1024.0MB, bw=24694KB/s, iops=6173, runt= 42462msec
     slat (usec): min=11, max=4086, avg=51.19, stdev=59.30
     clat (msec): min=3, max=24, avg= 9.99, stdev= 1.57
      lat (msec): min=3, max=24, avg=10.04, stdev= 1.57
     clat percentiles (usec):
      |  1.00th=[ 6624],  5.00th=[ 7584], 10.00th=[ 8032], 20.00th=[ 8640],
      | 30.00th=[ 9152], 40.00th=[ 9536], 50.00th=[ 9920], 60.00th=[10304],
      | 70.00th=[10816], 80.00th=[11328], 90.00th=[11968], 95.00th=[12480],
      | 99.00th=[13888], 99.50th=[14528], 99.90th=[17024], 99.95th=[19584],
      | 99.99th=[23168]
     bw (KB  /s): min=23158, max=25592, per=100.00%, avg=24711.65, stdev=470.72
     lat (msec) : 4=0.01%, 10=50.69%, 20=49.26%, 50=0.04%
   cpu          : usr=25.27%, sys=2.68%, ctx=266729, majf=0, minf=16773
   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.3%, 32=83.8%, >=64=15.8%
      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
      complete  : 0=0.0%, 4=93.8%, 8=2.9%, 16=2.2%, 32=1.0%, 64=0.1%, >=64=0.0%
      issued    : total=r=0/w=262144/d=0, short=r=0/w=0/d=0
      latency   : target=0, window=0, percentile=100.00%, depth=64

1x Crucial m550 (journal + data)

rbd_thread: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=64
fio-2.1.11-20-g9a44
Starting 1 process
rbd_thread: (groupid=0, jobs=1): err= 0: pid=6887: Sun Aug 31 17:42:22 2014
   write: io=1024.0MB, bw=32778KB/s, iops=8194, runt= 31990msec
     slat (usec): min=10, max=4016, avg=45.68, stdev=41.60
     clat (usec): min=428, max=25688, avg=7658.03, stdev=1600.65
      lat (usec): min=923, max=25757, avg=7703.72, stdev=1598.77
     clat percentiles (usec):
      |  1.00th=[ 3440],  5.00th=[ 5216], 10.00th=[ 6048], 20.00th=[ 6624],
      | 30.00th=[ 7008], 40.00th=[ 7328], 50.00th=[ 7584], 60.00th=[ 7904],
      | 70.00th=[ 8256], 80.00th=[ 8640], 90.00th=[ 9280], 95.00th=[10048],
      | 99.00th=[12864], 99.50th=[14528], 99.90th=[17536], 99.95th=[19328],
      | 99.99th=[21888]
     bw (KB  /s): min=30768, max=35160, per=100.00%, avg=32907.35, stdev=934.80
     lat (usec) : 500=0.01%, 1000=0.01%
     lat (msec) : 2=0.04%, 4=1.80%, 10=93.15%, 20=4.97%, 50=0.04%
   cpu          : usr=32.32%, sys=3.05%, ctx=179657, majf=0, minf=16751
   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=59.7%, >=64=40.0%
      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
      complete  : 0=0.0%, 4=96.8%, 8=2.6%, 16=0.5%, 32=0.1%, 64=0.1%, >=64=0.0%
      issued    : total=r=0/w=262144/d=0, short=r=0/w=0/d=0
      latency   : target=0, window=0, percentile=100.00%, depth=64

I'm digging a bit more to try to understand this slightly surprising result.

For that last benchmark I'd used a file rather than a device journal on the same ssd:

$ ls -l /ceph2
total 15360040
-rw-r--r--  1 root root          37 Sep  1 12:00 ceph_fsid
drwxr-xr-x 68 root root        4096 Sep  1 12:00 current
-rw-r--r--  1 root root          37 Sep  1 12:00 fsid
-rw-r--r--  1 root root 15728640000 Sep  1 12:00 journal
-rw-------  1 root root          56 Sep  1 12:00 keyring
-rw-r--r--  1 root root          21 Sep  1 12:00 magic
-rw-r--r--  1 root root           6 Sep  1 12:00 ready
-rw-r--r--  1 root root           4 Sep  1 12:00 store_version
-rw-r--r--  1 root root          53 Sep  1 12:00 superblock
-rw-r--r--  1 root root           2 Sep  1 12:00 whoami
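
That 15728640000 byte journal is just the default in-data-directory file
journal, sized via ceph.conf - something like the following sketch
(15000 MB * 1024 * 1024 = 15728640000 bytes, which matches the file above):

[osd]
osd data = /ceph2
osd journal = /ceph2/journal      ; the default $osd_data/journal
osd journal size = 15000          ; in MB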


Let's try a more standard device journal on another partition of the same ssd. 1x Crucial m550 (device journal + data):

$ ls -l /ceph2
total 36
-rw-r--r--  1 root root   37 Sep  1 12:02 ceph_fsid
drwxr-xr-x 68 root root 4096 Sep  1 12:02 current
-rw-r--r--  1 root root   37 Sep  1 12:02 fsid
lrwxrwxrwx  1 root root    9 Sep  1 12:02 journal -> /dev/sdd1
-rw-------  1 root root   56 Sep  1 12:02 keyring
-rw-r--r--  1 root root   21 Sep  1 12:02 magic
-rw-r--r--  1 root root    6 Sep  1 12:02 ready
-rw-r--r--  1 root root    4 Sep  1 12:02 store_version
-rw-r--r--  1 root root   53 Sep  1 12:02 superblock
-rw-r--r--  1 root root    2 Sep  1 12:02 whoami
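
For anyone wanting to reproduce the switch from the file journal to the
partition, the steps were roughly as follows (a sketch - osd id 0 and the
exact stop/start commands are assumptions, use whatever your init setup
provides):

$ sudo stop ceph-osd id=0                # stop the osd
$ sudo ceph-osd -i 0 --flush-journal     # flush and retire the old file journal
$ sudo rm /ceph2/journal
$ sudo ln -s /dev/sdd1 /ceph2/journal    # point at the journal partition
$ sudo ceph-osd -i 0 --mkjournal         # initialise the new device journal
$ sudo start ceph-osd id=0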


rbd_thread: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=64
fio-2.1.11-20-g9a44
Starting 1 process
rbd_thread: (groupid=0, jobs=1): err= 0: pid=4463: Mon Sep  1 09:16:16 2014
  write: io=1024.0MB, bw=22105KB/s, iops=5526, runt= 47436msec
    slat (usec): min=11, max=4054, avg=52.66, stdev=62.79
    clat (msec): min=3, max=43, avg=11.20, stdev= 1.69
     lat (msec): min=4, max=43, avg=11.25, stdev= 1.69
    clat percentiles (usec):
     |  1.00th=[ 7904],  5.00th=[ 8896], 10.00th=[ 9408], 20.00th=[10048],
     | 30.00th=[10432], 40.00th=[10688], 50.00th=[11072], 60.00th=[11456],
     | 70.00th=[11712], 80.00th=[12224], 90.00th=[12992], 95.00th=[13888],
     | 99.00th=[16768], 99.50th=[17792], 99.90th=[20352], 99.95th=[24960],
     | 99.99th=[42240]
    bw (KB /s): min=20285, max=23537, per=100.00%, avg=22126.98, stdev=579.19
    lat (msec) : 4=0.01%, 10=20.03%, 20=79.86%, 50=0.11%
  cpu          : usr=23.48%, sys=2.58%, ctx=302278, majf=0, minf=16786
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.6%, 32=82.8%, >=64=16.6%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=93.9%, 8=3.0%, 16=2.0%, 32=1.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=0/w=262144/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64

So we seem to lose a bit of performance there. Finally, let's use the 2 ssds again, but with a file journal only on the 2nd one. 2x Crucial m550 (1x file journal, 1x data):
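
The only change for this run is pointing the journal at a file on a
filesystem on the second m550, i.e. ceph.conf roughly along these lines
(a sketch - /ceph-journal is a placeholder mount point for the 2nd ssd):

[osd]
osd data = /ceph2
osd journal = /ceph-journal/journal   ; file on the 2nd ssd (placeholder path)
osd journal size = 15000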

rbd_thread: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=64
fio-2.1.11-20-g9a44
Starting 1 process
rbd_thread: (groupid=0, jobs=1): err= 0: pid=6943: Mon Sep  1 11:18:01 2014
  write: io=1024.0MB, bw=32248KB/s, iops=8062, runt= 32516msec
    slat (usec): min=11, max=4843, avg=45.42, stdev=43.57
    clat (usec): min=657, max=22614, avg=7806.70, stdev=1319.02
     lat (msec): min=1, max=22, avg= 7.85, stdev= 1.32
    clat percentiles (usec):
     |  1.00th=[ 4384],  5.00th=[ 5984], 10.00th=[ 6432], 20.00th=[ 6880],
     | 30.00th=[ 7200], 40.00th=[ 7520], 50.00th=[ 7776], 60.00th=[ 8032],
     | 70.00th=[ 8384], 80.00th=[ 8640], 90.00th=[ 9152], 95.00th=[ 9664],
     | 99.00th=[11328], 99.50th=[13376], 99.90th=[17536], 99.95th=[18304],
     | 99.99th=[21376]
    bw (KB /s): min=30408, max=35320, per=100.00%, avg=32339.56, stdev=937.80
    lat (usec) : 750=0.01%
    lat (msec) : 2=0.03%, 4=0.70%, 10=95.96%, 20=3.29%, 50=0.02%
  cpu          : usr=31.37%, sys=3.42%, ctx=181872, majf=0, minf=16759
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=56.6%, >=64=43.3%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=97.1%, 8=2.4%, 16=0.4%, 32=0.1%, 64=0.1%, >=64=0.0%
     issued    : total=r=0/w=262144/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64

So we are up to 8K IOPS again. Observe we are not maxing out the ssds:

Device:         rrqm/s   wrqm/s     r/s      w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    0.00     0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00     0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd               0.00  5048.00    0.00  7550.00     0.00    83.43    22.63     2.80    0.37    0.00    0.37   0.04  31.60
sdc               0.00     0.00    0.00  7145.00     0.00    72.21    20.70     0.27    0.04    0.00    0.04   0.04  26.80
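
(That's a one second sample captured mid-run with something like

$ iostat -xm 1

or at least that's the flag combination that produces rMB/s and wMB/s
columns like the above.)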

Allegedly this model ssd (128G m550) can do 75K 4k random write IOPS (running fio directly on the filesystem I've seen 70K IOPS, so that figure is reasonably believable). So anyway we are not getting anywhere near the max IOPS out of our devices.
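
For reference, the 70K figure came from running fio directly against the
xfs filesystem on the same ssd - something along these lines (a sketch,
not the exact invocation):

$ sudo fio --name=ssd-test --filename=/ceph2/fio.tmp --size=4G \
       --rw=randwrite --bs=4k --ioengine=libaio --direct=1 --iodepth=64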

We use Intel S3700s for our production ceph servers, so I'll see if we have any I can test on - it would be interesting to see whether I hit the same 3.5K limit or not.

Cheers

Mark

