On 31/08/14 17:55, Mark Kirkwood wrote:
On 29/08/14 22:17, Sebastien Han wrote:
@Mark, thanks for trying this :)
Unfortunately, using nobarrier and another dedicated SSD for the
journal (plus your ceph settings) didn't help much; I can now reach
3.5K IOPS.
By any chance, would it be possible for you to test with a single-SSD
OSD?
Funny you should bring this up - I have just updated my home system with
a pair of Crucial m550s, so I figured I'd try a run with 2x SSD (1 for
journal, 1 for data) and 1x SSD (journal + data).
The results were the opposite of what I expected (see below), with 2x
SSD getting about 6K IOPS and 1x SSD getting 8K IOPS (wtf):
I'm running this on Ubuntu 14.04 + ceph git master from a few days ago:
$ ceph --version
ceph version 0.84-562-g8d40600 (8d406001d9b84d9809d181077c61ad9181934752)
The data partition was created with:
$ sudo mkfs.xfs -f -l lazy-count=1 /dev/sdd4
and mounted via:
$ sudo mount -o nobarrier,allocsize=4096 /dev/sdd4 /ceph2
I've attached my ceph.conf and the fio template FWIW.
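For anyone reading this without the attachments, a 4k randwrite rbd job along the following lines would produce the workload shown below. This is only a sketch: the pool, image, and client names here are placeholders, not necessarily what the attached template used.

```ini
[global]
ioengine=rbd          ; drive an rbd image directly via librbd
clientname=admin      ; cephx client name (assumed)
pool=rbd              ; pool holding a pre-created test image (assumed)
rbdname=fio_test      ; rbd image name (assumed)
rw=randwrite
bs=4k
iodepth=64
direct=1

[rbd_thread]          ; job name matches the output below
size=1024m            ; matches io=1024.0MB in the results
```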
2x Crucial m550 (1x journal, 1x data)
rbd_thread: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=64
fio-2.1.11-20-g9a44
Starting 1 process
rbd_thread: (groupid=0, jobs=1): err= 0: pid=5511: Sun Aug 31 17:33:40 2014
write: io=1024.0MB, bw=24694KB/s, iops=6173, runt= 42462msec
slat (usec): min=11, max=4086, avg=51.19, stdev=59.30
clat (msec): min=3, max=24, avg= 9.99, stdev= 1.57
lat (msec): min=3, max=24, avg=10.04, stdev= 1.57
clat percentiles (usec):
| 1.00th=[ 6624], 5.00th=[ 7584], 10.00th=[ 8032], 20.00th=[ 8640],
| 30.00th=[ 9152], 40.00th=[ 9536], 50.00th=[ 9920], 60.00th=[10304],
| 70.00th=[10816], 80.00th=[11328], 90.00th=[11968], 95.00th=[12480],
| 99.00th=[13888], 99.50th=[14528], 99.90th=[17024], 99.95th=[19584],
| 99.99th=[23168]
bw (KB /s): min=23158, max=25592, per=100.00%, avg=24711.65, stdev=470.72
lat (msec) : 4=0.01%, 10=50.69%, 20=49.26%, 50=0.04%
cpu : usr=25.27%, sys=2.68%, ctx=266729, majf=0, minf=16773
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.3%, 32=83.8%, >=64=15.8%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=93.8%, 8=2.9%, 16=2.2%, 32=1.0%, 64=0.1%, >=64=0.0%
issued : total=r=0/w=262144/d=0, short=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=64
1x Crucial m550 (journal + data)
rbd_thread: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=64
fio-2.1.11-20-g9a44
Starting 1 process
rbd_thread: (groupid=0, jobs=1): err= 0: pid=6887: Sun Aug 31 17:42:22 2014
write: io=1024.0MB, bw=32778KB/s, iops=8194, runt= 31990msec
slat (usec): min=10, max=4016, avg=45.68, stdev=41.60
clat (usec): min=428, max=25688, avg=7658.03, stdev=1600.65
lat (usec): min=923, max=25757, avg=7703.72, stdev=1598.77
clat percentiles (usec):
| 1.00th=[ 3440], 5.00th=[ 5216], 10.00th=[ 6048], 20.00th=[ 6624],
| 30.00th=[ 7008], 40.00th=[ 7328], 50.00th=[ 7584], 60.00th=[ 7904],
| 70.00th=[ 8256], 80.00th=[ 8640], 90.00th=[ 9280], 95.00th=[10048],
| 99.00th=[12864], 99.50th=[14528], 99.90th=[17536], 99.95th=[19328],
| 99.99th=[21888]
bw (KB /s): min=30768, max=35160, per=100.00%, avg=32907.35, stdev=934.80
lat (usec) : 500=0.01%, 1000=0.01%
lat (msec) : 2=0.04%, 4=1.80%, 10=93.15%, 20=4.97%, 50=0.04%
cpu : usr=32.32%, sys=3.05%, ctx=179657, majf=0, minf=16751
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=59.7%, >=64=40.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=96.8%, 8=2.6%, 16=0.5%, 32=0.1%, 64=0.1%, >=64=0.0%
issued : total=r=0/w=262144/d=0, short=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=64
I'm digging a bit more to try to understand this slightly surprising result.
For that last benchmark I'd used a file journal rather than a device
journal on the same SSD:
$ ls -l /ceph2
total 15360040
-rw-r--r-- 1 root root 37 Sep 1 12:00 ceph_fsid
drwxr-xr-x 68 root root 4096 Sep 1 12:00 current
-rw-r--r-- 1 root root 37 Sep 1 12:00 fsid
-rw-r--r-- 1 root root 15728640000 Sep 1 12:00 journal
-rw------- 1 root root 56 Sep 1 12:00 keyring
-rw-r--r-- 1 root root 21 Sep 1 12:00 magic
-rw-r--r-- 1 root root 6 Sep 1 12:00 ready
-rw-r--r-- 1 root root 4 Sep 1 12:00 store_version
-rw-r--r-- 1 root root 53 Sep 1 12:00 superblock
-rw-r--r-- 1 root root 2 Sep 1 12:00 whoami
Let's try a more standard device journal on another partition of the
same ssd. 1x Crucial m550 (device journal + data):
$ ls -l /ceph2
total 36
-rw-r--r-- 1 root root 37 Sep 1 12:02 ceph_fsid
drwxr-xr-x 68 root root 4096 Sep 1 12:02 current
-rw-r--r-- 1 root root 37 Sep 1 12:02 fsid
lrwxrwxrwx 1 root root 9 Sep 1 12:02 journal -> /dev/sdd1
-rw------- 1 root root 56 Sep 1 12:02 keyring
-rw-r--r-- 1 root root 21 Sep 1 12:02 magic
-rw-r--r-- 1 root root 6 Sep 1 12:02 ready
-rw-r--r-- 1 root root 4 Sep 1 12:02 store_version
-rw-r--r-- 1 root root 53 Sep 1 12:02 superblock
-rw-r--r-- 1 root root 2 Sep 1 12:02 whoami
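As a rough sketch of how that symlink gets set up (OSD id 0 and the partition name are assumptions from this box; the OSD must be stopped first, and --mkjournal wipes the old journal contents):

```
# with the OSD stopped, flush the old journal, then repoint the symlink
$ sudo ceph-osd -i 0 --flush-journal
$ sudo rm /ceph2/journal
$ sudo ln -s /dev/sdd1 /ceph2/journal
# initialize the new device journal and restart the OSD
$ sudo ceph-osd -i 0 --mkjournal
```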
rbd_thread: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=64
fio-2.1.11-20-g9a44
Starting 1 process
rbd_thread: (groupid=0, jobs=1): err= 0: pid=4463: Mon Sep 1 09:16:16 2014
write: io=1024.0MB, bw=22105KB/s, iops=5526, runt= 47436msec
slat (usec): min=11, max=4054, avg=52.66, stdev=62.79
clat (msec): min=3, max=43, avg=11.20, stdev= 1.69
lat (msec): min=4, max=43, avg=11.25, stdev= 1.69
clat percentiles (usec):
| 1.00th=[ 7904], 5.00th=[ 8896], 10.00th=[ 9408], 20.00th=[10048],
| 30.00th=[10432], 40.00th=[10688], 50.00th=[11072], 60.00th=[11456],
| 70.00th=[11712], 80.00th=[12224], 90.00th=[12992], 95.00th=[13888],
| 99.00th=[16768], 99.50th=[17792], 99.90th=[20352], 99.95th=[24960],
| 99.99th=[42240]
bw (KB /s): min=20285, max=23537, per=100.00%, avg=22126.98, stdev=579.19
lat (msec) : 4=0.01%, 10=20.03%, 20=79.86%, 50=0.11%
cpu : usr=23.48%, sys=2.58%, ctx=302278, majf=0, minf=16786
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.6%, 32=82.8%, >=64=16.6%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=93.9%, 8=3.0%, 16=2.0%, 32=1.0%, 64=0.1%, >=64=0.0%
issued : total=r=0/w=262144/d=0, short=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=64
So we seem to lose a bit of performance there. Finally, let's use two
SSDs again, but with only a file journal on the second one. 2x Crucial
m550 (1x file journal, 1x data):
rbd_thread: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=64
fio-2.1.11-20-g9a44
Starting 1 process
rbd_thread: (groupid=0, jobs=1): err= 0: pid=6943: Mon Sep 1 11:18:01 2014
write: io=1024.0MB, bw=32248KB/s, iops=8062, runt= 32516msec
slat (usec): min=11, max=4843, avg=45.42, stdev=43.57
clat (usec): min=657, max=22614, avg=7806.70, stdev=1319.02
lat (msec): min=1, max=22, avg= 7.85, stdev= 1.32
clat percentiles (usec):
| 1.00th=[ 4384], 5.00th=[ 5984], 10.00th=[ 6432], 20.00th=[ 6880],
| 30.00th=[ 7200], 40.00th=[ 7520], 50.00th=[ 7776], 60.00th=[ 8032],
| 70.00th=[ 8384], 80.00th=[ 8640], 90.00th=[ 9152], 95.00th=[ 9664],
| 99.00th=[11328], 99.50th=[13376], 99.90th=[17536], 99.95th=[18304],
| 99.99th=[21376]
bw (KB /s): min=30408, max=35320, per=100.00%, avg=32339.56, stdev=937.80
lat (usec) : 750=0.01%
lat (msec) : 2=0.03%, 4=0.70%, 10=95.96%, 20=3.29%, 50=0.02%
cpu : usr=31.37%, sys=3.42%, ctx=181872, majf=0, minf=16759
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=56.6%, >=64=43.3%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=97.1%, 8=2.4%, 16=0.4%, 32=0.1%, 64=0.1%, >=64=0.0%
issued : total=r=0/w=262144/d=0, short=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=64
So we are up to 8K IOPS again. Observe that we are not maxing out the SSDs:
Device:  rrqm/s   wrqm/s    r/s     w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
sda        0.00     0.00   0.00    0.00   0.00   0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
sdb        0.00     0.00   0.00    0.00   0.00   0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
sdd        0.00  5048.00   0.00 7550.00   0.00  83.43    22.63     2.80   0.37    0.00    0.37   0.04  31.60
sdc        0.00     0.00   0.00 7145.00   0.00  72.21    20.70     0.27   0.04    0.00    0.04   0.04  26.80
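As a sanity check on those iostat numbers: avgrq-sz is reported in 512-byte sectors, and it agrees with the throughput and request-rate columns:

```python
# avgrq-sz (sectors) = throughput / request rate, with 1 MB = 2048 sectors
sdd = 83.43 * 2048 / 7550   # -> ~22.63 sectors (~11.3 KB average write)
sdc = 72.21 * 2048 / 7145   # -> ~20.70 sectors

print(round(sdd, 2), round(sdc, 2))
```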
Allegedly this model of SSD (128G m550) can do 75K 4k random write IOPS
(running fio directly on the filesystem I've seen 70K IOPS, so that's
reasonably believable). So anyway, we are not getting anywhere near the
max IOPS from our devices.
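To put a number on "nowhere near the max", comparing the best single-SSD result above against what the device itself delivered under fio (figures taken from the text, not re-measured):

```python
# Figures from the post above
rated_iops = 75_000    # vendor 4k random-write spec, 128G m550
fio_raw_iops = 70_000  # fio run directly on the filesystem
ceph_iops = 8_194      # best single-SSD ceph result above

print(f"ceph achieves ~{ceph_iops / fio_raw_iops:.0%} of raw device IOPS")
```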
We use the Intel S3700 for production ceph servers, so I'll see if we
have any I can test on - it would be interesting to see whether I hit
the same 3.5K ceiling or not.
Cheers
Mark
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com