On 09/07/14 13:10, Mark Nelson wrote:
On 07/09/2014 05:57 AM, Xabier Elkano wrote:
Hi,
I was running some tests on my cluster with fio: one fio instance
with 70 jobs, each job writing 1GB of random data with a 4K block size.
I ran the test with 3 variations:
1- Creating 70 images, 60GB each, in the pool, mapping each one with
the rbd kernel module, and formatting and mounting each image as ext4.
Each fio job writes to a separate image/directory. (ioengine=libaio,
queue_depth=4, direct=1)
IOPS: 6542
AVG LAT: 41ms
2- Creating 1 large image (4.2TB) in the pool, mapping it with the rbd
kernel module, and formatting and mounting it as ext4. Each fio job
writes to a separate file in the same directory. (ioengine=libaio,
queue_depth=4, direct=1)
IOPS: 5899
AVG LAT: 47ms
3- Creating 1 large image (4.2TB) in the pool and using fio's rbd
ioengine to access the image in userspace through librbd/librados.
(ioengine=rbd, queue_depth=4, direct=1) Job files for both engines
are sketched below.
IOPS: 2638
AVG LAT: 96ms
Do these results make sense? From Ceph's perspective, is it better to
have many small images than one large one? What is the best approach
to simulating the workload of 70 VMs?
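For reference, the job files for these runs could look roughly like the
sketches below. These are only sketches to make the setup concrete; any
pool, image, client or directory name not quoted above is an assumption,
not the exact configuration used in the tests.

# Sketch of a job file for case 1 (libaio against 70 mounted ext4
# filesystems); case 2 would instead use a single directory with
# numjobs=70. Directory paths match the mounts shown further down.
[global]
ioengine=libaio
direct=1
rw=randwrite
bs=4k
iodepth=4
size=1g
group_reporting

[vtest0]
directory=/mnt/fiotest/vtest0

[vtest1]
directory=/mnt/fiotest/vtest1
# ... one job section per image/directory, up to vtest69

# Sketch of a job file for case 3 (fio's rbd engine, userspace librbd);
# clientname, pool and rbdname are assumptions.
[global]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=bigimage
rw=randwrite
bs=4k
iodepth=4
direct=1
numjobs=70
size=1g
group_reporting

[rbd-randwrite]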
I'm not sure the difference between the first two cases is enough to
say much yet. You may need to repeat the test a couple of times to
ensure that the difference is more than noise. Having said that, if
we are seeing an effect, it would be interesting to know what the
latency distribution is like. Is it consistently worse in the 2nd
case, or do we see higher spikes at specific times?
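One way to look at that (just a sketch, not something from these runs)
is to have fio log completion latency over time and plot it, e.g. by
adding something like this to the [global] section of the job file;
"case2" is only an assumed log-file prefix:

# Hypothetical additions to the job file's [global] section.
write_lat_log=case2     # write slat/clat/lat logs with this prefix
log_avg_msec=1000       # average samples over 1s windows to keep logs small

The resulting latency logs can then be plotted (for example with the
fio_generate_plots script shipped with fio) to see whether the tail in
the large-image case is a constant offset or comes in bursts.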
I've repeated the tests with similar results. Each test is done with a
clean new rbd image: any existing images in the pool are removed first
and the new image is then created. Between tests I run:
echo 3 > /proc/sys/vm/drop_caches
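(The cleanup step is roughly the following; a minimal sketch that
assumes the default "rbd" pool:)

# Hypothetical per-run cleanup; the "rbd" pool name is an assumption.
rbd ls rbd | xargs -r -n1 rbd rm   # remove any existing test images
sync                               # flush dirty data first, then drop caches as above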
- In the first test I created 70 images (60GB each) and mounted them:
/dev/rbd1 on /mnt/fiotest/vtest0
/dev/rbd2 on /mnt/fiotest/vtest1
..
/dev/rbd70 on /mnt/fiotest/vtest69
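(For completeness, a sketch of how such a set of images can be created,
mapped and mounted; the pool name, mount options and the
/dev/rbd/<pool>/<image> udev symlinks are assumptions, only the image
and mount-point names come from the listing above.)

# Hypothetical setup loop for the 70-image case.
for i in $(seq 0 69); do
    rbd create --size 61440 rbd/vtest$i        # 60 GB image (size in MB)
    rbd map rbd/vtest$i
    mkfs.ext4 -q /dev/rbd/rbd/vtest$i          # udev-created symlink
    mkdir -p /mnt/fiotest/vtest$i
    mount -o noatime,nodiratime /dev/rbd/rbd/vtest$i /mnt/fiotest/vtest$i
done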
fio output:
rand-write-4k: (groupid=0, jobs=70): err= 0: pid=21852: Tue Jul 8 14:52:56 2014
  write: io=2559.5MB, bw=26179KB/s, iops=6542, runt=100116msec
    slat (usec): min=18, max=512646, avg=4002.62, stdev=13754.33
    clat (usec): min=867, max=579715, avg=37581.64, stdev=55954.19
     lat (usec): min=903, max=586022, avg=41957.74, stdev=59276.40
    clat percentiles (msec):
     |  1.00th=[    5],  5.00th=[   10], 10.00th=[   13], 20.00th=[   18],
     | 30.00th=[   21], 40.00th=[   26], 50.00th=[   31], 60.00th=[   34],
     | 70.00th=[   37], 80.00th=[   41], 90.00th=[   48], 95.00th=[   61],
     | 99.00th=[  404], 99.50th=[  445], 99.90th=[  494], 99.95th=[  515],
     | 99.99th=[  553]
    bw (KB  /s): min=    0, max=  694, per=1.46%, avg=383.29, stdev=148.01
    lat (usec) : 1000=0.01%
    lat (msec) : 2=0.12%, 4=0.63%, 10=4.82%, 20=22.33%, 50=63.97%
    lat (msec) : 100=5.61%, 250=0.47%, 500=2.01%, 750=0.08%
  cpu          : usr=0.69%, sys=2.57%, ctx=1525021, majf=0, minf=2405
  IO depths    : 1=1.1%, 2=0.6%, 4=335.8%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=655015/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
  WRITE: io=2559.5MB, aggrb=26178KB/s, minb=26178KB/s, maxb=26178KB/s, mint=100116msec, maxt=100116msec

Disk stats (read/write):
  rbd1: ios=0/2408612, merge=0/979004, ticks=0/39436432, in_queue=39459720, util=99.68%
- In the second test I created only one large image (4.2T):
/dev/rbd1 on /mnt/fiotest/vtest0 type ext4 (rw,noatime,nodiratime,data=ordered)
fio output:
rand-write-4k: (groupid=0, jobs=70): err= 0: pid=8907: Wed Jul 9 13:38:14 2014
  write: io=2264.6MB, bw=23143KB/s, iops=5783, runt=100198msec
    slat (usec): min=0, max=3099.8K, avg=4131.91, stdev=21388.98
    clat (usec): min=850, max=3133.1K, avg=43337.56, stdev=93830.42
     lat (usec): min=930, max=3147.5K, avg=48253.22, stdev=100642.53
    clat percentiles (msec):
     |  1.00th=[    5],  5.00th=[   11], 10.00th=[   14], 20.00th=[   19],
     | 30.00th=[   24], 40.00th=[   29], 50.00th=[   33], 60.00th=[   36],
     | 70.00th=[   39], 80.00th=[   43], 90.00th=[   51], 95.00th=[   68],
     | 99.00th=[  506], 99.50th=[  553], 99.90th=[  717], 99.95th=[  783],
     | 99.99th=[ 3130]
    bw (KB  /s): min=    0, max=  680, per=1.54%, avg=355.39, stdev=156.10
    lat (usec) : 1000=0.01%
    lat (msec) : 2=0.12%, 4=0.66%, 10=4.21%, 20=17.82%, 50=66.95%
    lat (msec) : 100=7.34%, 250=0.78%, 500=1.10%, 750=0.99%, 1000=0.02%
    lat (msec) : >=2000=0.04%
  cpu          : usr=0.65%, sys=2.45%, ctx=1434322, majf=0, minf=2399
  IO depths    : 1=0.2%, 2=0.1%, 4=365.4%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=579510/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
  WRITE: io=2264.6MB, aggrb=23142KB/s, minb=23142KB/s, maxb=23142KB/s, mint=100198msec, maxt=100198msec

Disk stats (read/write):
  rbd1: ios=0/2295106, merge=0/926648, ticks=0/39660664, in_queue=39706288, util=99.80%
It seems that latency is more stable in the first case: the completion-latency
stdev and maximum are clearly lower there (about 56ms vs 94ms stdev, and
0.58s vs 3.1s max clat).