Hi,

I’m now working with the raw device and getting interesting results. 

For one, I went through all the reviews of the Micron DC S610 again and, as 
always, the devil is in the details. The test results are quite favorable, but 
I hadn’t previously noticed the caveat (which applies to SSDs in general) that 
preconditioning may be in order.

See 
http://www.storagereview.com/seagate_12002_micron_s600dc_enterprise_sas_ssd_review

In their tests, the Micron shows quite extreme initial max latencies until 
preconditioning settles.
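
If I want to watch that settling myself, something along these lines should do 
(just a sketch; device name, runtime and logging interval are placeholders):

  fio --filename=/dev/sdX --direct=1 --ioengine=libaio --rw=randwrite --bs=4k \
      --iodepth=32 --numjobs=1 --runtime=3600 --time_based \
      --write_lat_log=sdX-lat --log_avg_msec=1000 --name=precondition-watch

Plotting the resulting latency log over time should show the max/99.99th 
latency coming down once the drive reaches steady state.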

I can certainly relate to that settling behaviour: the SSDs I put into the 
cluster last Friday (5 days ago) show quite different characteristics in my 
statistics than the ones I added this Monday evening (2 days ago).

I took one of the early ones and evacuated the OSD to perform tests. 
Sebastian’s fio call for testing journal suitability currently looks like 
this:

| cartman06 ~ # fio --filename=/dev/sdl --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test
| journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
| fio-2.0.14
| Starting 1 process
| Jobs: 1 (f=1): [W] [100.0% done] [0K/88852K/0K /s] [0 /22.3K/0  iops] [eta 00m:00s]
| journal-test: (groupid=0, jobs=1): err= 0: pid=28606: Wed Dec  7 11:59:36 2016
|   write: io=5186.7MB, bw=88517KB/s, iops=22129 , runt= 60001msec
|     clat (usec): min=37 , max=1519 , avg=43.77, stdev=10.89
|      lat (usec): min=37 , max=1519 , avg=43.94, stdev=10.90
|     clat percentiles (usec):
|      |  1.00th=[   39],  5.00th=[   40], 10.00th=[   40], 20.00th=[   41],
|      | 30.00th=[   41], 40.00th=[   42], 50.00th=[   42], 60.00th=[   42],
|      | 70.00th=[   43], 80.00th=[   44], 90.00th=[   47], 95.00th=[   53],
|      | 99.00th=[   71], 99.50th=[   87], 99.90th=[  157], 99.95th=[  201],
|      | 99.99th=[  478]
|     bw (KB/s)  : min=81096, max=91312, per=100.00%, avg=88519.19, stdev=1762.43
|     lat (usec) : 50=92.42%, 100=7.28%, 250=0.27%, 500=0.02%, 750=0.01%
|     lat (usec) : 1000=0.01%
|     lat (msec) : 2=0.01%
|   cpu          : usr=5.43%, sys=14.64%, ctx=1327888, majf=0, minf=6
|   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
|      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
|      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
|      issued    : total=r=0/w=1327777/d=0, short=r=0/w=0/d=0
| 
| Run status group 0 (all jobs):
|   WRITE: io=5186.7MB, aggrb=88516KB/s, minb=88516KB/s, maxb=88516KB/s, mint=60001msec, maxt=60001msec
| 
| Disk stats (read/write):
|   sdl: ios=15/1326283, merge=0/0, ticks=1/47203, in_queue=46970, util=78.29%

That doesn’t look too bad to me; specifically, the 99.99th percentile at 478 
microseconds seems fine.
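
As a quick sanity check, the numbers are consistent: 22,129 iops × 4 KiB ≈ 
88,516 KB/s, which matches the reported bandwidth (88517KB/s) and the roughly 
88 MB/s of writes iostat shows below at an avgrq-sz of 8 sectors (= 4 KiB).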

The iostat during this run looks OK as well:

| cartman06 ~ # iostat -x 5 sdl
| Linux 4.4.27-gentoo (cartman06)     12/07/2016  _x86_64_    (24 CPU)
| 
| avg-cpu:  %user   %nice %system %iowait  %steal   %idle
|            5.70    0.05    3.09    5.09    0.00   86.07
| 
| Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
| sdl               0.00     2.92   31.24  148.16  1851.99  8428.66   114.61     1.61    9.00    0.68   10.75   0.22   4.03
| 
| avg-cpu:  %user   %nice %system %iowait  %steal   %idle
|            3.31    0.04    1.97    1.48    0.00   93.19
| 
| Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
| sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
| 
| avg-cpu:  %user   %nice %system %iowait  %steal   %idle
|            4.02    0.03    2.38    1.44    0.00   92.13
| 
| Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
| sdl               0.00     0.00   12.40 3101.40    92.80 12405.60     8.03     0.11    0.04    0.10    0.04   0.04  11.12
| 
| avg-cpu:  %user   %nice %system %iowait  %steal   %idle
|            4.02    0.05    3.57    4.78    0.00   87.58
| 
| Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
| sdl               0.00     0.00    0.00 22166.20     0.00 88664.80     8.00     0.80    0.04    0.00    0.04   0.04  79.58
| 
| avg-cpu:  %user   %nice %system %iowait  %steal   %idle
|            3.64    0.05    2.77    4.98    0.00   88.56
| 
| Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
| sdl               0.00     0.00    0.00 22304.20     0.00 89216.80     8.00     0.78    0.04    0.00    0.04   0.04  78.08
| 
| avg-cpu:  %user   %nice %system %iowait  %steal   %idle
|            4.89    0.05    2.97   11.15    0.00   80.93
| 
| Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
| sdl               0.00     0.00    0.00 22022.00     0.00 88088.00     8.00     0.79    0.04    0.00    0.04   0.04  78.68
| 
| avg-cpu:  %user   %nice %system %iowait  %steal   %idle
|            3.45    0.04    2.74    4.24    0.00   89.53
| 
| Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
| sdl               0.00     0.00    0.00 22182.60     0.00 88730.40     8.00     0.78    0.04    0.00    0.04   0.04  77.66
| 
| avg-cpu:  %user   %nice %system %iowait  %steal   %idle
|            4.21    0.04    2.51    3.40    0.00   89.83
| 
| Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
| sdl               0.00     0.00    0.00 22392.00     0.00 89568.00     8.00     0.79    0.04    0.00    0.04   0.04  79.26
| 
| avg-cpu:  %user   %nice %system %iowait  %steal   %idle
|            4.94    0.04    3.35    3.40    0.00   88.26
| 
| Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
| sdl               0.00     0.00    0.00 22078.40     0.00 88313.60     8.00     0.79    0.04    0.00    0.04   0.04  78.70
| 
| avg-cpu:  %user   %nice %system %iowait  %steal   %idle
|            4.43    0.04    3.02    4.68    0.00   87.83
| 
| Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
| sdl               0.00     0.00    0.00 22141.60     0.00 88566.40     8.00     0.77    0.04    0.00    0.04   0.03  77.24
| 
| avg-cpu:  %user   %nice %system %iowait  %steal   %idle
|            4.16    0.04    2.82    4.66    0.00   88.32
| 
| Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
| sdl               0.00     0.00    0.00 22177.00     0.00 88708.00     8.00     0.78    0.04    0.00    0.04   0.04  78.24
| 
| avg-cpu:  %user   %nice %system %iowait  %steal   %idle
|            4.09    0.03    3.02   12.34    0.00   80.52
| 
| Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
| sdl               0.00     0.00    0.00 22156.60     0.00 88626.40     8.00     0.78    0.04    0.00    0.04   0.04  78.36
| 
| avg-cpu:  %user   %nice %system %iowait  %steal   %idle
|            5.43    0.04    3.38    4.07    0.00   87.08
| 
| Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
| sdl               0.00     0.00    0.00 22298.80     0.00 89195.20     8.00     0.77    0.03    0.00    0.03   0.03  77.36
| 
| avg-cpu:  %user   %nice %system %iowait  %steal   %idle
|            7.33    0.05    4.42    4.58    0.00   83.62
| 
| Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
| sdl               0.00     0.00    0.00 21905.20     0.00 87620.80     8.00     0.79    0.04    0.00    0.04   0.04  79.20
| 
| avg-cpu:  %user   %nice %system %iowait  %steal   %idle
|            4.91    0.03    3.52    3.39    0.00   88.15
| 
| Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
| sdl               0.00     0.00   12.40 18629.40    92.80 74517.60     8.00     0.67    0.04    0.10    0.04   0.04  67.18


I’m now running fio --filename=/dev/sdl --rw=write --bs=128k --numjobs=1 
--iodepth=32 --group_reporting --name=journal-test to precondition the device 
fully. After that I’ll perform some more tests with mixed loads.
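
If the sequential fill alone doesn’t get the drive into steady state, I may 
follow it up with a random-write pass before the mixed-load runs, roughly like 
this (only a sketch; runtime and queue depth to be adjusted):

  fio --filename=/dev/sdl --direct=1 --ioengine=libaio --rw=randwrite --bs=4k \
      --iodepth=32 --numjobs=4 --runtime=7200 --time_based --group_reporting \
      --name=precondition-random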

> On 7 Dec 2016, at 12:20, Christian Balzer <ch...@gol.com> wrote:
> 
> I wasn't talking about abandoning Ceph up there, just that for this unit
> (storage and VMs) a freeze might be the better, safer option.
> The way it's operated makes that a possibility, others will of course
> want/need to upgrade their clusters and keep them running as indefinitely
> as possible.

I read your comment from a while ago about us all requiring some level of 
“insanity” to run mission-critical OSS storage. ;)

> Yup, something I did for our stuff (and not just Ceph SSDs) as well,
> there's a nice Nagios plugin for this.

I’ll see if I can get this into our collectd somehow to feed into our graphing.
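
A first rough idea for that (an untested sketch; the SMART attribute name and 
column differ per vendor/firmware, so that part is an assumption) would be a 
small script for collectd’s exec plugin:

  #!/bin/sh
  # collectd exec plugin: report SSD wear as a gauge per device
  HOSTNAME="${COLLECTD_HOSTNAME:-$(hostname -f)}"
  INTERVAL="${COLLECTD_INTERVAL:-60}"
  while sleep "$INTERVAL"; do
    for dev in /dev/sd?; do
      # attribute name varies by vendor (e.g. Percent_Lifetime_Used); adjust as needed
      wear=$(smartctl -A "$dev" | awk '/Percent_Lifetime_Used/ {print $10}')
      [ -n "$wear" ] && echo "PUTVAL \"$HOSTNAME/smart-$(basename "$dev")/gauge-wear\" interval=$INTERVAL N:$wear"
    done
  done

The exec plugin won’t run as root, so smartctl would probably need a sudo rule 
or similar.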

> Could be housekeeping, could be pagecache flushes or other XFS ops.
> Probably best to test/compare with a standalone SSD.

Hmm. I’ll see what happens when I introduce additional abstraction layers: raw 
device, LVM, files on XFS. After that, maybe also a mixture of two concurrently 
running fio jobs: one writing files on XFS (the OSD) and one writing to the 
journal LV.
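
Something like this, just as a sketch (the mountpoint and the journal LV path 
are made up):

  # journal-style sync writes to the LV, same pattern as Sebastian's test
  fio --filename=/dev/vg00/testjournal --direct=1 --sync=1 --rw=write --bs=4k \
      --iodepth=1 --runtime=600 --time_based --name=journal-lv &
  # random writes to files in the XFS-backed OSD directory
  fio --directory=/srv/test-osd --direct=1 --ioengine=libaio --rw=randwrite --bs=4k \
      --size=10g --iodepth=16 --runtime=600 --time_based --name=osd-files &
  wait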

> If it were hooked up to the backplane (expander or individual connectors
> per drive?) with just one link/lane (6Gb/s) that would indeed be a
> noticeable bottleneck.
> But I have a hard time imagining that.
> 
> If it were with just one mini-SAS connector aka 4 lanes to an expander
> port it would halve your potential bandwidth but still be more than what
> you're currently likely to produce there.

It’s far-fetched, but then again it’s a really big list of (some very small) 
things that can go wrong and screw everything up at the VM level … 

Cheers,
Christian

-- 
Christian Theune · c...@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
