On 06/22/2014 02:02 AM, Haomai Wang wrote:
Hi Mark,
Do you have rbd cache enabled? I tested on my ssd cluster (only one ssd), and it seemed ok.
dd if=/dev/zero of=test bs=16k count=65536 oflag=direct
82.3MB/s
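If it isn't enabled yet, it is just a client-side setting in ceph.conf,
roughly like this (the sizes below are only the usual defaults, tune to
taste):

    [client]
        rbd cache = true
        rbd cache size = 33554432              # 32 MB of client-side cache
        rbd cache max dirty = 25165824         # 24 MB dirty before writeback
        rbd cache writethrough until flush = true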
RBD cache is definitely going to help in this use case. This test is
basically just sequentially writing a single 16k chunk of data out, one
at a time. I.e., it's entirely latency bound. At least on OSDs backed by
XFS, you have to wait for that data to hit the journals of every OSD
associated with the object before the acknowledgement gets sent back to
the client. If you are using the default 4MB block size, you'll hit the
same OSDs over and over again and your other OSDs will sit there
twiddling their thumbs waiting for IO until you hit the next block, but
then it will just be a different set of OSDs getting hit. You should be
able to verify this by using iostat or collectl or something to look at
the behaviour of the SSDs during the test. Since this is all sequential
though, switching to buffered IO (i.e. coalescing IOs at the buffer
cache layer) or using RBD cache for direct IO (coalescing IOs below the
block device) will dramatically improve things.
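A quick way to see this on the OSD hosts is to watch the devices with
iostat while the 16k stream runs, and to compare direct against buffered
writes; something along these lines, with the device names obviously
just placeholders for your own layout:

    # watch per-device write rates and request sizes during the test
    iostat -xm 1 /dev/sdc /dev/sdd

    # direct 16k stream (latency bound, one IO in flight at a time)
    dd if=/dev/zero of=test bs=16k count=65536 oflag=direct

    # same stream through the buffer cache (IOs get coalesced before the OSDs)
    dd if=/dev/zero of=test bs=16k count=65536 conv=fdatasync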
The real question here, though, is whether a synchronous stream of
sequential 16k writes is even remotely close to the IO patterns that
would be seen in actual use for MySQL. Most likely in actual use you'll
be seeing a big mix of random reads and writes, and hopefully at least
some parallelism (though this depends on the number of databases, number
of users, and the workload!).
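If the goal is to approximate a database-style workload rather than a
single sequential stream, a mixed random read/write fio run with some
queue depth is probably a better yardstick, e.g. something like the
following, where the 70/30 mix, block size and depth are just
illustrative starting points rather than a MySQL benchmark:

    fio --name=dbish --filename=test --size=4G \
        --rw=randrw --rwmixread=70 --bs=16k \
        --ioengine=libaio --iodepth=16 --direct=1 \
        --runtime=60 --time_based --group_reporting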
Ceph is pretty good at small random IO with lots of parallelism on
spinning-disk-backed OSDs (so long as you use SSD journals or
controllers with WB cache). It's much harder to get native-level IOPS
rates with SSD-backed OSDs, though. The latency involved in distributing
and processing all of that data becomes a much bigger deal. Having said
that, we are actively working on improving latency as much as we can. :)
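For the spinning-disk case, the usual arrangement is to point each OSD's
journal at a partition on the SSD, e.g. per OSD in ceph.conf (device
paths purely illustrative):

    [osd.0]
        osd journal = /dev/sdc1
    [osd.1]
        osd journal = /dev/sdc2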
Mark
On Sun, Jun 22, 2014 at 11:50 AM, Mark Kirkwood
<mark.kirkw...@catalyst.net.nz> wrote:
On 22/06/14 14:09, Mark Kirkwood wrote:
After upgrading the VM to 14.04 and retesting the case *without* direct I get:
- 164 MB/s (librbd)
- 115 MB/s (kernel 3.13)
So I'm managing to get almost native performance out of the librbd case.
I tweaked both the filestore max and min sync intervals (100 and 10
respectively) just to see if I could actually avoid writing to the
spinners while the test was in progress (still seeing some, but clearly
fewer).
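For anyone wanting to reproduce that, the tweak is just these two
options (the 100/10 values are what I used for this test, not a
recommendation):

    [osd]
        filestore max sync interval = 100
        filestore min sync interval = 10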
However, there is no improvement at all *with* direct enabled. The
output of iostat on the host while the direct test is in progress is
interesting:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          11.73    0.00    5.04    0.76    0.00   82.47

Device: rrqm/s wrqm/s   r/s    w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
sda       0.00   0.00  0.00  11.00   0.00   4.02   749.09     0.14  12.36    0.00   12.36   6.55   7.20
sdb       0.00   0.00  0.00  11.00   0.00   4.02   749.09     0.14  12.36    0.00   12.36   5.82   6.40
sdc       0.00   0.00  0.00 435.00   0.00   4.29    20.21     0.53   1.21    0.00    1.21   1.21  52.80
sdd       0.00   0.00  0.00 435.00   0.00   4.29    20.21     0.52   1.20    0.00    1.20   1.20  52.40
(sda and sdb are the spinners; sdc and sdd the ssds). Something is
making the journal work very hard for its 4.29 MB/s!
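Doing the arithmetic from that snapshot (assuming the usual 512-byte
sectors for avgrq-sz):

    435 w/s * 20.21 sectors * 512 bytes ≈ 4.3 MB/s, i.e. roughly 10 KB per write

so the journal SSDs are absorbing a stream of small writes rather than a
few coalesced ones, which is about what you'd expect from a 16k direct
stream with nothing caching in front of it.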
regards
Mark
Leaving off direct I'm seeing about 140 MB/s (librbd) and 90 MB/s
(kernel 3.11 [2]). The ssds can do writes at about 180 MB/s each...
which is something to look at another day [1].
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com