Hello,

On Wed, 1 Oct 2014 14:43:49 -0700 Jakes John wrote:
> Hi Ceph users,
> I am stuck with the benchmark results that I obtained from the ceph
> cluster.
>
> Ceph Cluster:
>
> 1 Mon node, 4 osd nodes of 1 TB. I have one journal for each osd.
>
> All disks are identical and nodes are connected by 10G. Below are the
> dd results:
>
> dd if=/dev/zero of=/home/ubuntu/deleteme bs=10G count=1 oflag=direct
> 0+1 records in
> 0+1 records out
> 2147479552 bytes (2.1 GB) copied, 17.0705 s, 126 MB/s
>
That's for one disk, done locally, I presume?

Note that with a bs of 10G you're really comparing apples to oranges later
on, of course. Also note that dd only managed a single partial record of
about 2GB (the kernel caps a single read/write at just under 2GB), so the
10G block size never actually took effect; see the dd example near the end
of this mail for a more comparable invocation.

> I created 1 osd (xfs) on each node as below:
>
> mkfs.xfs /dev/sdo1
> mount /dev/sdo1 /node/nodeo
>
> sudo mkfs.xfs /dev/sdp1
>
> ceph-deploy osd prepare mynode:/node/nodeo:/dev/sdp1
> ceph-deploy osd activate mynode:/node/nodeo:/dev/sdp1
>
> Now, when I run rados benchmarks, I am just getting ~4 MB/s for writes
> and ~40 MB/s for reads. What am I doing wrong?

Nothing really.

> I have seen Christian's post regarding the block sizes and parallelism.
> My benchmark arguments seem to be right.
>
You're testing with 4k blocks, which are still quite small in the Ceph
world; the default (with no -b parameter) is 4MB!

If I use your parameters, I get about 8MB/s from my cluster with 8 OSDs
per node and 4 SSDs for journals, connected by Infiniband. So don't feel
bad. ^o^

Using the default 4MB block size, I get 600MB/s.

> Replica size of test-pool - 2
> No of pgs: 256
>
> rados -p test-pool bench 120 write -b 4096 -t 16 --no-cleanup
>
> Total writes made:       245616
> Write size:              4096
> Bandwidth (MB/sec):      3.997
>
> Stddev Bandwidth:        2.19989
> Max bandwidth (MB/sec):  8.46094
> Min bandwidth (MB/sec):  0
> Average Latency:         0.0156332
> Stddev Latency:          0.0460168
> Max latency:             2.94882

This suggests to me that at some point your disks were the bottleneck,
probably due to the journals being on the same device.

Always run atop (as it covers nearly all the bases) on all your OSD nodes
when doing tests; you will see when disks become the bottleneck, and you
might find that with certain operations CPU usage spikes so much that it
is the culprit.

> Min latency:             0.001725
>
> rados -p test-pool bench 120 seq -t 16 --no-cleanup
>
> Total reads made:        245616
> Read size:               4096
> Bandwidth (MB/sec):      40.276
>
> Average Latency:         0.00155048
> Max latency:             3.25052
> Min latency:             0.000515
>
I don't know the intimate inner details of Ceph, but I assume this is
because things were written with 4KB blocks, and I can certainly reproduce
this behavior and these results on my "fast" cluster. Looking at atop, the
cluster also gets VERY busy CPU-wise at that time, again suggesting it has
to deal with lots of little transactions.

Doing the rados bench with the default 4MB block size (no -b parameter) I
also get 600MB/s read performance; see the example invocations near the
end of this mail.

Some general observations about what to expect for writes.
Let's do some very simplified calculations here:

1. Your disks can write about 120MB/s individually. However, those were
sequential writes you tested; Ceph writes 4MB blobs into a filesystem and
thus has way more overhead and will be significantly slower.

2. You have on-disk journals, thus halving your base disk speed, meaning a
drive can now write at best about 60MB/s.

3. And a replication size of 2, potentially halving speeds again.

So the base speed of your cluster is about 120MB/s, about the same as a
single drive. And these are non-sequential writes spread over a network
(which IS slower than local writes).
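Spelled out, the back-of-the-envelope math from points 1-3 above comes to
roughly this:

  ~120 MB/s per disk    / 2 (journal on the same disk)  = ~60 MB/s per OSD
   ~60 MB/s per OSD     x 4 OSDs                        = ~240 MB/s aggregate
  ~240 MB/s aggregate   / 2 (replication size 2)        = ~120 MB/s for client writes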
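If you want a dd run that is closer to what rados bench does by default,
something along these lines should be more of an apples-to-apples
comparison per disk. The 4MB block size matches the default rados bench
object size; the count of 512 (2GB total) is just a value picked for this
example, and the path is the one from your mail:

  dd if=/dev/zero of=/home/ubuntu/deleteme bs=4M count=512 oflag=direct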
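And to re-test with the default 4MB block size, simply drop the -b
parameter from your commands, e.g.:

  rados -p test-pool bench 120 write -t 16 --no-cleanup
  rados -p test-pool bench 120 seq -t 16 --no-cleanup

The --no-cleanup on the write run keeps the benchmark objects around so
the seq run afterwards has something to read.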
On my crappy test cluster I can't get much over 40MB/s, and it
incidentally also has 4 OSDs with on-disk journals as well.

Christian

--
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/