I enabled logging and performed the same tests.
Here is a link to an archive with the logs; they are only from one node
(the node where the active MDS was sitting):
https://www.dropbox.com/s/80axovtoofesx5e/logs.tar.gz?dl=0
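(For reference, the debug settings were enabled along these lines in ceph.conf; putting them under [mds] and [client] is my assumption about where they belong:)

[mds]
    debug ms = 1        # message-level logging on the MDS
    debug mds = 20      # verbose MDS logging

[client]
    debug ms = 1        # message-level logging in ceph-fuse
    debug client = 20   # verbose client logging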
Rados bench results:
# rados bench -p test 10 write
Maintaining 16 concurrent writes of 4194304 bytes for up to 10
seconds or 0 objects
Object prefix: benchmark_data_atl-fs11_4630
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
0 0 0 0 0 0 - 0
1 16 46 30 119.967 120 0.201327 0.348463
2 16 88 72 143.969 168 0.132983 0.353677
3 16 124 108 143.972 144 0.930837 0.383018
4 16 155 139 138.976 124 0.899468 0.426396
5 16 203 187 149.575 192 0.236534 0.400806
6 16 243 227 151.309 160 0.835213 0.397673
7 16 276 260 148.549 132 0.905989 0.406849
8 16 306 290 144.978 120 0.353279 0.422106
9 16 335 319 141.757 116 1.12114 0.428268
10 16 376 360 143.98 164 0.418921 0.43351
11 16 377 361 131.254 4 0.499769 0.433693
Total time run: 11.206306
Total writes made: 377
Write size: 4194304
Bandwidth (MB/sec): 134.567
Stddev Bandwidth: 60.0232
Max bandwidth (MB/sec): 192
Min bandwidth (MB/sec): 0
Average Latency: 0.474923
Stddev Latency: 0.376038
Max latency: 1.82171
Min latency: 0.060877
# rados bench -p test 10 seq
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
0 0 0 0 0 0 - 0
1 16 61 45 179.957 180 0.010405 0.25243
2 16 109 93 185.962 192 0.908263 0.284303
3 16 151 135 179.965 168 0.255312 0.297283
4 16 191 175 174.97 160 0.836727 0.330659
5 16 236 220 175.971 180 0.009995 0.330832
6 16 275 259 172.639 156 1.06855 0.345418
7 16 311 295 168.545 144 0.907648 0.361689
8 16 351 335 167.474 160 0.947688 0.363552
9 16 390 374 166.196 156 0.140539 0.369057
Total time run: 9.755367
Total reads made: 401
Read size: 4194304
Bandwidth (MB/sec): 164.422
Average Latency: 0.387705
Max latency: 1.33852
Min latency: 0.008064
# rados bench -p test 10 rand
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
0 0 0 0 0 0 - 0
1 16 55 39 155.938 156 0.773716 0.257267
2 16 93 77 153.957 152 0.006573 0.339199
3 16 135 119 158.629 168 0.009851 0.359675
4 16 171 155 154.967 144 0.892027 0.359015
5 16 209 193 154.369 152 1.13945 0.378618
6 16 256 240 159.97 188 0.009965 0.368439
7 16 295 279 159.4 156 0.195812 0.371259
8 16 343 327 163.472 192 0.880587 0.370759
9 16 380 364 161.75 148 0.113111 0.377983
10 16 424 408 163.173 176 0.772274 0.379497
Total time run: 10.518482
Total reads made: 425
Read size: 4194304
Bandwidth (MB/sec): 161.620
Average Latency: 0.393978
Max latency: 1.36572
Min latency: 0.006448
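(For anyone reproducing this: the seq and rand passes read objects left behind by the write pass, so the full sequence looks roughly like the following. The --no-cleanup flag and the final cleanup step are an approximation of the exact invocation.)

# rados bench -p test 10 write --no-cleanup   # keep benchmark objects for the read tests
# rados bench -p test 10 seq                  # sequential reads of those objects
# rados bench -p test 10 rand                 # random reads of those objects
# rados -p test cleanup                       # remove the benchmark objects afterwards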
On Tue, Oct 21, 2014 at 2:03 PM, Gregory Farnum <[email protected]> wrote:
> Can you enable debugging on the client ("debug ms = 1", "debug client
> = 20") and mds ("debug ms = 1", "debug mds = 20"), run this test
> again, and post them somewhere for me to look at?
>
> While you're at it, can you try rados bench and see what sort of
> results you get?
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Tue, Oct 21, 2014 at 10:57 AM, Sergey Nazarov <[email protected]> wrote:
>> It is CephFS mounted via ceph-fuse.
>> I am getting the same results regardless of how many other clients
>> have this fs mounted or what they are doing.
>> The cluster is running on Debian Wheezy, kernel 3.2.0-4-amd64.
>>
>> On Tue, Oct 21, 2014 at 1:44 PM, Gregory Farnum <[email protected]> wrote:
>>> Are these tests conducted using a local fs on RBD, or using CephFS?
>>> If CephFS, do you have multiple clients mounting the FS, and what are
>>> they doing? What client (kernel or ceph-fuse)?
>>> -Greg
>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>
>>>
>>> On Tue, Oct 21, 2014 at 9:05 AM, Sergey Nazarov <[email protected]>
>>> wrote:
>>>> Hi
>>>>
>>>> I just built a new cluster following the quickstart instructions:
>>>> http://ceph.com/docs/master/start/
>>>>
>>>> And here is what I am seeing:
>>>>
>>>> # time for i in {1..10}; do echo $i > $i.txt ; done
>>>> real 0m0.081s
>>>> user 0m0.000s
>>>> sys 0m0.004s
>>>>
>>>> And if I try to repeat the same command (when the files already exist):
>>>>
>>>> # time for i in {1..10}; do echo $i > $i.txt ; done
>>>> real 0m48.894s
>>>> user 0m0.000s
>>>> sys 0m0.004s
>>>>
>>>> I was very surprised, so I then tried rewriting a single file:
>>>>
>>>> # time echo 1 > 1.txt
>>>> real 0m3.133s
>>>> user 0m0.000s
>>>> sys 0m0.000s
>>>>
>>>> BTW, I don't think it is a problem with OSD speed or the network:
>>>>
>>>> # time sysbench --num-threads=1 --test=fileio --file-total-size=1G
>>>> --file-test-mode=rndrw prepare
>>>> 1073741824 bytes written in 23.52 seconds (43.54 MB/sec).
>>>>
>>>> Here is my ceph cluster status and version:
>>>>
>>>> # ceph -w
>>>> cluster d3dcacc3-89fb-4db0-9fa9-f1f6217280cb
>>>> health HEALTH_OK
>>>> monmap e4: 4 mons at
>>>> {atl-fs10=10.44.101.70:6789/0,atl-fs11=10.44.101.91:6789/0,atl-fs12=10.44.101.92:6789/0,atl-fs9=10.44.101.69:6789/0},
>>>> election epoch 40, quorum 0,1,2,3 atl-fs9,atl-fs10,atl-fs11,atl-fs12
>>>> mdsmap e33: 1/1/1 up {0=atl-fs12=up:active}, 3 up:standby
>>>> osdmap e92: 4 osds: 4 up, 4 in
>>>> pgmap v8091: 192 pgs, 3 pools, 123 MB data, 1658 objects
>>>> 881 GB used, 1683 GB / 2564 GB avail
>>>> 192 active+clean
>>>> client io 1820 B/s wr, 1 op/s
>>>>
>>>> # ceph -v
>>>> ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
>>>>
>>>> All nodes are connected via a gigabit network.
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com