Thanks for the response, Yehuda.
[ more text below ]
On Aug 5, 2014, at 05:33, Yehuda Sadeh wrote:
On Fri, Aug 1, 2014 at 9:49 AM, Osier Yang <agedos...@gmail.com> wrote:
[ correct the URL ]
On Aug 2, 2014, at 00:42, Osier Yang wrote:
Hi, list,
I managed to set up radosgw in a testing environment over the past several days, to see if it's stable/mature enough for production use. In the meanwhile, I tried to read the source code of radosgw to understand how it actually manages the underlying storage.
The testing result shows that write performance to a bucket is not good. As far as I understood from the code, this is because there is only *one* bucket index object for a single bucket, which is not nice in principle. Moreover, requests to the whole bucket could be blocked if the corresponding bucket index object happens to be in the recovering or backfilling process. This is not acceptable for production use. Although I saw that Guang Yang did some work (the prototype patches [1]) to try to resolve the problem with bucket index sharding, I'm not quite confident it solves the problem at the root: radosgw still tries to manage millions or billions of objects in one bucket with the index, so I'm a bit worried even with index sharding supported.
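To make the contention concrete, here is a toy Python sketch (not radosgw code; the 5 ms per-update cost is an invented number) of why a single index object serializes index updates while sharding spreads them out:

```python
import math

# Toy model, NOT radosgw code: every object write must also update the
# bucket index, and updates to a single index object are serialized.
def index_time(num_writes, num_shards, update_ms=5):
    """Time spent on index updates if writes spread evenly over shards
    and each shard serializes its own updates (update_ms is invented)."""
    per_shard = math.ceil(num_writes / num_shards)
    return per_shard * update_ms

print(index_time(10_000, 1))  # one index object: 50000 ms serialized
print(index_time(10_000, 8))  # eight shards:      6250 ms
```

Even in this crude model, the single index object stays the bottleneck no matter how many osds hold the actual data.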
Another problem I encountered: when I upgraded radosgw to the latest version (Firefly), radosgw-admin works well, read requests work well too, but all write requests fail. Note that I didn't make any changes to the config files, so there seems to be a compatibility problem (a client of the new version fails to talk to a ceph cluster of the old version). The errors look like:
2014-07-31 10:13:10.045921 7fdb40ddd700 0 ERROR: can't read user header: ret=-95
2014-07-31 10:13:10.045930 7fdb40ddd700 0 ERROR: sync_user() failed, user=osier ret=-95
2014-07-31 17:00:56.075066 7fe514fe6780 0 ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6), process radosgw, pid 19974
2014-07-31 17:00:56.197659 7fe514fe6780 0 framework: fastcgi
2014-07-31 17:00:56.197666 7fe514fe6780 0 starting handler: fastcgi
2014-07-31 17:00:56.198941 7fe4f8ff9700 0 ERROR: FCGX_Accept_r returned -9
2014-07-31 17:00:56.211176 7fe4f9ffb700 0 ERROR: can't read user header: ret=-95
2014-07-31 17:00:56.211197 7fe4f9ffb700 0 ERROR: sync_user() failed, user=Bob Dylon ret=-95
2014-07-31 17:00:56.212306 7fe4f9ffb700 0 ERROR: can't read user header: ret=-95
2014-07-31 17:00:56.212325 7fe4f9ffb700 0 ERROR: sync_user() failed, user=osier ret=-95
Did you upgrade the osds? Did you restart the osds after upgrade?
No, I didn't upgrade the osds, and I didn't restart them. What I did was simply run a newer-version radosgw against the ceph cluster, which is still running the old version. So it sounds like using a newer radosgw requires newer osds?
With these two experiences, I started to wonder whether radosgw is stable/mature enough yet. It seems that DreamHost is the only one using radosgw for a public service, though judging from Google there appear to be use cases in private environments. I have no way to demonstrate whether it's stable and mature enough for production use except by trying to understand how it works; however, I guess everybody knows it's too hard to go back once a distributed system is already in production use. So I'm asking here to see if I could get some advice/thoughts/suggestions from those who have already managed to set up radosgw for production use.
Since the mail is long/boring enough already, I'm summarizing my questions here:
1) Is radosgw stable/mature enough for production use?
We consider it stable and mature for production use.
2) How it behaves in performance (especially on writing) in practice?
Different use cases and patterns have different performance
characteristics. As you mentioned, objects going to the same bucket
will contend on the bucket index. In the future we will be able to
shard that and it will mitigate the problem a bit. Other ideas are to
drop the bucket index altogether for use cases where object listing is
not really needed.
Bucket listing is important for us too. Disabling it would just drive me crazy. :-)
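A hypothetical client-side middle ground (names here are made up, this is not a radosgw feature): hash each object key to one of several real buckets, so that no single bucket index object serializes every write, at the cost that "listing the bucket" now means listing every shard bucket and merging the results:

```python
import hashlib

NUM_SHARDS = 8  # invented shard count for illustration

def shard_bucket(key, base="mybucket"):
    """Pick a stable shard bucket for an object key (hypothetical scheme)."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return "%s-%d" % (base, h % NUM_SHARDS)

print(shard_bucket("photos/2014/08/05/img001.jpg"))
```

With an S3 client such as boto this would mean creating mybucket-0 through mybucket-7 once, routing each PUT/GET through shard_bucket(), and merging eight listings when enumerating.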
I did the performance testing yesterday; I'd like to share the results here:
1) Testing environment
ceph cluster: 3 nodes, with 1 monitor and 2 osds on each node. These 3 nodes are relatively cheap PCs (I don't even want to mention their CPU and memory here).
ceph version: Emperor
radosgw version: Emperor too (I didn't manage to test successfully with a newer radosgw).
radosgw instance: 1; VM; memory/4G; CPU/1
Client: VM; memory/1G; CPU/1
Internal network bandwidth: 1G
I executed the testing commands on the "client vm". Since the testing environment is far worse than a production environment, the absolute results are somewhat meaningless by themselves, so I ran both "rest-bench" against radosgw and "rados bench" directly; the comparison tells how much performance is eaten up by radosgw (mainly by the single bucket index object).
2) radosgw config (only the options which *might* affect performance are listed)
rgw thread pool size = 1000
rgw enable usage log = true
rgw usage log tick interval = 30
rgw usage log flush threshold = 1024
rgw usage max shards = 32
rgw usage max user shards = 1
# Operation logs are disabled
#rgw enable ops log = true
#rgw ops log rados = true
#debug rgw = 20
#debug ms = 1
rgw cache lru size = 100000 # the default is 10000
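For what it's worth, the bucket index sharding work mentioned earlier later became a plain config option in post-Firefly releases (Hammer and up); this fragment is for illustration only, as an assumption about later versions, and is not available in the Emperor/Firefly versions tested here:

```
# Assumed post-Firefly option; applies only to buckets created after it is set.
rgw override bucket index max shards = 8
```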
3) Testing command:
root@testing-bob:~# rados --cluster=s3test0 -p osier_test bench 20 write -b $size -t 50
root@testing-bob:~# rest-bench --api-host=testing-s3gw0 --access-key=L6K3FF1OOXO4EY1FH9RF --secret="/pYIF3jc3NSkVCWPklSM+BIf7IVr74MSnSvbc4Ac" --protocol=http --uri_style=path --bucket=bob0 --seconds=20 --concurrent-ios=50 --block-size=$size --show-time write
As the above commands show, I'm testing with 50 concurrent threads over 20 seconds for both "rest-bench" and "rados bench".
4) Testing result
NOTE:
* The data sizes I'm using for testing are: 10 Bytes, 1 KiB, 10 KiB, 100 KiB, 500 KiB, 1 MiB, 5 MiB, 20 MiB, and 40 MiB.
* Pay attention to the "finished" operations rather than "Total writes made", and to the total time, since "rest-bench" doesn't always finish within 20 seconds.
* As the results show: for small writes, the performance loss with radosgw is nearly 80% at worst, compared with writing to rados directly.
* As the data size grows, the performance loss decreases; the best result is about a 50% loss. However, once the data size grows past some point (see the 40 MiB result), we can clearly see that the writes become serialized, and the performance loss is huge again.
* In summary, the single bucket index object hurts performance a lot.
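To put numbers on the notes above, here is a quick sketch that recomputes the loss from the "Bandwidth (MB/sec)" figures reported in the runs below (10 Bytes, 20 MiB, and 40 MiB are left out, since their bandwidth figures are dominated by startup and serialization effects):

```python
# (rados MB/s, radosgw MB/s) pairs copied from the benchmark output below.
results = {
    "1KiB":   (0.245,  0.029),
    "10KiB":  (2.462,  0.353),
    "100KiB": (21.739, 3.183),
    "500KiB": (36.140, 9.761),
    "1MiB":   (52.590, 10.934),
    "5MiB":   (60.827, 25.926),
}
for size, (rados_bw, rgw_bw) in results.items():
    loss = (1 - rgw_bw / rados_bw) * 100  # fraction of raw bandwidth lost
    print("%s: %.0f%% lost" % (size, loss))
```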
[ 10Bytes ]
== rados ==
2014-08-04 22:34:04.151231 min lat: 0.008503 max lat: 1.72593 avg lat: 0.197445
  sec Cur ops   started  finished    avg MB/s     cur MB/s  last lat   avg lat
   20      50      4944      4894  0.00233314  0.000705719  0.250708  0.197445
Total time run: 20.646440
Total writes made: 4945
Write size: 10
Bandwidth (MB/sec): 0.002
Stddev Bandwidth: 0.00130826
Max bandwidth (MB/sec): 0.00367165
Min bandwidth (MB/sec): 0
Average Latency: 0.208751
Stddev Latency: 0.223234
Max latency: 1.72593
Min latency: 0.008503
== radosgw ==
2014-08-04 23:51:41.327835 min lat: 0.055295 max lat: 3.28076 avg lat: 1.20371
2014-08-04 23:51:41.327835   sec Cur ops  started  finished     avg MB/s     cur MB/s  last lat  avg lat
2014-08-04 23:51:41.327835    20      50      849       799  0.000380758  0.000324249   1.51293  1.20371
2014-08-04 23:51:42.328136    21      50      850       800  0.000363086  9.53674e-06   3.06026  1.20603
2014-08-04 23:51:43.328345    22      50      850       800  0.000346588            0         -  1.20603
2014-08-04 23:51:44.328556    23      50      850       800  0.000331524            0         -  1.20603
2014-08-04 23:51:45.328769    24      50      850       800  0.000317716            0         -  1.20603
2014-08-04 23:51:46.328989    25      50      850       800  0.000305011            0         -  1.20603
2014-08-04 23:51:47.329214    26      49      850       801   0.00029365  1.90735e-06   6.33663  1.21244
2014-08-04 23:51:48.329488 Total time run: 26.759887
Total writes made: 850
Write size: 10
Bandwidth (MB/sec): 0.000
Stddev Bandwidth: 0.000185797
Max bandwidth (MB/sec): 0.000543594
Min bandwidth (MB/sec): 0
Average Latency: 1.56037
Stddev Latency: 1.54032
Max latency: 9.433
Min latency: 0.055295
[ 1Kib ]
== rados ==
2014-08-04 22:38:12.770177 min lat: 0.006787 max lat: 2.03145 avg lat: 0.191323
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
20 50 5196 5146 0.251217 0.0810547 0.951518 0.191323
Total time run: 20.694827
Total writes made: 5197
Write size: 1024
Bandwidth (MB/sec): 0.245
Stddev Bandwidth: 0.209637
Max bandwidth (MB/sec): 0.999023
Min bandwidth (MB/sec): 0
Average Latency: 0.199098
Stddev Latency: 0.263302
Max latency: 2.03145
Min latency: 0.006787
== radosgw ==
2014-08-04 22:39:16.448663 min lat: 0.058305 max lat: 5.84678 avg lat: 1.55047
2014-08-04 22:39:16.448663   sec Cur ops  started  finished   avg MB/s    cur MB/s  last lat  avg lat
2014-08-04 22:39:16.448663    20      50      666       616  0.0300611   0.0449219  0.611505  1.55047
2014-08-04 22:39:17.448976    21      45      667       622  0.0289088  0.00585938   1.15587  1.54779
2014-08-04 22:39:18.449173    22      45      667       622  0.0275952           0         -  1.54779
2014-08-04 22:39:19.449449 Total time run: 22.759544
Total writes made: 667
Write size: 1024
Bandwidth (MB/sec): 0.029
Stddev Bandwidth: 0.0214045
Max bandwidth (MB/sec): 0.0722656
Min bandwidth (MB/sec): 0
Average Latency: 1.69066
Stddev Latency: 1.11007
Max latency: 5.89554
Min latency: 0.058305
[ 10Kib ]
== rados ==
2014-08-04 22:41:01.756019 min lat: 0.003658 max lat: 1.33439 avg lat: 0.1979
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
20 50 5075 5025 2.45309 2.94922 0.13717 0.1979
Total time run: 20.132932
Total writes made: 5076
Write size: 10240
Bandwidth (MB/sec): 2.462
Stddev Bandwidth: 1.57143
Max bandwidth (MB/sec): 6.03516
Min bandwidth (MB/sec): 0
Average Latency: 0.198299
Stddev Latency: 0.206342
Max latency: 1.33439
Min latency: 0.003658
== radosgw ==
2014-08-04 23:59:08.103196 min lat: 0.081382 max lat: 4.20567 avg lat: 1.21503
2014-08-04 23:59:08.103196   sec Cur ops  started  finished  avg MB/s   cur MB/s  last lat  avg lat
2014-08-04 23:59:08.103196    20      50      832       782  0.381632   0.273438   1.16601  1.21503
2014-08-04 23:59:09.103480    21      47      833       786  0.365322  0.0390625   3.06067  1.21989
2014-08-04 23:59:10.103669    22      45      833       788   0.34961  0.0195312   1.74456  1.22298
2014-08-04 23:59:11.103882    23      45      833       788  0.334413          0         -  1.22298
2014-08-04 23:59:12.104123 Total time run: 23.028529
Total writes made: 833
Write size: 10240
Bandwidth (MB/sec): 0.353
Stddev Bandwidth: 0.189902
Max bandwidth (MB/sec): 0.546875
Min bandwidth (MB/sec): 0
Average Latency: 1.36834
Stddev Latency: 0.93226
Max latency: 5.58653
Min latency: 0.081382
[ 100Kib ]
== rados ==
2014-08-04 22:43:12.878546 min lat: 0.00586 max lat: 1.92724 avg lat: 0.215224
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
20 50 4600 4550 22.212 19.3359 0.111257 0.215224
Total time run: 20.668664
Total writes made: 4601
Write size: 102400
Bandwidth (MB/sec): 21.739
Stddev Bandwidth: 13.2295
Max bandwidth (MB/sec): 60.0586
Min bandwidth (MB/sec): 0
Average Latency: 0.224462
Stddev Latency: 0.226179
Max latency: 1.92724
Min latency: 0.00586
== radosgw ==
2014-08-04 23:54:52.136557 min lat: 0.121387 max lat: 5.76303 avg lat: 1.35267
2014-08-04 23:54:52.136557   sec Cur ops  started  finished  avg MB/s  cur MB/s  last lat  avg lat
2014-08-04 23:54:52.136557    20      50      699       649   3.16721  0.341797   2.90547  1.35267
2014-08-04 23:54:53.136860    21      44      700       656   3.04896  0.683594    3.0446  1.37228
2014-08-04 23:54:54.137117 Total time run: 21.477508
Total writes made: 700
Write size: 102400
Bandwidth (MB/sec): 3.183
Stddev Bandwidth: 2.13648
Max bandwidth (MB/sec): 7.03125
Min bandwidth (MB/sec): 0
Average Latency: 1.5243
Stddev Latency: 1.20656
Max latency: 5.76303
Min latency: 0.121387
[ 500Kib ]
== rados ==
2014-08-04 22:45:55.736963 min lat: 0.028845 max lat: 3.00344 avg lat: 0.386006
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
20 50 1816 1766 43.1048 0 - 0.386006
21 50 1816 1766 41.0521 0 - 0.386006
22 50 1816 1766 39.1861 0 - 0.386006
23 50 1816 1766 37.4825 0 - 0.386006
24 50 1817 1767 35.9411 0.0697545 11.6738 0.392395
Total time run: 24.548976
Total writes made: 1817
Write size: 512000
Bandwidth (MB/sec): 36.140
Stddev Bandwidth: 34.3547
Max bandwidth (MB/sec): 80.0781
Min bandwidth (MB/sec): 0
Average Latency: 0.675502
Stddev Latency: 1.76753
Max latency: 12.673
Min latency: 0.028845
== radosgw ==
2014-08-04 22:46:51.406535 min lat: 0.199692 max lat: 6.7487 avg lat: 1.95867
2014-08-04 22:46:51.406535   sec Cur ops  started  finished  avg MB/s  cur MB/s  last lat  avg lat
2014-08-04 22:46:51.406535    20      50      490       440   10.7355   6.83594   4.42465  1.95867
2014-08-04 22:46:52.406851    21      50      491       441   10.2476  0.488281   4.87118  1.96528
2014-08-04 22:46:53.407047    22      50      491       441   9.78203         0         -  1.96528
2014-08-04 22:46:54.407243    23      50      491       441   9.35689         0         -  1.96528
2014-08-04 22:46:55.407438    24      50      491       441   8.96716         0         -  1.96528
2014-08-04 22:46:56.407725 Total time run: 24.562034
Total writes made: 491
Write size: 512000
Bandwidth (MB/sec): 9.761
Stddev Bandwidth: 7.0541
Max bandwidth (MB/sec): 23.9258
Min bandwidth (MB/sec): 0
Average Latency: 2.48753
Stddev Latency: 2.01542
Max latency: 10.482
Min latency: 0.199692
[ 1Mib ]
== rados ==
2014-08-04 22:48:05.348669 min lat: 0.059332 max lat: 5.30887 avg lat: 0.895115
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
20 50 1099 1049 51.2076 11.7188 3.94854 0.895115
Total time run: 20.426123
Total writes made: 1100
Write size: 1024000
Bandwidth (MB/sec): 52.590
Stddev Bandwidth: 32.4589
Max bandwidth (MB/sec): 87.8906
Min bandwidth (MB/sec): 0
Average Latency: 0.927253
Stddev Latency: 0.945177
Max latency: 5.30887
Min latency: 0.059332
== radosgw ==
2014-08-05 00:01:57.779506 min lat: 0.291824 max lat: 11.4166 avg lat: 3.84634
2014-08-05 00:01:57.779506   sec Cur ops  started  finished  avg MB/s  cur MB/s  last lat  avg lat
2014-08-05 00:01:57.779506    20      50      266       216   10.5389         0         -  3.84634
2014-08-05 00:01:58.779846    21      50      266       216   10.0373         0         -  3.84634
2014-08-05 00:01:59.780046    22      49      267       218   9.66998  0.651042   5.49671  3.86582
2014-08-05 00:02:00.780275    23      49      267       218   9.24974         0         -  3.86582
2014-08-05 00:02:01.780540 Total time run: 23.846098
Total writes made: 267
Write size: 1024000
Bandwidth (MB/sec): 10.934
Stddev Bandwidth: 8.56096
Max bandwidth (MB/sec): 35.1562
Min bandwidth (MB/sec): 0
Average Latency: 4.44906
Stddev Latency: 2.73214
Max latency: 11.4166
Min latency: 0.291824
[ 5Mib ]
== rados ==
2014-08-04 22:50:20.362025 min lat: 1.95706 max lat: 8.57053 avg lat: 3.77608
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
20 49 290 241 58.8244 107.422 2.18729 3.77608
21 17 291 274 63.6944 161.133 1.15035 3.50209
22 17 291 274 60.7992 0 - 3.50209
23 15 291 276 58.5804 4.88281 3.83261 3.50451
Total time run: 23.359546
Total writes made: 291
Write size: 5120000
Bandwidth (MB/sec): 60.827
Stddev Bandwidth: 52.3299
Max bandwidth (MB/sec): 161.133
Min bandwidth (MB/sec): 0
Average Latency: 3.51905
Stddev Latency: 1.97284
Max latency: 8.57053
Min latency: 0.995771
== radosgw ==
2014-08-05 00:03:36.219328 min lat: 4.10734 max lat: 16.7196 avg lat: 8.66002
2014-08-05 00:03:36.219328   sec Cur ops  started  finished  avg MB/s  cur MB/s  last lat  avg lat
2014-08-05 00:03:36.219328    20      50      138        88   21.4675   9.76562   7.59011  8.66002
2014-08-05 00:03:37.219679    21      50      138        88   20.4456         0         -  8.66002
2014-08-05 00:03:38.219884    22      50      139        89   19.7386   2.44141   12.5832  8.70411
2014-08-05 00:03:39.220072    23      50      139        89   18.8808         0         -  8.70411
2014-08-05 00:03:40.220275    24      50      139        89   18.0945         0         -  8.70411
2014-08-05 00:03:41.220484    25      50      139        89   17.3711         0         -  8.70411
2014-08-05 00:03:42.220688    26      44      139        95   17.8293   7.32422   8.48089  8.92694
2014-08-05 00:03:43.220929 Total time run: 26.178957
Total writes made: 139
Write size: 5120000
Bandwidth (MB/sec): 25.926
Stddev Bandwidth: 18.0919
Max bandwidth (MB/sec): 68.3594
Min bandwidth (MB/sec): 0
Average Latency: 9.39576
Stddev Latency: 3.19278
Max latency: 18.8883
Min latency: 4.10734
[ 20Mib ]
== rados ==
2014-08-04 22:52:59.584809 min lat: 9999 max lat: 0 avg lat: 0
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
20 34 34 0 0 0 - 0
21 36 36 0 0 0 - 0
22 38 38 0 0 0 - 0
23 39 39 0 0 0 - 0
24 41 41 0 0 0 - 0
25 42 42 0 0 0 - 0
26 42 42 0 0 0 - 0
27 43 43 0 0 0 - 0
28 46 46 0 0 0 - 0
29 48 48 0 0 0 - 0
30 41 50 9 5.81711 5.85938 28.2801 29.4592
31 4 50 46 28.7793 722.656 3.37337 18.5771
32 1 50 49 29.7045 58.5938 3.48328 17.6497
Total time run: 32.246896
Total writes made: 50
Write size: 20480000
Bandwidth (MB/sec): 30.284
Stddev Bandwidth: 125.863
Max bandwidth (MB/sec): 722.656
Min bandwidth (MB/sec): 0
Average Latency: 17.3433
Stddev Latency: 9.29014
Max latency: 30.1162
Min latency: 2.33119
== radosgw ==
2014-08-04 22:54:03.379562 min lat: 13.3435 max lat: 19.4876 avg lat: 17.8951
2014-08-04 22:54:03.379562   sec Cur ops  started  finished  avg MB/s  cur MB/s  last lat  avg lat
2014-08-04 22:54:03.379562    20      49       64        15   14.5565   136.719   19.4876  17.8951
2014-08-04 22:54:04.379954    21      50       65        15   13.8672         0         -  17.8951
2014-08-04 22:54:05.380137    22      50       65        15   13.2404         0         -  17.8951
2014-08-04 22:54:06.380336    23      50       65        15   12.6677         0         -  17.8951
2014-08-04 22:54:07.380551    24      50       65        15   12.1426         0         -  17.8951
2014-08-04 22:54:08.380742    25      50       65        15   11.6593         0         -  17.8951
2014-08-04 22:54:09.380915    26      50       65        15   11.2129         0         -  17.8951
2014-08-04 22:54:10.381107    27      50       65        15   10.7995         0         -  17.8951
2014-08-04 22:54:11.381314    28      50       65        15   10.4155         0         -  17.8951
2014-08-04 22:54:12.381502    29      50       65        15   10.0579         0         -  17.8951
2014-08-04 22:54:13.381701    30      48       65        17   11.0205   3.90625   29.8248  19.2985
2014-08-04 22:54:14.381900    31      47       65        18   11.2938   19.5312   16.1941  19.1261
2014-08-04 22:54:15.382104    32      47       65        18   10.9422         0         -  19.1261
2014-08-04 22:54:16.538449 Total time run: 32.852659
Total writes made: 65
Write size: 20480000
Bandwidth (MB/sec): 38.643
Stddev Bandwidth: 24.8996
Max bandwidth (MB/sec): 136.719
Min bandwidth (MB/sec): 0
Average Latency: 24.9568
Stddev Latency: 8.35865
Max latency: 32.807
Min latency: 13.2119
[ 40Mib ]
== rados ==
2014-08-04 22:56:17.539373 min lat: 9999 max lat: 0 avg lat: 0
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
40 49 49 0 0 0 - 0
41 49 49 0 0 0 - 0
42 1 50 49 45.5638 45.5729 3.07695 21.9831
Total time run: 42.303319
Total writes made: 50
Write size: 40960000
Bandwidth (MB/sec): 46.170
Stddev Bandwidth: 6.9498
Max bandwidth (MB/sec): 45.5729
Min bandwidth (MB/sec): 0
Average Latency: 21.6088
Stddev Latency: 12.1152
Max latency: 41.2854
Min latency: 3.07695
== radosgw ==
2014-08-04 23:04:17.740359 min lat: 9999 max lat: 0 avg lat: 0
2014-08-04 23:04:17.740359   sec Cur ops  started  finished  avg MB/s  cur MB/s  last lat  avg lat
2014-08-04 23:04:17.740359    40      50       50         0         0         0         -        0
2014-08-04 23:04:18.740650    41      50       50         0         0         0         -        0
2014-08-04 23:04:19.740852    42      50       50         0         0         0         -        0
2014-08-04 23:04:20.741063    43      50       50         0         0         0         -        0
2014-08-04 23:04:21.741239    44      50       50         0         0         0         -        0
2014-08-04 23:04:22.742059    45      49       51         2   1.73223   1.73611   44.3911  44.3332
2014-08-04 23:04:23.742235    46      49       51         2   1.69465         0         -  44.3332
2014-08-04 23:04:24.742429    47      49       51         2   1.65866         0         -  44.3332
2014-08-04 23:04:25.742675 Total time run: 47.742303
Total writes made: 51
Write size: 40960000
Bandwidth (MB/sec): 41.728
Stddev Bandwidth: 0.250586
Max bandwidth (MB/sec): 1.73611
Min bandwidth (MB/sec): 0
Average Latency: 46.4062
Stddev Latency: 6.17721
Max latency: 47.7395
Min latency: 3.36783
Regards,
Osier
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com