Hi folks,

After some more tests, I still cannot identify the bottleneck. The cluster
never became CPU bound.


OSD op threads : 60
rgw_thread_pool_size : 300
pg = 2000
pool size = 3

Trying to find the maximum 1KB write rate of this cluster:
Rados bench at 1000~5000+ concurrency gives about 3000 reqs/sec

mon.0 [INF]  2053 kB/s wr, 2053 op/s
mon.0 [INF]  2093 kB/s wr, 2093 op/s
mon.0 [INF]  3380 kB/s wr, 3380 op/s

So I think the IOPS ceiling for a single pool is around 3000 op/s.



With 30 OSDs and journals on 3 SSDs, measured by ssbench

1KB result: 1200 reqs/sec

4MB objects get 150MB/sec


With 30 OSDs and journals on the 30 HDDs

1KB ssbench result: 450 reqs/sec

mon.0 [INF] 1650 kB/s rd, 413 kB/s wr, 5783 op/s
mon.0 [INF] 1651 kB/s rd, 409 kB/s wr, 5760 op/s
mon.0 [INF] 1708 kB/s rd, 423 kB/s wr, 5959 op/s


1KB rados bench result: 900 reqs/sec

mon.0 [INF] 803 kB/s wr, 803 op/s
mon.0 [INF] 806 kB/s wr, 806 op/s
mon.0 [INF] 911 kB/s wr, 911 op/s


4MB objects get 350MB/s throughput

Based on the above results, SSD journals help for small objects but are not
that good for object sizes over 1MB.
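One possible explanation, sketched as arithmetic: with pool size = 3, each client write is journaled once per replica, so the journal devices together absorb three times the client throughput. The sketch below assumes the 150MB/sec and 350MB/s figures are write throughput and that the load spreads evenly over the journal devices. Neither assumption is confirmed by the tests above, and the function name is only for illustration.

```python
# Rough estimate of the journal write load per device.
# Assumptions (not confirmed in this thread): the throughput figures are
# client writes, every replica writes the full object to its journal, and
# the load is spread evenly across the journal devices.

def per_journal_device_mb_s(client_mb_s, replicas, journal_devices):
    """Journal write rate per device in MB/s under the assumptions above."""
    return client_mb_s * replicas / journal_devices

# Round with journals on 3 SSDs: 150MB/sec observed at the client.
ssd_case = per_journal_device_mb_s(150, replicas=3, journal_devices=3)

# Round with journals on the 30 HDDs: 350MB/s observed at the client.
hdd_case = per_journal_device_mb_s(350, replicas=3, journal_devices=30)

print(f"SSD journals: {ssd_case:.0f} MB/s per SSD")
print(f"HDD journals: {hdd_case:.0f} MB/s per HDD")
```

If those assumptions hold, each SSD would be writing journal data at about the full client rate, which could be close to what a single SSD sustains, while each HDD would see a much smaller journal stream.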


Why does the 1KB object benchmark from ssbench generate many more ops than
rados bench?
From my perspective:

1. Every request produces auth/bucket/object-put operations from RadosGW
to Rados.
2. RadosGW needs to read the bucket data.
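To put a number on that amplification, here is a small sketch that divides the op/s reported by mon.0 by the client-side reqs/sec, using only the figures from the HDD-journal runs in this message. The variable and function names are mine, and averaging three monitor samples is a rough estimate, not a precise measurement.

```python
from statistics import mean

# Figures copied from the HDD-journal runs in this message.
ssbench_reqs_per_sec = 450
ssbench_cluster_ops = [5783, 5760, 5959]    # op/s from the mon.0 lines

rados_bench_reqs_per_sec = 900
rados_bench_cluster_ops = [803, 806, 911]   # op/s from the mon.0 lines

def ops_per_request(cluster_ops, reqs_per_sec):
    """Average cluster op/s seen by the monitor per client request."""
    return mean(cluster_ops) / reqs_per_sec

rgw_ratio = ops_per_request(ssbench_cluster_ops, ssbench_reqs_per_sec)
rados_ratio = ops_per_request(rados_bench_cluster_ops, rados_bench_reqs_per_sec)

print(f"ssbench via RadosGW: {rgw_ratio:.1f} cluster ops per request")
print(f"rados bench:         {rados_ratio:.1f} cluster ops per request")
```

With these numbers each ssbench request through RadosGW costs roughly 13 cluster ops, while rados bench stays at about one op per request.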


How to improve the performance(?):

1. Higher concurrency reduces the performance of RadosGW:

    Concurrency 100

Count: 13283 (    0 error;     0 retries:  0.00%)  Average requests per
second: 436.1
mon.0 [INF] 1650 kB/s rd, 413 kB/s wr, 5783 op/s
mon.0 [INF] 1651 kB/s rd, 409 kB/s wr, 5760 op/s
mon.0 [INF] 1708 kB/s rd, 423 kB/s wr, 5959 op/s


    Concurrency 200

Count:  7027 (   17 error;   475 retries:  6.76%)  Average requests per
second: 190.0
mon.0 [INF] 2001 kB/s rd, 492 kB/s wr, 6959 op/s
mon.0 [INF] 1877 kB/s rd, 457 kB/s wr, 6498 op/s
mon.0 [INF] 1332 kB/s rd, 330 kB/s wr, 4647 op/s

    Higher concurrency produces more errors and retries.
    I'm not sure whether the bottleneck in this case is the HTTP server or
the maximum IOPS of the Rados cluster.
    Is there any chance to make it faster by tuning Apache's settings? The CPU
util on this node

2. To reach the network's maximum bandwidth, would more HDDs or journals on
SSD help?
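For comparing the two ssbench runs in item 1, here is a small sketch that parses the summary lines quoted above and estimates the mean request latency with Little's law (requests in flight = throughput x latency). It assumes every ssbench worker was busy for the whole run, which the errors and retries at concurrency 200 make only approximately true. The regex is written against the two lines shown here, not against any documented ssbench format.

```python
import re

# Summary lines copied from the two runs above (re-joined after mail wrapping).
RUNS = {
    100: "Count: 13283 (    0 error;     0 retries:  0.00%)  "
         "Average requests per second: 436.1",
    200: "Count:  7027 (   17 error;   475 retries:  6.76%)  "
         "Average requests per second: 190.0",
}

PATTERN = re.compile(
    r"Count:\s*(?P<count>\d+)\s*\(\s*(?P<errors>\d+) error;"
    r"\s*(?P<retries>\d+) retries:\s*(?P<retry_pct>[\d.]+)%\)"
    r"\s*Average requests per second:\s*(?P<rps>[\d.]+)"
)

def summarize(concurrency, line):
    """Extract counters from one summary line and estimate mean latency."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError(f"unrecognized summary line: {line!r}")
    rps = float(m["rps"])
    return {
        "concurrency": concurrency,
        "count": int(m["count"]),
        "errors": int(m["errors"]),
        "retries": int(m["retries"]),
        "rps": rps,
        # Little's law: mean latency = requests in flight / throughput.
        # Assumes all `concurrency` workers stayed busy during the run.
        "est_latency_s": concurrency / rps,
    }

results = [summarize(c, line) for c, line in RUNS.items()]
for r in results:
    print(r)
```

Under that assumption the estimated latency goes from roughly 0.23 s per request at concurrency 100 to a bit over 1 s at concurrency 200, so requests appear to be queueing somewhere rather than being served in parallel.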


Any suggestions would be appreciated.






2013/12/24 Kuo Hugo <tonyt...@gmail.com>

> Hi folks,
>
>
> There are 30 HDDs on three 24-thread servers. Each has two 10G NICs, one for
> public and one for cluster. A dedicated 32-thread server runs RadosGW.
>
> My settings aim for the same availability as Swift, so the pool size=3 and
> min_size=2 for all RadosGW-related pools. Each pool's pg is set to 2000.
>
> Everything is working well except performance.
>
> Round1) All journals on one SSD with 10 partitions on each server.
>
> It's faster for small objects (1KB): 1100 reqs/sec under concurrency=100.
> But there's a problem: the total throughput is only 150MB/sec.
>
>
> Round2) Journals on the HDDs themselves
>
> Better throughput this way. Rados bench shows 300~400MB/sec.
> But the 1KB rate is really bad, about 400 reqs/sec.
>
>
> And ..... the reqs/sec drops as the concurrency increases.
> For example, at 500 concurrency it can only handle 120 reqs/sec.
>
> Does anyone use RadosGW for high-concurrency cases in real deployments?
> Could you please let me know which HTTP server you are running for RadosGW?
> How would you leverage all this equipment to build the most efficient
> Rados+RadosGW cluster with the Swift API?
>
> For reference, with the same HW and a similar setup, Swift can get
> 1600 reqs/sec with 1000 concurrency.
>
>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
