[ceph-users] Re: RGW/multisite sync traffic rps

Szabo, Istvan (Agoda) Fri, 22 Oct 2021 14:51:25 -0700

I see the same issue (45k GET requests constantly, as admin). My guess is that 
the primary site writes the changes to the datalog and the secondary sites 
pull these logs as they change.
Do you have a user who is constantly uploading and deleting?
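
One way to test this guess is to sample the metadata log and data log on the primary twice, a few seconds apart, and see whether new entries keep appearing. The following is only a rough sketch: it assumes `radosgw-admin mdlog list` and `radosgw-admin datalog list` print a JSON array of entries, and the helper names are made up for illustration. The listing may be capped or trimmed, so the counts are an indication rather than an exact measure.

```python
import json
import shutil
import subprocess


def count_log_entries(json_text):
    """Return the number of entries in a JSON array of log entries."""
    entries = json.loads(json_text)
    if not isinstance(entries, list):
        raise ValueError("expected a JSON array of log entries")
    return len(entries)


def list_log(kind):
    """Run `radosgw-admin <kind> list` and return its stdout (hypothetical helper)."""
    result = subprocess.run(
        ["radosgw-admin", kind, "list"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Only attempt the real call where radosgw-admin is installed.
if shutil.which("radosgw-admin"):
    for kind in ("mdlog", "datalog"):
        try:
            print(kind, count_log_entries(list_log(kind)))
        except (subprocess.CalledProcessError, ValueError) as exc:
            print(kind, "could not be read:", exc)
```

If the counts keep changing while client traffic is idle, something (for example a script that creates and deletes buckets) is still producing log entries.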


Istvan Szabo
Senior Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---------------------------------------------------

On 2021. Oct 22., at 10:46, Stefan Schueffler <s.schueff...@softgarden.de> 
wrote:

Hi,

I have a question about RGW/multisite. The sync traffic generates a lot of 
requests per second (around 1500), which seems high, especially compared 
to the actual volume of user/client requests.

We have a rather simple multisite setup with
- two Ceph clusters (16.2.6), 1 realm, 1 zonegroup, and one zone on each side, 
one of which is the master zone.
- latency between the clusters of around 0.3 ms
- 3 RGW/beast daemons running on each cluster.
- a handful of buckets (around 20), and a check script which creates one bucket 
per second (and deletes it after validating the successful bucket creation).
- one of the buckets has a few million (smaller) objects, the others are (more 
or less) empty.
- from the client side, there are just a few requests per second (mostly PUT 
objects into the one larger bucket), writing a few kilobytes per second.
- roughly 5 GB in total disk size consumed currently, with the idea to increase 
the total consumption to a few TB over time.

Both clusters are in sync (after the initial full sync, they now do incremental 
sync). Although they do sync the new objects from cluster A (the master, to which 
the clients connect) to B, we see a lot of "internal" sync requests in our 
monitoring: each RGW daemon on cluster B sends about 500 requests per second to 
an RGW daemon on cluster A, mostly to "/admin/log?…". That adds up to 1500 
requests per second just for the sync and results in almost 60% CPU usage 
for the rgw/beast processes.
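
To see which kind of log request dominates, the request targets from the monitoring or access logs can be tallied by path and `type` query parameter. This is a minimal sketch, assuming the request targets (path plus query string) can be extracted as plain strings; the sample targets and the `type` values in them are made up for illustration and are not taken from our logs.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def tally_requests(targets):
    """Count request targets by (path, value of the 'type' query parameter)."""
    counts = Counter()
    for target in targets:
        parts = urlsplit(target)
        query = parse_qs(parts.query)
        # Use the first 'type' value if present, otherwise '-'.
        log_type = query.get("type", ["-"])[0]
        counts[(parts.path, log_type)] += 1
    return counts


# Hypothetical sample targets, for illustration only.
sample = [
    "/admin/log?type=data&id=12",
    "/admin/log?type=data&id=13",
    "/admin/log?type=metadata&id=3",
    "/mybucket/object1",
]

for (path, log_type), n in tally_requests(sample).most_common():
    print(f"{n:6d}  {path}  type={log_type}")
```

Grouping by `type` would show whether the polling is mostly for metadata changes (such as bucket creation and deletion) or for object data changes.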

After stopping and restarting the RGW instances on cluster B, they first catch 
up with the delta, and as soon as that finishes, they start requesting 
"/admin/log…" in this endless loop.

Is this amount of internal, sync-related requests normal and expected?

Thanks for any ideas on how to debug / introspect this.

Best
Stefan

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io