I have a 2-site multisite configuration on Ceph 18.2.4 on EL9.
After system updates, we discovered that a particular bucket was missing
several thousand objects that the other side had. Newly created objects were
being replicated just fine.

I decided to 'restart' syncing that bucket. Here is what I did.
On the side with missing objects:
> radosgw-admin bucket sync init --bucket <bucketname> --src-zone <zone>

I then restarted the radosgw instance that is set up to run the sync thread, in
the same zone where I ran the radosgw-admin command.

Logs on the radosgw src-zone side show GETs with HTTP 200 for objects that
do not exist on the side with missing objects, and GETs with HTTP 304 for
objects that already exist on the side with missing objects.
So far, so good.
As I said, the bucket is active. On the src-zone side, data is continually
being written to /prefixA/../../ and also to /prefixB/../../
prefixA/ comes lexicographically before prefixB/.
What happens is that all the 304s occur as it scans the bucket, and then it
starts pulling, with GETs and HTTP 200s, the objects that the side doing the
sync doesn't have. This is all on /prefixA. When it 'catches up' with all the
data in /prefixA at that moment, the sync seems to START OVER with /prefixA,
giving 304s for everything that existed in the bucket up to the moment it
caught up, then doing GETs with 200s for the remaining newer objects. This
happens over and over again. It NEVER gets to /prefixB. So it seems to be
periodically catching up on /prefixA, but never moving on to /prefixB, which
is also being written to.
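
One way to confirm this pattern from the src-zone logs is to count 200s and 304s per top-level prefix. The following is only a minimal sketch. The regular expression assumes each request line contains something like "GET /<bucket>/<key> HTTP/1.1" followed by the status code, and the function name tally_by_prefix is hypothetical. The real radosgw log layout may differ, so adjust the pattern to match it.

```python
import re
from collections import Counter

# ASSUMPTION: a request line contains  "GET /<bucket>/<key> HTTP/x.y" <status>
# This is an illustrative pattern, not a guaranteed radosgw log format.
REQUEST_RE = re.compile(
    r'"GET /(?P<bucket>[^/\s]+)/(?P<key>\S+) HTTP/[\d.]+" (?P<status>\d{3})'
)


def tally_by_prefix(lines, bucket):
    """Count GET responses per (top-level prefix, HTTP status) for one bucket."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match is None or match.group("bucket") != bucket:
            continue  # not a GET for the bucket we care about
        # The top-level prefix is everything before the first "/" in the key.
        prefix = match.group("key").split("/", 1)[0]
        counts[(prefix, int(match.group("status")))] += 1
    return counts
```

Fed a slice of the log covering a few 'start over' cycles, a tally with no entries at all for prefixB would support the observation that the sync never reaches it.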
There are 1.2 million objects in this bucket, totaling about 35 TiB.
A lifecycle rule expires objects after 60 days.
Any thoughts would be appreciated.
-Chris


_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
