Casey,

Or, is there a way to let incremental sync of new data continue while the full sync catches up? The full sync will take a long time, and in the meantime no new incremental data is being replicated.

-Chris

On Wednesday, November 20, 2024 at 03:30:40 PM MST, Christopher Durham <caduceu...@aol.com> wrote:

Casey,

Thanks for your response. So is there a way to abandon a full sync and just move on with an incremental sync from the time the full sync is abandoned?

-Chris

On Wednesday, November 20, 2024 at 12:29:26 PM MST, Casey Bodley <cbod...@redhat.com> wrote:

On Wed, Nov 20, 2024 at 2:10 PM Christopher Durham <caduceu...@aol.com> wrote:
>
> Ok,
> Source code review reveals that full sync is marker based, and sync errors
> within a marker group *suggest* that data within the marker is re-checked. (I
> may be wrong about this, but it is consistent with my 304 errors below.) I
> do, however, have the following question:
> Is there a way to otherwise abort a full sync of a bucket (started by
> radosgw-admin bucket sync init --bucket <bucket> and bucket sync run, or by a
> restart of radosgw), and have it just do incremental sync from then on (yes,
> accepting that the objects are not the same on both sides prior to the
> 'restart' of incremental sync)?
> Would radosgw-admin bucket sync disable --bucket <bucket> followed by
> radosgw-admin bucket sync enable --bucket <bucket> do this? Or would that do
> another full sync and not an incremental?

'bucket sync enable' does start a new full sync (to catch objects that were
uploaded since 'bucket sync disable')

> Thanks
> -Chris
>
> On Thursday, November 14, 2024 at 04:18:34 PM MST, Christopher Durham <caduceu...@aol.com> wrote:
>
> Hi,
> I have heard nothing on this, but have done some more research.
> Again, both sides of a multisite s3 configuration are ceph 18.2.4 on Rocky 9.
> For a given bucket, there are thousands of 'missing' objects. I did:
>
> radosgw-admin bucket sync init --bucket <bucket> --src-zone <other side zone>
>
> Sync starts after I restart a radosgw on the source zone that has a sync
> thread.
> But based on the number and size of objects needing replication, it NEVER
> finishes, as more objects are created as I am going. I may need to increase
> the number of radosgws and/or the sync threads.
>
> What I have discovered is that if a radosgw on the side with missing objects
> is restarted, all syncing starts over! In other words, it starts polling each
> object, getting a 304 error in the radosgw log on the server on the side of
> the multisite that has the missing objects. It *appears* to do this
> sequential object scan in lexicographic order of object and/or prefix name,
> although I cannot be sure.
>
> So some questions:
> 1. Is there a recommendation/rule of thumb/formula for the number of
>    radosgws, sync threads, etc. based on number of objects, buckets,
>    bandwidth, etc.?
> 2. Why does the syncing restart for a bucket when a radosgw is restarted?
>    Is there a way to tell it to restart where it left off as opposed to
>    starting over? There may be reasons to restart a bucket sync if a radosgw
>    restarts, but there should be a way to checkpoint it, force it not to
>    restart, have it start where it left off, etc.
> 3. Is there a way to 'abort' the sync and cause the bucket to think it is up
>    to date and only replicate new objects from the time it was marked up to
>    date?
> Thanks for any information
> -Chris
>
> On Friday, November 8, 2024 at 03:45:05 PM MST, Christopher Durham <caduceu...@aol.com> wrote:
>
> I have a 2-site multisite configuration on ceph 18.2.4 on EL9.
> After system updates, we discovered that a particular bucket had several
> thousand objects missing, which the other side had. Newly created objects
> were being replicated just fine.
>
> I decided to 'restart' syncing that bucket. Here is what I did
> on the side with missing objects:
>
> radosgw-admin bucket sync init --bucket <bucketname> --src-zone <zone>
>
> I restarted the radosgw set up to do the sync thread on the same zone as I
> ran the radosgw-admin command.
>
> Logs on the radosgw src-zone side show GETs with http code 200 for objects
> that do not exist on the side with missing objects, and GETs with http 304
> for objects that already exist on the side with missing objects.
> So far, so good.
> As I said, the bucket is active. So on the src-zone side, data is continually
> being written to /prefixA/../../ There is also data being written to
> /prefixB/../../
> prefixA/ comes lexicographically before prefixB/
> What happens is that all the 304s happen as it scans the bucket, then it
> starts pulling with GETs and http 200s for the objects the side doing the
> sync doesn't have. This is on /prefixA. When it 'catches up' with all data in
> /prefixA at the moment, the sync seems to START OVER with /prefixA, giving
> 304s for everything that existed in the bucket up to the moment it caught up,
> then doing GETs with 200s for the remaining newer objects. This happens over
> and over again. It NEVER gets to /prefixB. So it seems to be periodically
> catching up to /prefixA, but never going on to /prefixB, which is also being
> written to.
> There are 1.2 million objects in this bucket, with about 35 TiB in the bucket.
> There is a lifecycle expiration happening of 60 days.
> Any thoughts would be appreciated.
> -Chris
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
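
Question 2 in the thread asks whether an interrupted scan can pick up where it left off instead of starting over. The toy Python model below is only a sketch of that difference and is not how radosgw implements full sync: the key names, the `budget` of keys examined per pass, the `scan` and `simulate` helpers, and the two new prefixA objects written between passes are all made-up assumptions. It logs 304 for keys the destination already holds and 200 for keys it fetches, and compares restarting each pass from the beginning of the listing with resuming after the last key examined.

```python
def scan(source, dest, start_after="", budget=4):
    """One interrupted pass over the source listing in lexicographic order.

    Hypothetical model: examines at most `budget` keys that sort after
    `start_after`. Returns (log, marker), where log holds (key, status)
    pairs and marker is the last key examined.
    """
    log = []
    marker = start_after
    pending = [k for k in sorted(source) if k > start_after]
    for key in pending[:budget]:
        if key in dest:
            log.append((key, 304))  # already present on the destination
        else:
            dest.add(key)
            log.append((key, 200))  # fetched from the source
        marker = key
    return log, marker


def simulate(resume, rounds=6, budget=4):
    """Run several interrupted passes while a writer keeps adding prefixA keys."""
    source = {f"prefixA/{i:03d}" for i in range(4)}
    source |= {f"prefixB/{i:03d}" for i in range(2)}
    dest = set()
    marker = ""
    next_a = 4
    for _ in range(rounds):
        # resume=True continues after the saved marker; resume=False starts over
        _, marker = scan(source, dest, marker if resume else "", budget)
        # the writer adds two new prefixA objects between passes
        for _ in range(2):
            source.add(f"prefixA/{next_a:03d}")
            next_a += 1
    return dest


def reached_prefix_b(dest):
    return any(k.startswith("prefixB/") for k in dest)


if __name__ == "__main__":
    print("restart reaches prefixB:", reached_prefix_b(simulate(resume=False)))
    print("resume reaches prefixB:", reached_prefix_b(simulate(resume=True)))
```

In this model the restart strategy spends every pass re-checking the same leading prefixA keys, while the resume strategy gets to prefixB on its second pass. Keys written behind the marker are never picked up by the resumed scan, so they would have to be handled by incremental sync.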