[ceph-users] Re: multisite sync issue with bucket sync

Christopher Durham Wed, 20 Nov 2024 15:03:29 -0800

 Casey,
Or, is there a way to keep incremental sync of new data running while the 
full sync catches up? The full sync will take a long time, and in the 
meantime no new incremental data is being replicated.
-Chris


    On Wednesday, November 20, 2024 at 03:30:40 PM MST, Christopher Durham 
<caduceu...@aol.com> wrote:   

  Casey,
Thanks for your response. So is there a way to abandon a full sync and just 
move on with an incremental from the time you abandon the full sync?
-Chris

    On Wednesday, November 20, 2024 at 12:29:26 PM MST, Casey Bodley 
<cbod...@redhat.com> wrote:   

 On Wed, Nov 20, 2024 at 2:10 PM Christopher Durham <caduceu...@aol.com> wrote:
>
>  Ok,
> Source code review reveals that full sync is marker based, and sync errors 
> within a marker group *suggest* that data within the marker is re-checked (I 
> may be wrong about this, but that is consistent with my 304 errors below). I 
> do, however, have the following question:
> Is there a way to otherwise abort a full sync of a bucket (as a result of 
> radosgw-admin bucket sync init --bucket <bucket> and bucket sync run, or a 
> restart of radosgw), and have it just do incremental sync from then on (yes, 
> accepting that the objects are not the same on both sides prior to the 
> 'restart' of an incremental sync)?
> Would radosgw-admin bucket sync disable --bucket <bucket> followed by 
> radosgw-admin bucket sync enable --bucket <bucket> do this? Or would that do 
> another full sync and not an incremental?

'bucket sync enable' does start a new full sync (to catch objects that
were uploaded since 'bucket sync disable')
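
For reference, the subcommands discussed in this thread can be assembled as below. This is a minimal sketch under assumptions: `sync_cmd` is a hypothetical helper (not part of Ceph) that only builds an argument list and runs nothing, `mybucket` is a placeholder name, and `--source-zone` is the flag spelling taken from radosgw-admin's usage text (the commands quoted in this thread write `--src-zone`).

```python
from typing import List, Optional

# Subcommands named in this thread, plus 'status' for inspecting progress.
ACTIONS = {"init", "run", "status", "disable", "enable"}


def sync_cmd(action: str, bucket: str, source_zone: Optional[str] = None) -> List[str]:
    """Build (but do not run) a 'radosgw-admin bucket sync <action>' argv list."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    argv = ["radosgw-admin", "bucket", "sync", action, "--bucket", bucket]
    if source_zone is not None:
        argv += ["--source-zone", source_zone]
    return argv


# Per the reply above, disable followed by enable leads to a new full sync,
# so this pair is not a way to skip straight to incremental sync.
restart_full_sync = [sync_cmd("disable", "mybucket"), sync_cmd("enable", "mybucket")]
```

Each list could be handed to `subprocess.run(argv, check=True)` on a host with admin credentials; keeping the builder separate makes it easy to print and review the commands first.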

> Thanks
> -Chris
>
>    On Thursday, November 14, 2024 at 04:18:34 PM MST, Christopher Durham 
><caduceu...@aol.com> wrote:
>
>  Hi,
> I have heard nothing on this, but have done some more research.
> Again, both sides of a multisite s3 configuration are ceph 18.2.4 on Rocky 9.
> For a given bucket, there are thousands of 'missing' objects. I did:
> radosgw-admin bucket sync init --bucket <bucket> --src-zone <other side zone>
> Sync starts after I restart a radosgw on the source zone that has a sync 
> thread.
> But based on the number and size of objects needing replication, it NEVER 
> finishes, as more objects are created as I am going. I may need to increase 
> the number of radosgws and/or sync threads.
>
> What I have discovered is that if a radosgw on the side with missing objects 
> is restarted, all syncing starts over! In other words, it starts polling each 
> object, getting a 304 error in the radosgw log on the server on the side of 
> the multisite that has the missing objects. It *appears* to do this sequential 
> object scan in lexicographic order of object and/or prefix name, although I 
> cannot be sure.
>
> So some questions:
> 1. Is there a recommendation/rule of thumb/formula for the number of 
> radosgws/sync threads/etc. based on number of objects, buckets, bandwidth, 
> etc.?
> 2. Why does the syncing restart for a bucket when a radosgw is restarted? 
> Is there a way to tell it to restart where it left off as opposed to starting 
> over? There may be reasons to restart a bucket sync if a radosgw restarts, but 
> there should be a way to checkpoint/force it to not restart/start where it 
> left off, etc.
> 3. Is there a way to 'abort' the sync and cause the bucket to think 
> it is up to date and only replicate new objects from the time it was marked 
> up to date?
> Thanks for any information
> -Chris
>
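Question 2 above asks about resuming from a saved position. As a toy illustration only (this is not RGW's code; `list_page`, `scan`, and the key names are made up), a marker-based listing can continue where it stopped if the last processed key is kept, and repeats work if that marker is discarded:

```python
from typing import List, Optional, Tuple


def list_page(keys: List[str], marker: Optional[str], limit: int) -> Tuple[List[str], Optional[str]]:
    """Return up to `limit` keys that sort after `marker`, plus the new marker."""
    ordered = sorted(keys)
    if marker is not None:
        ordered = [k for k in ordered if k > marker]
    page = ordered[:limit]
    new_marker = page[-1] if page else marker
    return page, new_marker


def scan(keys: List[str], marker: Optional[str], limit: int, pages: int) -> Tuple[List[str], Optional[str]]:
    """Process at most `pages` pages starting at `marker`; return visited keys and final marker."""
    visited: List[str] = []
    for _ in range(pages):
        page, marker = list_page(keys, marker, limit)
        if not page:
            break
        visited.extend(page)
    return visited, marker


bucket = ["prefixA/1", "prefixA/2", "prefixA/3", "prefixB/1", "prefixB/2"]

# First run is interrupted after one page of two keys.
first, saved = scan(bucket, None, limit=2, pages=1)

# Resuming from the saved marker continues where the first run stopped.
resumed, _ = scan(bucket, saved, limit=2, pages=2)

# Discarding the marker revisits the keys that were already processed.
restarted, _ = scan(bucket, None, limit=2, pages=1)
```

In this model `resumed` reaches the `prefixB/` keys while `restarted` only repeats `first`. Whether RGW persists its full-sync marker across a radosgw restart is exactly the open question in this thread; the sketch does not answer it.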
>
>
>    On Friday, November 8, 2024 at 03:45:05 PM MST, Christopher Durham 
><caduceu...@aol.com> wrote:
>
>
> I have a 2-site multisite configuration on ceph 18.2.4 on EL9.
> After system updates, we discovered that a particular bucket had several 
> thousand objects missing, which the other side had. Newly created objects 
> were being replicated just fine.
>
> I decided to 'restart' syncing that bucket. Here is what I did.
> On the side with missing objects:
> > radosgw-admin bucket sync init --bucket <bucketname> --src-zone <zone>
>
> I restarted the radosgw set up to do the sync thread on the same zone as I 
> ran the radosgw-admin command.
>
> Logs on the radosgw src-zone side show GETs with http code 200 for objects 
> that do not exist on the side with missing objects, and GETs with http 304 
> for objects that already exist on the side with missing objects.
> So far, so good.
> As I said, the bucket is active. So on the src-zone side, data is continually 
> being written to /prefixA/../../ and also to /prefixB/../../
> prefixA/ comes lexicographically before prefixB/
> What happens is that all the 304s happen as it scans the bucket, then it 
> starts pulling with GETs and http 200s for the objects the side doing the 
> sync doesn't have. This is on /prefixA. When it 'catches up' with all data in 
> /prefixA at the moment, the sync seems to START OVER with /prefixA, giving 
> 304s for everything that existed in the bucket up to the moment it caught up, 
> then doing GETs with 200s for the remaining newer objects. This happens over 
> and over again. It NEVER gets to /prefixB. So it seems to be periodically 
> catching up to /prefixA, but never going on to /prefixB, which is also being 
> written to.
> There are 1.2 million objects in this bucket, with about 35 TiB in the bucket.
> There is a lifecycle expiration happening of 60 days.
> Any thoughts would be appreciated.
> -Chris
>
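One way to check the "never gets to /prefixB" observation is to tally the GET status codes per top-level prefix. The sketch below assumes the (status, key) pairs have already been extracted from the radosgw log by some other means; it makes no assumption about the log format, `summarize` is a hypothetical helper, and the sample records are invented purely for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def summarize(records: Iterable[Tuple[int, str]]) -> Dict[str, Counter]:
    """Count HTTP status codes per top-level prefix of the object key."""
    summary: Dict[str, Counter] = defaultdict(Counter)
    for status, key in records:
        # Take the first path component, ignoring any leading slash.
        prefix = key.lstrip("/").split("/", 1)[0]
        summary[prefix][status] += 1
    return dict(summary)


# Invented sample records, not taken from a real log.
sample = [
    (304, "/prefixA/x/y/obj1"),
    (304, "/prefixA/x/y/obj2"),
    (200, "/prefixA/x/y/obj3"),
]
result = summarize(sample)
```

If `prefixB` never shows up as a key in the summary across several scan passes, that supports the description above; a growing 304 count for `prefixA` with few 200s would point at repeated rescans.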
>
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
