On Fri, Mar 22, 2019 at 8:38 AM Vikas Rana <vr...@vtiersys.com> wrote:
>
> Hi Jason,
>
> Thank you for your help and support.
>
> One last question: after the demotion and promotion, when you do a resync
> again, does it copy the whole image again or send just the changes since
> the last journal update?
Right now, it will copy the entire image. There is still a longer-term plan
to get support from the OSDs to deep-delete a backing object, which would be
needed in the case where a snapshot exists on the image and you need to
resync the non-HEAD revision. Once that support is in place, we can tweak
the resync logic to only copy the deltas by comparing hashes of the objects.

> I'm trying to estimate how long it will take to get a 200TB image in sync.
>
> Thanks,
> -Vikas
>
> -----Original Message-----
> From: Jason Dillaman <jdill...@redhat.com>
> Sent: Wednesday, March 13, 2019 4:49 PM
> To: Vikas Rana <vr...@vtiersys.com>
> Subject: Re: [ceph-users] RBD Mirror Image Resync
>
> On Wed, Mar 13, 2019 at 4:42 PM Vikas Rana <vr...@vtiersys.com> wrote:
> >
> > Thanks Jason for your response.
> >
> > From the documents, I believe the resync has to be run where the
> > rbd-mirror daemon is running.
> > rbd-mirror is running on the DR site and that's where we issued the
> > resync.
>
> You would need the rbd-mirror daemon configured and running against both
> clusters. The "resync" request just adds a flag to the specified image
> which the local "rbd-mirror" daemon discovers and then starts to pull the
> image down from the remote cluster. So again, the correct procedure is to
> initiate the resync against the out-of-sync image you want to
> delete/recreate, wait for it to complete, then demote the current primary
> image, and promote the newly resynced image to primary.
>
> > Should we do it on the Prod site?
> > Here's the Prod status:
> >
> > :~# rbd info nfs/dir_research
> > rbd image 'dir_research':
> >         size 200 TB in 52428800 objects
> >         order 22 (4096 kB objects)
> >         block_name_prefix: rbd_data.edd65238e1f29
> >         format: 2
> >         features: layering, exclusive-lock, journaling
> >         flags:
> >         journal: edd65238e1f29
> >         mirroring state: enabled
> >         mirroring global id: 3ad67d0c-e06b-406a-9469-4e5faedd09a4
> >         mirroring primary: true
>
> Are you sure this is the prod site? The image id is different from the
> dump below.
>
> > What does "starting_replay" mean?
>
> Given that the state is "down+unknown", I think it's just an odd,
> left-over status message. The "down" indicates that you do not have a
> running/functional "rbd-mirror" daemon running against cluster "cephdr".
> If it is running, I would check its log messages to see if any errors are
> being spit out.
>
> > Thanks,
> > -Vikas
> >
> > -----Original Message-----
> > From: Jason Dillaman <jdill...@redhat.com>
> > Sent: Wednesday, March 13, 2019 3:44 PM
> > To: Vikas Rana <vr...@vtiersys.com>
> > Cc: ceph-users <ceph-users@lists.ceph.com>
> > Subject: Re: [ceph-users] RBD Mirror Image Resync
> >
> > On Tue, Mar 12, 2019 at 11:09 PM Vikas Rana <vr...@vtiersys.com> wrote:
> > >
> > > Hi there,
> > >
> > > We are replicating an RBD image from the Primary to the DR site using
> > > RBD mirroring.
> > > On Primary, we were using 10.2.10.
> >
> > Just a note that Jewel is end-of-life upstream.
> >
> > > The DR site is luminous and we promoted the DR copy to test failover.
> > > Everything checked out good.
> > >
> > > Now we are trying to restart the replication; we did the demote and
> > > then resynced the image, but it has been stuck in the
> > > "starting_replay" state for the last 3 days. It's a 200TB RBD image.
> >
> > You would need to run "rbd --cluster <primary-site> mirror image resync
> > nfs/dir_research" and wait for that to complete *before* demoting the
> > primary image on cluster "cephdr". Without a primary image, there is
> > nothing to resync against.
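
Spelled out as commands, the procedure described above looks roughly like
this (a sketch only, not taken from the thread; "site-a" and "site-b" are
placeholder cluster names, with site-b holding the current primary and
site-a holding the out-of-sync copy that will be rebuilt):

    # Flag the out-of-sync image for resync; the rbd-mirror daemon local to
    # site-a deletes the stale copy and pulls a fresh one from the primary.
    rbd --cluster site-a mirror image resync nfs/dir_research

    # Poll until the copy has finished before touching the primary.
    rbd --cluster site-a mirror image status nfs/dir_research

    # Only once the resync is complete, swap the roles (it can take a
    # moment for the demotion to be replayed on the other side).
    rbd --cluster site-b mirror image demote nfs/dir_research
    rbd --cluster site-a mirror image promote nfs/dir_research

Both clusters need a running rbd-mirror daemon for this to work, since the
daemon on the resyncing side is the one that actually performs the copy.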
> >
> > > :~# rbd --cluster cephdr mirror pool status nfs --verbose
> > > health: WARNING
> > > images: 1 total
> > >     1 starting_replay
> > >
> > > dir_research:
> > >   global_id:   3ad67d0c-e06b-406a-9469-4e5faedd09a4
> > >   state:       down+unknown
> > >   description: status not found
> > >   last_update:
> > >
> > > # rbd info nfs/dir_research
> > > rbd image 'dir_research':
> > >         size 200TiB in 52428800 objects
> > >         order 22 (4MiB objects)
> > >         block_name_prefix: rbd_data.652186b8b4567
> > >         format: 2
> > >         features: layering, exclusive-lock, journaling
> > >         flags:
> > >         create_timestamp: Thu Feb 7 11:53:36 2019
> > >         journal: 652186b8b4567
> > >         mirroring state: disabling
> > >         mirroring global id: 3ad67d0c-e06b-406a-9469-4e5faedd09a4
> > >         mirroring primary: false
> > >
> > > So the question is: how do we know the progress of the replay, how
> > > much has already completed, and is there any ETA for when it will go
> > > back to the OK state?
> > >
> > > Thanks,
> > > -Vikas
> >
> > --
> > Jason
>
> --
> Jason

--
Jason

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
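
Regarding the quoted question about gauging replay progress: the per-image
mirror status can be polled while the copy runs. A minimal sketch, reusing
the pool/image name from this thread and assuming it is run against
whichever cluster is performing the resync:

    rbd --cluster cephdr mirror image status nfs/dir_research

While the image sync is in flight, the "description" field of that output
typically includes a completion percentage, which is the closest thing to
an ETA the tooling provides.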