On Fri, Mar 22, 2019 at 8:38 AM Vikas Rana <vr...@vtiersys.com> wrote:
>
> Hi Jason,
>
> Thank you for your help and support.
>
> One last question: after the demotion and promotion, when you do a resync
> again, does it copy the whole image again or send just the changes since
> the last journal update?
Right now, it will copy the entire image. There is still a longer-term plan
to get support from the OSDs to deep-delete a backing object, which would be
needed in the case where a snapshot exists on the image and you need to
resync the non-HEAD revision. Once that support is in place, we can tweak
the resync logic to only copy the deltas by comparing hashes of the objects.

> I'm trying to estimate how long it will take to get a 200TB image in sync.
>
> Thanks,
> -Vikas
>
> -----Original Message-----
> From: Jason Dillaman <jdill...@redhat.com>
> Sent: Wednesday, March 13, 2019 4:49 PM
> To: Vikas Rana <vr...@vtiersys.com>
> Subject: Re: [ceph-users] RBD Mirror Image Resync
>
> On Wed, Mar 13, 2019 at 4:42 PM Vikas Rana <vr...@vtiersys.com> wrote:
> >
> > Thanks Jason for your response.
> >
> > From the documents, I believe the resync has to be run where the
> > rbd-mirror daemon is running.
> > rbd-mirror is running on the DR site and that's where we issued the
> > resync.
>
> You would need the rbd-mirror daemon configured and running against both
> clusters. The "resync" request just adds a flag to the specified image
> which the local "rbd-mirror" daemon discovers and then starts to pull the
> image down from the remote cluster. So again, the correct procedure is to
> initiate the resync against the out-of-sync image you want to
> delete/recreate, wait for it to complete, then demote the current primary
> image, and promote the newly resynced image to primary.
>
> > Should we do it on the Prod site?
> > Here's the Prod status:
> >
> > :~# rbd info nfs/dir_research
> > rbd image 'dir_research':
> >         size 200 TB in 52428800 objects
> >         order 22 (4096 kB objects)
> >         block_name_prefix: rbd_data.edd65238e1f29
> >         format: 2
> >         features: layering, exclusive-lock, journaling
> >         flags:
> >         journal: edd65238e1f29
> >         mirroring state: enabled
> >         mirroring global id: 3ad67d0c-e06b-406a-9469-4e5faedd09a4
> >         mirroring primary: true
>
> Are you sure this is the prod site? The image id is different from the
> dump below.
>
> > What does "starting_replay" mean?
>
> Given that the state is "down+unknown", I think it's just an odd,
> left-over status message. The "down" indicates that you do not have a
> running/functional "rbd-mirror" daemon running against cluster "cephdr".
> If it is running, I would check its log messages to see if any errors are
> being spit out.
>
> > Thanks,
> > -Vikas
> >
> > -----Original Message-----
> > From: Jason Dillaman <jdill...@redhat.com>
> > Sent: Wednesday, March 13, 2019 3:44 PM
> > To: Vikas Rana <vr...@vtiersys.com>
> > Cc: ceph-users <ceph-users@lists.ceph.com>
> > Subject: Re: [ceph-users] RBD Mirror Image Resync
> >
> > On Tue, Mar 12, 2019 at 11:09 PM Vikas Rana <vr...@vtiersys.com> wrote:
> > >
> > > Hi there,
> > >
> > > We are replicating an RBD image from the Primary to the DR site using
> > > RBD mirroring.
> > > On Primary, we were using 10.2.10.
> >
> > Just a note that Jewel is end-of-life upstream.
> >
> > > The DR site is luminous and we promoted the DR copy to test failover.
> > > Everything checked out good.
> > >
> > > Now we are trying to restart the replication; we did the demote and
> > > then resynced the image, but it has been stuck in the
> > > "starting_replay" state for the last 3 days. It's a 200TB RBD image.
> >
> > You would need to run "rbd --cluster <primary-site> mirror image resync
> > nfs/dir_research" and wait for that to complete *before* demoting the
> > primary image on cluster "cephdr". Without a primary image, there is
> > nothing to resync against.
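
Spelled out as commands, the procedure described above looks roughly like
this (a sketch only, not taken from the thread; "site-a" and "site-b" are
placeholder cluster names, with site-b holding the current primary and
site-a holding the out-of-sync copy that will be rebuilt):

    # Flag the out-of-sync image for resync; the rbd-mirror daemon local to
    # site-a deletes the stale copy and pulls a fresh one from the primary.
    rbd --cluster site-a mirror image resync nfs/dir_research

    # Poll until the copy has finished before touching the primary.
    rbd --cluster site-a mirror image status nfs/dir_research

    # Only once the resync is complete, swap the roles (it can take a
    # moment for the demotion to be replayed on the other side).
    rbd --cluster site-b mirror image demote nfs/dir_research
    rbd --cluster site-a mirror image promote nfs/dir_research

Both clusters need a running rbd-mirror daemon for this to work, since the
daemon on the resyncing side is the one that actually performs the copy.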
> >
> > > :~# rbd --cluster cephdr mirror pool status nfs --verbose
> > > health: WARNING
> > > images: 1 total
> > >     1 starting_replay
> > >
> > > dir_research:
> > >   global_id:   3ad67d0c-e06b-406a-9469-4e5faedd09a4
> > >   state:       down+unknown
> > >   description: status not found
> > >   last_update:
> > >
> > > # rbd info nfs/dir_research
> > > rbd image 'dir_research':
> > >         size 200TiB in 52428800 objects
> > >         order 22 (4MiB objects)
> > >         block_name_prefix: rbd_data.652186b8b4567
> > >         format: 2
> > >         features: layering, exclusive-lock, journaling
> > >         flags:
> > >         create_timestamp: Thu Feb 7 11:53:36 2019
> > >         journal: 652186b8b4567
> > >         mirroring state: disabling
> > >         mirroring global id: 3ad67d0c-e06b-406a-9469-4e5faedd09a4
> > >         mirroring primary: false
> > >
> > > So the question is: how do we know the progress of the replay, how
> > > much has already completed, and is there any ETA for when it will go
> > > back to the OK state?
> > >
> > > Thanks,
> > > -Vikas
> >
> > --
> > Jason
>
> --
> Jason

--
Jason

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
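
Regarding the quoted question about gauging replay progress: the per-image
mirror status can be polled while the copy runs. A minimal sketch, reusing
the pool/image name from this thread and assuming it is run against
whichever cluster is performing the resync:

    rbd --cluster cephdr mirror image status nfs/dir_research

While the image sync is in flight, the "description" field of that output
typically includes a completion percentage, which is the closest thing to
an ETA the tooling provides.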