Re: [ceph-users] RBD Mirror DR Testing

Jason Dillaman Mon, 25 Nov 2019 09:36:05 -0800

On Mon, Nov 25, 2019 at 12:24 PM Vikas Rana <vr...@vtiersys.com> wrote:
>
> Hi All,
> I believe we forgot to take the snapshot in the previous test. Here's the 
> output from the current test, where we took a snapshot on the Primary side, 
> but the snapshot did not replicate to the DR side.
> VTIER1 is the Primary box with cluster ceph. Vtier2a is the DR box with 
> cluster name cephdr.
>
> root@VTIER1:~# rbd ls -l nfs
> NAME                   SIZE PARENT FMT PROT LOCK
> dir_research         200TiB          2      excl
> dir_research@dr_test 200TiB          2
> test01               100MiB          2
> root@VTIER1:~#
>
>
> root@vtier2a:~# rbd ls -l nfs
> NAME           SIZE PARENT FMT PROT LOCK
> dir_research 200TiB          2      excl
> test01       100MiB          2      excl
>
> root@vtier2a:~# rbd mirror pool status nfs --verbose --cluster=cephdr
> health: OK
> images: 2 total
>     2 replaying
>
> dir_research:
>   global_id:   92f46320-d43d-48eb-8a09-b68a1945cc77
>   state:       up+replaying
>   description: replaying, master_position=[object_number=597902, tag_tid=3, 
> entry_tid=705172054], mirror_position=[object_number=311129, tag_tid=3, 
> entry_tid=283416457], entries_behind_master=421755597
>   last_update: 2019-11-25 12:14:52


The "entries_behind_master=421755597" value tells me that your
"rbd-mirror" daemon is *very* far behind. Assuming each entry is a
4KiB IO, that would be over 1.5TiB behind.
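
As a rough check of that estimate, here is a minimal Python sketch that pulls the counter out of the status line quoted above and converts it to TiB. The 4KiB-per-entry figure is only the assumption stated above, not something the status output reports, and the helper names (entries_behind, backlog_tib) are made up for illustration.

```python
import re

# Status description copied from the `rbd mirror pool status` output quoted above.
DESCRIPTION = (
    "replaying, master_position=[object_number=597902, tag_tid=3, "
    "entry_tid=705172054], mirror_position=[object_number=311129, tag_tid=3, "
    "entry_tid=283416457], entries_behind_master=421755597"
)

# Assumption: every journal entry corresponds to a single 4 KiB write.
ASSUMED_ENTRY_BYTES = 4 * 1024


def entries_behind(description: str) -> int:
    """Extract the entries_behind_master counter from a status description."""
    match = re.search(r"entries_behind_master=(\d+)", description)
    if match is None:
        raise ValueError("no entries_behind_master field found")
    return int(match.group(1))


def backlog_tib(entries: int, entry_bytes: int = ASSUMED_ENTRY_BYTES) -> float:
    """Convert an entry count into TiB under the assumed per-entry size."""
    return entries * entry_bytes / 1024**4


entries = entries_behind(DESCRIPTION)
print(f"{entries} entries is roughly {backlog_tib(entries):.2f} TiB")
```

In this particular output the counter also equals the gap between the two entry_tid values (705172054 - 283416457). The real backlog depends on the actual IO sizes, so treat the result as an order-of-magnitude figure only.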


> test01:
>   global_id:   06fbfe68-b7e4-4d3a-93b2-cd18c569f7f7
>   state:       up+replaying
>   description: replaying, master_position=[object_number=3, tag_tid=1, 
> entry_tid=3], mirror_position=[object_number=3, tag_tid=1, entry_tid=3], 
> entries_behind_master=0
>   last_update: 2019-11-25 12:14:50
>
> root@vtier2a:~# rbd-nbd --cluster=cephdr map nfs/dir_research@dr_test
> 2019-11-25 12:17:45.764091 7f8bd73c5dc0 -1 asok(0x55fd9a7202a0) 
> AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to 
> bind the UNIX domain socket to '/var/run/ceph/cephdr-client.admin.asok': (17) 
> File exists
>
>
>
> Did we miss anything, and why didn't the snapshot replicate to the DR side?
>
> Thanks,
> -Vikas
>
> -----Original Message-----
> From: Jason Dillaman <jdill...@redhat.com>
> Sent: Thursday, November 21, 2019 10:24 AM
> To: Vikas Rana <vr...@vtiersys.com>
> Cc: dillaman <dilla...@redhat.com>; ceph-users <ceph-users@lists.ceph.com>
> Subject: Re: [ceph-users] RBD Mirror DR Testing
>
> On Thu, Nov 21, 2019 at 10:16 AM Vikas Rana <vr...@vtiersys.com> wrote:
> >
> > Thanks Jason.
> > We are just mounting and verifying the directory structure to make sure it 
> > looks good.
> >
> > My understanding was, in 12.2.10, we can't mount the DR snapshot as the RBD 
> > image is non-primary. Is this wrong?
>
> You have always been able to access non-primary images for read-only 
> operations (only writes are prevented):
>
> $ rbd info test
> rbd image 'test':
> <... snip ...>
>     mirroring primary: false
>
> $ rbd device --device-type nbd map test@1
> /dev/nbd0
> $ mount /dev/nbd0 /mnt/
> mount: /mnt: WARNING: device write-protected, mounted read-only.
> $ ll /mnt/
> total 0
> -rw-r--r--. 1 root root 0 Nov 21 10:20 hello.world
>
> > Thanks,
> > -Vikas
> >
> > -----Original Message-----
> > From: Jason Dillaman <jdill...@redhat.com>
> > Sent: Thursday, November 21, 2019 9:58 AM
> > To: Vikas Rana <vr...@vtiersys.com>
> > Cc: ceph-users <ceph-users@lists.ceph.com>
> > Subject: Re: [ceph-users] RBD Mirror DR Testing
> >
> > On Thu, Nov 21, 2019 at 9:56 AM Jason Dillaman <jdill...@redhat.com> wrote:
> > >
> > > On Thu, Nov 21, 2019 at 8:49 AM Vikas Rana <vr...@vtiersys.com> wrote:
> > > >
> > > > Thanks Jason for such a quick response. We are on 12.2.10.
> > > >
> > > > Checksumming a 200TB image will take a long time.
> > >
> > > How would mounting an RBD image and scanning the image be faster?
> > > Are you only using a small percentage of the image?
> >
> > ... and of course, you can mount an RBD snapshot in read-only mode.
> >
> > > > To test the DR copy by mounting it, these are the steps I'm
> > > > planning to follow:
> > > > 1. Demote the Prod copy and promote the DR copy
> > > > 2. Do we have to recreate the rbd mirror relationship going from DR to 
> > > > primary?
> > > > 3. Mount and validate the data
> > > > 4. Demote the DR copy and promote the Prod copy
> > > > 5. Revert the peer relationship if required?
> > > >
> > > > Did I do it right or miss anything?
> > >
> > > You cannot change the peers or you will lose the relationship. If
> > > you insist on your course of action, you just need to be configured
> > > for two-way mirroring and leave it that way.
> > >
> > > >
> > > > Thanks,
> > > > -Vikas
> > > >
> > > > -----Original Message-----
> > > > From: Jason Dillaman <jdill...@redhat.com>
> > > > Sent: Thursday, November 21, 2019 8:33 AM
> > > > To: Vikas Rana <vr...@vtiersys.com>
> > > > Cc: ceph-users <ceph-users@lists.ceph.com>
> > > > Subject: Re: [ceph-users] RBD Mirror DR Testing
> > > >
> > > > On Thu, Nov 21, 2019 at 8:29 AM Vikas Rana <vr...@vtiersys.com> wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > >
> > > > >
> > > > > We have a 200TB RBD image which we are replicating using RBD 
> > > > > mirroring.
> > > > >
> > > > > We want to test the DR copy and make sure that we have a consistent 
> > > > > copy in case the primary site is lost.
> > > > >
> > > > >
> > > > >
> > > > > We did it previously and promoted the DR copy, which broke the DR 
> > > > > copy from the primary, and we had to resync the whole 200TB of data.
> > > > >
> > > > >
> > > > >
> > > > > Is there any correct way of doing it so we don’t have to resync all 
> > > > > 200TB again?
> > > >
> > > > Yes, create a snapshot on the primary site and let it propagate to the 
> > > > non-primary site. Then you can compare checksums at the snapshot w/o 
> > > > having to worry about the data changing. Once you have finished, delete 
> > > > the snapshot on the primary site and it will propagate over to the 
> > > > non-primary site.
> > > >
> > > > >
> > > > >
> > > > > Can we demote current primary and then promote the DR copy and test 
> > > > > and then revert back? Will that require the complete 200TB sync?
> > > > >
> > > >
> > > > It's only the forced-promotion that causes split-brain. If you 
> > > > gracefully demote from site A and promote site B, and then demote site 
> > > > B and promote site A, that will not require a sync. However, again, 
> > > > it's probably just easier to use a snapshot.
> > > >
> > > > >
> > > > > Thanks in advance for your help and suggestions.
> > > > >
> > > > >
> > > > >
> > > > > Thanks,
> > > > >
> > > > > -Vikas
> > > > >
> > > > > _______________________________________________
> > > > > ceph-users mailing list
> > > > > ceph-users@lists.ceph.com
> > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > >
> > > >
> > > >
> > > > --
> > > > Jason
> > > >
> > > >
> > >
> > >
> > > --
> > > Jason
> >
> >
> >
> > --
> > Jason
> >
> >
>
>
> --
> Jason
>
>


--
Jason

