On Mon, Nov 25, 2019 at 12:24 PM Vikas Rana <vr...@vtiersys.com> wrote:
>
> Hi All,
> I believe we forgot to take the snapshot in the previous test. Here's the
> output from the current test, where we took a snapshot on the Primary side
> but the snapshot did not replicate to the DR side?
> VTIER1 is the Primary box with cluster ceph. Vtier2a is the DR box with
> cluster name cephdr.
>
> root@VTIER1:~# rbd ls -l nfs
> NAME                 SIZE   PARENT FMT PROT LOCK
> dir_research         200TiB          2      excl
> dir_research@dr_test 200TiB          2
> test01               100MiB          2
> root@VTIER1:~#
>
>
> root@vtier2a:~# rbd ls -l nfs
> NAME         SIZE   PARENT FMT PROT LOCK
> dir_research 200TiB          2      excl
> test01       100MiB          2      excl
>
> root@vtier2a:~# rbd mirror pool status nfs --verbose --cluster=cephdr
> health: OK
> images: 2 total
>     2 replaying
>
> dir_research:
>   global_id:   92f46320-d43d-48eb-8a09-b68a1945cc77
>   state:       up+replaying
>   description: replaying, master_position=[object_number=597902, tag_tid=3,
> entry_tid=705172054], mirror_position=[object_number=311129, tag_tid=3,
> entry_tid=283416457], entries_behind_master=421755597
>   last_update: 2019-11-25 12:14:52
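
For a rough sense of scale, the replay lag reported for dir_research in the
status output above can be converted into a data volume. This is only a
minimal Python sketch: it assumes every journal entry is a 4KiB write (real
entries vary in size), and the helper names `entries_behind` and
`backlog_tib` are made up for illustration; they are not part of any Ceph
tooling.

```python
import re

ENTRY_SIZE_BYTES = 4 * 1024  # assumption: each journal entry is one 4KiB write
TIB = 1024 ** 4


def entries_behind(description: str) -> int:
    """Pull the entries_behind_master counter out of a status description."""
    match = re.search(r"entries_behind_master=(\d+)", description)
    if match is None:
        raise ValueError("no entries_behind_master field found")
    return int(match.group(1))


def backlog_tib(entries: int, entry_size: int = ENTRY_SIZE_BYTES) -> float:
    """Estimate the un-replayed backlog in TiB for a given entry size."""
    return entries * entry_size / TIB


# Description string copied from the dir_research status above.
description = (
    "replaying, master_position=[object_number=597902, tag_tid=3, "
    "entry_tid=705172054], mirror_position=[object_number=311129, tag_tid=3, "
    "entry_tid=283416457], entries_behind_master=421755597"
)

lag = entries_behind(description)
print(f"{lag} entries behind, roughly {backlog_tib(lag):.2f} TiB at 4KiB/entry")
```

Note that the counter here equals the difference between the two entry_tid
values in the same description (705172054 - 283416457), so the estimate only
depends on the assumed entry size.
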
The "entries_behind_master=421755597" value tells me that your "rbd-mirror"
daemon is *very* far behind. Assuming each entry is a 4KiB IO, that would be
over 1.5TiB behind. Since the snapshot creation is itself a journal event
that is replayed in order, it should not show up on the DR side until the
daemon has caught up to it.

> test01:
>   global_id:   06fbfe68-b7e4-4d3a-93b2-cd18c569f7f7
>   state:       up+replaying
>   description: replaying, master_position=[object_number=3, tag_tid=1,
> entry_tid=3], mirror_position=[object_number=3, tag_tid=1, entry_tid=3],
> entries_behind_master=0
>   last_update: 2019-11-25 12:14:50
>
> root@vtier2a:~# rbd-nbd --cluster=cephdr map nfs/dir_research@dr_test
> 2019-11-25 12:17:45.764091 7f8bd73c5dc0 -1 asok(0x55fd9a7202a0)
> AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to
> bind the UNIX domain socket to '/var/run/ceph/cephdr-client.admin.asok': (17)
> File exists
>
>
>
> Did we miss anything, and why didn't the snapshot replicate to the DR side?
>
> Thanks,
> -Vikas
>
> -----Original Message-----
> From: Jason Dillaman <jdill...@redhat.com>
> Sent: Thursday, November 21, 2019 10:24 AM
> To: Vikas Rana <vr...@vtiersys.com>
> Cc: dillaman <dilla...@redhat.com>; ceph-users <ceph-users@lists.ceph.com>
> Subject: Re: [ceph-users] RBD Mirror DR Testing
>
> On Thu, Nov 21, 2019 at 10:16 AM Vikas Rana <vr...@vtiersys.com> wrote:
> >
> > Thanks Jason.
> > We are just mounting and verifying the directory structure to make sure
> > it looks good.
> >
> > My understanding was, in 12.2.10, we can't mount the DR snapshot as the RBD
> > image is non-primary. Is this wrong?
>
> You have always been able to access non-primary images for read-only
> operations (only writes are prevented):
>
> $ rbd info test
> rbd image 'test':
> <... snip ...>
> mirroring primary: false
>
> $ rbd device --device-type nbd map test@1
> /dev/nbd0
> $ mount /dev/nbd0 /mnt/
> mount: /mnt: WARNING: device write-protected, mounted read-only.
> $ ll /mnt/
> total 0
> -rw-r--r--. 1 root root 0 Nov 21 10:20 hello.world
>
> > Thanks,
> > -Vikas
> >
> > -----Original Message-----
> > From: Jason Dillaman <jdill...@redhat.com>
> > Sent: Thursday, November 21, 2019 9:58 AM
> > To: Vikas Rana <vr...@vtiersys.com>
> > Cc: ceph-users <ceph-users@lists.ceph.com>
> > Subject: Re: [ceph-users] RBD Mirror DR Testing
> >
> > On Thu, Nov 21, 2019 at 9:56 AM Jason Dillaman <jdill...@redhat.com> wrote:
> > >
> > > On Thu, Nov 21, 2019 at 8:49 AM Vikas Rana <vr...@vtiersys.com> wrote:
> > > >
> > > > Thanks Jason for such a quick response. We are on 12.2.10.
> > > >
> > > > Checksumming a 200TB image will take a long time.
> > >
> > > How would mounting an RBD image and scanning the image be faster?
> > > Are you only using a small percentage of the image?
> >
> > ... and of course, you can mount an RBD snapshot in read-only mode.
> >
> > > > To test the DR copy by mounting it, these are the steps I'm
> > > > planning to follow:
> > > > 1. Demote the Prod copy and promote the DR copy
> > > > 2. Do we have to recreate the rbd mirror relationship going from DR to
> > > > primary?
> > > > 3. Mount and validate the data
> > > > 4. Demote the DR copy and promote the Prod copy
> > > > 5. Revert the peer relationship if required?
> > > >
> > > > Did I do it right or miss anything?
> > >
> > > You cannot change the peers or you will lose the relationship. If
> > > you insist on your course of action, you just need to be configured
> > > for two-way mirroring and leave it that way.
> > >
> > > >
> > > > Thanks,
> > > > -Vikas
> > > >
> > > > -----Original Message-----
> > > > From: Jason Dillaman <jdill...@redhat.com>
> > > > Sent: Thursday, November 21, 2019 8:33 AM
> > > > To: Vikas Rana <vr...@vtiersys.com>
> > > > Cc: ceph-users <ceph-users@lists.ceph.com>
> > > > Subject: Re: [ceph-users] RBD Mirror DR Testing
> > > >
> > > > On Thu, Nov 21, 2019 at 8:29 AM Vikas Rana <vr...@vtiersys.com> wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > We have a 200TB RBD image which we are replicating using RBD
> > > > > mirroring.
> > > > >
> > > > > We want to test the DR copy and make sure that we have a consistent
> > > > > copy in case the primary site is lost.
> > > > >
> > > > > We did it previously and promoted the DR copy, which broke the DR
> > > > > copy from the primary, and we had to resync the whole 200TB of data.
> > > > >
> > > > > Is there any correct way of doing it so we don’t have to resync all
> > > > > 200TB again?
> > > >
> > > > Yes, create a snapshot on the primary site and let it propagate to the
> > > > non-primary site. Then you can compare checksums at the snapshot w/o
> > > > having to worry about the data changing. Once you have finished, delete
> > > > the snapshot on the primary site and it will propagate over to the
> > > > non-primary site.
> > > >
> > > > > Can we demote the current primary, then promote the DR copy, test,
> > > > > and then revert back? Will that require the complete 200TB sync?
> > > >
> > > > It's only the forced-promotion that causes split-brain. If you
> > > > gracefully demote from site A and promote site B, and then demote site
> > > > B and promote site A, that will not require a sync. However, again,
> > > > it's probably just easier to use a snapshot.
> > > >
> > > > > Thanks in advance for your help and suggestions.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > -Vikas
> > > > >
> > > > > _______________________________________________
> > > > > ceph-users mailing list
> > > > > ceph-users@lists.ceph.com
> > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > >
> > > > --
> > > > Jason
> > >
> > > --
> > > Jason
> >
> > --
> > Jason
>
> --
> Jason

--
Jason