On Wed, Sep 11, 2019 at 12:57 PM Oliver Freyermuth
<freyerm...@physik.uni-bonn.de> wrote:
>
> Dear Jason,
>
> I played a bit more with rbd mirroring and learned that deleting an image
> at the source (or disabling journaling on it) immediately moves the image
> to trash at the target - but setting rbd_mirroring_delete_delay helps to
> have some more grace time to catch human mistakes.
>
> However, I have issues restoring such an image which has been moved to
> trash by the RBD-mirror daemon as user:
> -----------------------------------
> [root@mon001 ~]# rbd trash ls -la
> ID           NAME                             SOURCE    DELETED_AT               STATUS                                   PARENT
> d4fbe8f63905 test-vm-XXXXXXXXXXXXXXXXXX-disk2 MIRRORING Wed Sep 11 18:43:14 2019 protected until Thu Sep 12 18:43:14 2019
> [root@mon001 ~]# rbd trash restore --image foo-image d4fbe8f63905
> rbd: restore error: 2019-09-11 18:50:15.387 7f5fa9590b00 -1 librbd::api::Trash:
> restore: Current trash source: mirroring does not match expected: user
> (22) Invalid argument
> -----------------------------------
> This is issued on the mon, which has the client.admin key, so it should not
> be a permission issue.
> It also fails when I try that in the Dashboard.
>
> Sadly, the error message is not clear enough for me to figure out what
> could be the problem - do you see what I did wrong?
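For reference, the grace period Oliver describes above comes from the
rbd_mirroring_delete_delay option, which (as far as I can tell) is read by the
rbd-mirror daemon on the target cluster. A minimal sketch of how it could be set
follows - the client section name (matching the daemon user from this thread) and
the 24-hour value are illustrative assumptions, not something stated in the thread:

-----------------------------------
# ceph.conf on the rbd-mirror host (backup cluster):
# keep images deleted on the source in the target's trash for 24 hours
[client.rbd_mirror_backup]
    rbd_mirroring_delete_delay = 86400
-----------------------------------

After a ceph.conf change like this, the ceph-rbd-mirror@rbd_mirror_backup.service
daemon would need a restart to pick up the new value.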
Good catch, it looks like we accidentally broke this in Nautilus when
image live-migration support was added. I've opened a new tracker
ticket to fix this [1].

> Cheers and thanks again,
>     Oliver
>
> On 2019-09-10 23:17, Oliver Freyermuth wrote:
> > Dear Jason,
> >
> > On 2019-09-10 23:04, Jason Dillaman wrote:
> >> On Tue, Sep 10, 2019 at 2:08 PM Oliver Freyermuth
> >> <freyerm...@physik.uni-bonn.de> wrote:
> >>>
> >>> Dear Jason,
> >>>
> >>> On 2019-09-10 18:50, Jason Dillaman wrote:
> >>>> On Tue, Sep 10, 2019 at 12:25 PM Oliver Freyermuth
> >>>> <freyerm...@physik.uni-bonn.de> wrote:
> >>>>>
> >>>>> Dear Cephalopodians,
> >>>>>
> >>>>> I have two questions about RBD mirroring.
> >>>>>
> >>>>> 1) I cannot get it to work - my setup is:
> >>>>>
> >>>>> - One cluster holding the live RBD volumes and snapshots, in pool "rbd",
> >>>>>   cluster name "ceph", running latest Mimic.
> >>>>>   I ran "rbd mirror pool enable rbd pool" on that cluster and created a
> >>>>>   cephx user "rbd_mirror" with (is there a better way?):
> >>>>>     ceph auth get-or-create client.rbd_mirror mon 'allow r' \
> >>>>>       osd 'allow class-read object_prefix rbd_children, allow pool rbd r' \
> >>>>>       -o ceph.client.rbd_mirror.keyring --cluster ceph
> >>>>>   In that pool, two images have the journaling feature activated, all
> >>>>>   others still have it disabled (so I would expect these two to be
> >>>>>   mirrored).
> >>>>
> >>>> You can just use "mon 'profile rbd' osd 'profile rbd'" for the caps --
> >>>> but you definitely need more than read-only permissions to the remote
> >>>> cluster since it needs to be able to create snapshots of remote images
> >>>> and update/trim the image journals.
> >>>
> >>> These profiles really make life a lot easier. I should have thought of
> >>> them rather than "guessing" a potentially good configuration...
> >>>
> >>>>
> >>>>> - Another (empty) cluster running latest Nautilus, cluster name "ceph",
> >>>>>   pool "rbd".
> >>>>>   I've used the dashboard to activate mirroring for the RBD pool, and
> >>>>>   then added a peer with cluster name "ceph-virt", cephx-ID "rbd_mirror",
> >>>>>   and filled in the mons and the key created above.
> >>>>>   I've then run:
> >>>>>     ceph auth get-or-create client.rbd_mirror_backup mon 'allow r' \
> >>>>>       osd 'allow class-read object_prefix rbd_children, allow pool rbd rwx' \
> >>>>>       -o client.rbd_mirror_backup.keyring --cluster ceph
> >>>>>   deployed that key on the rbd-mirror machine, and started the service
> >>>>>   with:
> >>>>
> >>>> Please use "mon 'profile rbd-mirror' osd 'profile rbd'" for your caps [1].
> >>>
> >>> That did the trick (in combination with the above)!
> >>> Again a case of PEBKAC: I should have read the documentation to the end,
> >>> clearly my fault.
> >>>
> >>> It works well now, even though it seems to run a bit slow (~35 MB/s for
> >>> the initial sync when everything is 1 GBit/s), but that may also be caused
> >>> by a combination of some very limited hardware on the receiving end (which
> >>> will be scaled up in the future).
> >>> A single host with 6 disks, replica 3 and a RAID controller which can only
> >>> do RAID0 and not JBOD is certainly not ideal, so commit latency may cause
> >>> this slow bandwidth.
> >>
> >> You could try increasing "rbd_concurrent_management_ops" from the default
> >> of 10 ops to something higher to attempt to account for the latency.
> >> However, I wouldn't expect near line-speed w/ RBD mirroring.
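Putting the two caps recommendations Jason gives above together, the users from this
thread would be created roughly as sketched below (the user and keyring names are the
ones Oliver used; for the already-existing users, "ceph auth caps" would be the way to
adjust the caps in place instead of re-creating them):

-----------------------------------
# on the source cluster ("ceph-virt"): the peer user rbd-mirror connects back as
ceph auth get-or-create client.rbd_mirror \
    mon 'profile rbd' osd 'profile rbd' \
    -o ceph.client.rbd_mirror.keyring

# on the backup cluster: the local user the rbd-mirror daemon itself runs as
ceph auth get-or-create client.rbd_mirror_backup \
    mon 'profile rbd-mirror' osd 'profile rbd' \
    -o client.rbd_mirror_backup.keyring
-----------------------------------

The rbd_concurrent_management_ops knob mentioned above is, as far as I know, likewise an
ordinary client-side option, so it could be raised in the rbd-mirror daemon's
[client.rbd_mirror_backup] section the same way as in the delete-delay example further up.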
> >
> > Thanks - I will play with this option once we have more storage available
> > in the target pool ;-).
> >
> >>
> >>>>
> >>>>> systemctl start ceph-rbd-mirror@rbd_mirror_backup.service
> >>>>>
> >>>>> After this, everything looks fine:
> >>>>> # rbd mirror pool info
> >>>>> Mode: pool
> >>>>> Peers:
> >>>>>   UUID        NAME      CLIENT
> >>>>>   XXXXXXXXXXX ceph-virt client.rbd_mirror
> >>>>>
> >>>>> The service also seems to start fine, but logs show (debug rbd_mirror=20):
> >>>>>
> >>>>> rbd::mirror::ClusterWatcher:0x5575e2a7d390 resolve_peer_config_keys:
> >>>>>   retrieving config-key: pool_id=2, pool_name=rbd, peer_uuid=XXXXXXXXXXX
> >>>>> rbd::mirror::Mirror: 0x5575e29c7240 update_pool_replayers: enter
> >>>>> rbd::mirror::Mirror: 0x5575e29c7240 update_pool_replayers: restarting
> >>>>>   failed pool replayer for uuid: XXXXXXXXXXX cluster: ceph-virt
> >>>>>   client: client.rbd_mirror
> >>>>> rbd::mirror::PoolReplayer: 0x5575e2a7da20 init: replaying for uuid:
> >>>>>   XXXXXXXXXXX cluster: ceph-virt client: client.rbd_mirror
> >>>>> rbd::mirror::PoolReplayer: 0x5575e2a7da20 init_rados: error connecting
> >>>>>   to remote peer uuid: XXXXXXXXXXX cluster: ceph-virt client:
> >>>>>   client.rbd_mirror: (95) Operation not supported
> >>>>> rbd::mirror::ServiceDaemon: 0x5575e29c8d70 add_or_update_callout:
> >>>>>   pool_id=2, callout_id=2, callout_level=error, text=unable to connect
> >>>>>   to remote cluster
> >>>>
> >>>> If it's still broken after fixing your caps above, perhaps increase
> >>>> debugging for "rados", "monc", "auth", and "ms" to see if you can
> >>>> determine the source of the op not supported error.
> >>>>
> >>>>> I already tried storing the ceph.client.rbd_mirror.keyring (i.e. from
> >>>>> the cluster with the live images) on the rbd-mirror machine explicitly
> >>>>> (i.e. not only in the mon config storage), and after doing that:
> >>>>>   rbd -m mon_ip_of_ceph_virt_cluster --id=rbd_mirror ls
> >>>>> works fine. So it's not a connectivity issue. Maybe a permission issue?
> >>>>> Or did I miss something?
> >>>>>
> >>>>> Any idea what "operation not supported" means?
> >>>>> It's unclear to me whether things should work well using Mimic with
> >>>>> Nautilus, and whether enabling pool mirroring but only having journaling
> >>>>> on for two images is a supported case.
> >>>>
> >>>> Yes and yes.
> >>>>
> >>>>> 2) Since there is a performance drawback (about 2x) for journaling, is
> >>>>> it also possible to only mirror snapshots and leave the live volumes
> >>>>> alone?
> >>>>> This would cover the common backup use case before deferred mirroring
> >>>>> is implemented (or is it there already?).
> >>>>
> >>>> This is in-development right now and will hopefully land for the
> >>>> Octopus release.
> >>>
> >>> That would be very cool. Just to clarify: you mean the "real" deferred
> >>> mirroring, not "snapshot only" mirroring?
> >>> Is it already clear whether this will require Octopus (or a later release)
> >>> on both ends, or only on the receiving side?
> >>
> >> I'm not sure what you mean by deferred mirroring. You can delay the replay
> >> of the journal via the "rbd_mirroring_replay_delay" configuration option
> >> so that your DR site can be X seconds behind the primary at a minimum.
> >
> > This is indeed what I was thinking of...
> >
> >> For Octopus we are working on on-demand and scheduled snapshot mirroring
> >> between sites -- no journal is involved.
> >
> > ... and this is what I was dreaming of. We keep snapshots of VMs to be able
> > to roll them back.
> > We'd like to also keep those snapshots in a separate Ceph instance as an
> > additional safety-net (in addition to an offline backup of those snapshots
> > with Benji backup).
> > It is not (yet) clear to me whether we can pay the "2x" price for journaling
> > in the long run, so this would be the way to go in case we can't.
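As an aside on the "rbd_mirroring_replay_delay" option mentioned above: it is again a
plain client-side option read by the rbd-mirror daemon, so a sketch along the following
lines should give the DR site a fixed minimum lag behind the primary. The client section
name and the one-hour value are illustrative assumptions, analogous to the delete-delay
example further up:

-----------------------------------
# ceph.conf on the rbd-mirror host (backup cluster):
# do not replay journal events until they are at least one hour old
[client.rbd_mirror_backup]
    rbd_mirroring_replay_delay = 3600
-----------------------------------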
> >
> >>
> >>> Since I got you personally, I have two bonus questions.
> >>>
> >>> 1) Your talk:
> >>>
> >>> https://events.static.linuxfound.org/sites/events/files/slides/Disaster%20Recovery%20and%20Ceph%20Block%20Storage-%20Introducing%20Multi-Site%20Mirroring.pdf
> >>> mentions "rbd journal object flush age", which I'd liken to something like
> >>> the "commit" mount option on a classical file system - correct?
> >>> I don't find this switch documented anywhere, though - is there experience
> >>> with it / what's the default?
> >>
> >> It's a low-level knob that by default causes the journal to flush its
> >> pending IO events before it allows the corresponding IO to be issued
> >> against the backing image. Setting it to a value greater than zero will
> >> allow that many seconds of IO events to be batched together in a journal
> >> append operation, and it's helpful for high-throughput, small-IO
> >> workloads. Of course, it turned out that a bug had broken that option a
> >> while ago such that events would never batch, so a fix is currently
> >> scheduled for backport to all active releases [1] w/ the goal that no one
> >> should need to tweak it.
> >
> > That's even better - since our setup is growing and we will keep upgrading,
> > I'll then just keep things as they are now (no manual tweaking) and tag
> > along with the development. Thanks!
> >
> >>
> >>> 2) I read I can run more than one rbd-mirror with Mimic/Nautilus. Do they
> >>> load-balance the images, or "only" fail over in case one of them dies?
> >>
> >> Starting with Nautilus, the default configuration for rbd-mirror is to
> >> evenly divide the number of mirrored images between all running daemons.
> >> This does not split the total load evenly, since some images might be
> >> hotter than others, but it at least spreads the load.
> >
> > That's fine enough for our use case. Spreading by "hotness" is a task
> > without a clear answer, and "temperature" may change quickly, so that's all
> > I hoped for.
> >
> > Many thanks again for the very helpful explanations!
> >
> > Cheers,
> >     Oliver
> >
> >>
> >>>
> >>> Cheers and many thanks for the quick and perfect help!
> >>>     Oliver
> >>>
> >>>>
> >>>>> Cheers and thanks in advance,
> >>>>>     Oliver
> >>>>>
> >>>>> _______________________________________________
> >>>>> ceph-users mailing list
> >>>>> ceph-users@lists.ceph.com
> >>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>>>
> >>>> [1] https://docs.ceph.com/docs/master/rbd/rbd-mirroring/#rbd-mirror-daemon
> >>>>
> >>>> --
> >>>> Jason
> >>>>
> >>>
> >>>
> >>
> >> [1] https://github.com/ceph/ceph/pull/28539
> >
> >

[1] https://tracker.ceph.com/issues/41780

--
Jason
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
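As a closing note on the "rbd journal object flush age" knob discussed above: the
corresponding config option should be rbd_journal_object_flush_age (in seconds, with a
default of 0, i.e. flush immediately). Given the point above that the batching fix is
being backported so that nobody should need to tweak it, the sketch below is only for
completeness - the client section and the one-second value are illustrative assumptions:

-----------------------------------
# ceph.conf on a client writing to journaled RBD images:
# allow up to one second of IO events to be batched into a single journal append
[client]
    rbd_journal_object_flush_age = 1
-----------------------------------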