On Wed, Sep 11, 2019 at 12:57 PM Oliver Freyermuth
<freyerm...@physik.uni-bonn.de> wrote:
>
> Dear Jason,
>
> I played a bit more with rbd mirroring and learned that deleting an image
> at the source (or disabling journaling on it) immediately moves the image
> to trash at the target - but setting rbd_mirroring_delete_delay helps to
> have some more grace time to catch human mistakes.
>
> However, I have issues restoring such an image which has been moved to
> trash by the RBD-mirror daemon as user:
> -----------------------------------
> [root@mon001 ~]# rbd trash ls -la
> ID           NAME                             SOURCE    DELETED_AT               STATUS                                   PARENT
> d4fbe8f63905 test-vm-XXXXXXXXXXXXXXXXXX-disk2 MIRRORING Wed Sep 11 18:43:14 2019 protected until Thu Sep 12 18:43:14 2019
> [root@mon001 ~]# rbd trash restore --image foo-image d4fbe8f63905
> rbd: restore error: 2019-09-11 18:50:15.387 7f5fa9590b00 -1 librbd::api::Trash:
> restore: Current trash source: mirroring does not match expected: user
> (22) Invalid argument
> -----------------------------------
> This is issued on the mon, which has the client.admin key, so it should not
> be a permission issue.
> It also fails when I try that in the Dashboard.
>
> Sadly, the error message is not clear enough for me to figure out what
> could be the problem - do you see what I did wrong?
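For reference, the grace period Oliver describes above comes from the
rbd_mirroring_delete_delay option, which (as far as I can tell) is read by the
rbd-mirror daemon on the target cluster. A minimal sketch of how it could be set
follows - the client section name (matching the daemon user from this thread) and
the 24-hour value are illustrative assumptions, not something stated in the thread:

-----------------------------------
# ceph.conf on the rbd-mirror host (backup cluster):
# keep images deleted on the source in the target's trash for 24 hours
[client.rbd_mirror_backup]
    rbd_mirroring_delete_delay = 86400
-----------------------------------

After a ceph.conf change like this, the ceph-rbd-mirror@rbd_mirror_backup.service
daemon would need a restart to pick up the new value.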
Good catch, it looks like we accidentally broke this in Nautilus when
image live-migration support was added. I've opened a new tracker
ticket to fix this [1].

> Cheers and thanks again,
>     Oliver
>
> On 2019-09-10 23:17, Oliver Freyermuth wrote:
> > Dear Jason,
> >
> > On 2019-09-10 23:04, Jason Dillaman wrote:
> >> On Tue, Sep 10, 2019 at 2:08 PM Oliver Freyermuth
> >> <freyerm...@physik.uni-bonn.de> wrote:
> >>>
> >>> Dear Jason,
> >>>
> >>> On 2019-09-10 18:50, Jason Dillaman wrote:
> >>>> On Tue, Sep 10, 2019 at 12:25 PM Oliver Freyermuth
> >>>> <freyerm...@physik.uni-bonn.de> wrote:
> >>>>>
> >>>>> Dear Cephalopodians,
> >>>>>
> >>>>> I have two questions about RBD mirroring.
> >>>>>
> >>>>> 1) I cannot get it to work - my setup is:
> >>>>>
> >>>>> - One cluster holding the live RBD volumes and snapshots, in pool "rbd",
> >>>>>   cluster name "ceph", running latest Mimic.
> >>>>>   I ran "rbd mirror pool enable rbd pool" on that cluster and created a
> >>>>>   cephx user "rbd_mirror" with (is there a better way?):
> >>>>>     ceph auth get-or-create client.rbd_mirror mon 'allow r' \
> >>>>>       osd 'allow class-read object_prefix rbd_children, allow pool rbd r' \
> >>>>>       -o ceph.client.rbd_mirror.keyring --cluster ceph
> >>>>>   In that pool, two images have the journaling feature activated, all
> >>>>>   others still have it disabled (so I would expect these two to be
> >>>>>   mirrored).
> >>>>
> >>>> You can just use "mon 'profile rbd' osd 'profile rbd'" for the caps --
> >>>> but you definitely need more than read-only permissions to the remote
> >>>> cluster since it needs to be able to create snapshots of remote images
> >>>> and update/trim the image journals.
> >>>
> >>> These profiles really make life a lot easier. I should have thought of
> >>> them rather than "guessing" a potentially good configuration...
> >>>
> >>>>
> >>>>> - Another (empty) cluster running latest Nautilus, cluster name "ceph",
> >>>>>   pool "rbd".
> >>>>>   I've used the dashboard to activate mirroring for the RBD pool, and
> >>>>>   then added a peer with cluster name "ceph-virt", cephx-ID "rbd_mirror",
> >>>>>   and filled in the mons and the key created above.
> >>>>>   I've then run:
> >>>>>     ceph auth get-or-create client.rbd_mirror_backup mon 'allow r' \
> >>>>>       osd 'allow class-read object_prefix rbd_children, allow pool rbd rwx' \
> >>>>>       -o client.rbd_mirror_backup.keyring --cluster ceph
> >>>>>   deployed that key on the rbd-mirror machine, and started the service
> >>>>>   with:
> >>>>
> >>>> Please use "mon 'profile rbd-mirror' osd 'profile rbd'" for your caps [1].
> >>>
> >>> That did the trick (in combination with the above)!
> >>> Again a case of PEBKAC: I should have read the documentation to the end,
> >>> clearly my fault.
> >>>
> >>> It works well now, even though it seems to run a bit slow (~35 MB/s for
> >>> the initial sync when everything is 1 GBit/s), but that may also be caused
> >>> by a combination of some very limited hardware on the receiving end (which
> >>> will be scaled up in the future).
> >>> A single host with 6 disks, replica 3 and a RAID controller which can only
> >>> do RAID0 and not JBOD is certainly not ideal, so commit latency may cause
> >>> this slow bandwidth.
> >>
> >> You could try increasing "rbd_concurrent_management_ops" from the default
> >> of 10 ops to something higher to attempt to account for the latency.
> >> However, I wouldn't expect near line-speed w/ RBD mirroring.
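Putting the two caps recommendations Jason gives above together, the users from this
thread would be created roughly as sketched below (the user and keyring names are the
ones Oliver used; for the already-existing users, "ceph auth caps" would be the way to
adjust the caps in place instead of re-creating them):

-----------------------------------
# on the source cluster ("ceph-virt"): the peer user rbd-mirror connects back as
ceph auth get-or-create client.rbd_mirror \
    mon 'profile rbd' osd 'profile rbd' \
    -o ceph.client.rbd_mirror.keyring

# on the backup cluster: the local user the rbd-mirror daemon itself runs as
ceph auth get-or-create client.rbd_mirror_backup \
    mon 'profile rbd-mirror' osd 'profile rbd' \
    -o client.rbd_mirror_backup.keyring
-----------------------------------

The rbd_concurrent_management_ops knob mentioned above is, as far as I know, likewise an
ordinary client-side option, so it could be raised in the rbd-mirror daemon's
[client.rbd_mirror_backup] section the same way as in the delete-delay example further up.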
> >
> > Thanks - I will play with this option once we have more storage available
> > in the target pool ;-).
> >
> >>
> >>>>
> >>>>> systemctl start ceph-rbd-mirror@rbd_mirror_backup.service
> >>>>>
> >>>>> After this, everything looks fine:
> >>>>> # rbd mirror pool info
> >>>>> Mode: pool
> >>>>> Peers:
> >>>>>   UUID        NAME      CLIENT
> >>>>>   XXXXXXXXXXX ceph-virt client.rbd_mirror
> >>>>>
> >>>>> The service also seems to start fine, but logs show (debug rbd_mirror=20):
> >>>>>
> >>>>> rbd::mirror::ClusterWatcher:0x5575e2a7d390 resolve_peer_config_keys:
> >>>>>   retrieving config-key: pool_id=2, pool_name=rbd, peer_uuid=XXXXXXXXXXX
> >>>>> rbd::mirror::Mirror: 0x5575e29c7240 update_pool_replayers: enter
> >>>>> rbd::mirror::Mirror: 0x5575e29c7240 update_pool_replayers: restarting
> >>>>>   failed pool replayer for uuid: XXXXXXXXXXX cluster: ceph-virt
> >>>>>   client: client.rbd_mirror
> >>>>> rbd::mirror::PoolReplayer: 0x5575e2a7da20 init: replaying for uuid:
> >>>>>   XXXXXXXXXXX cluster: ceph-virt client: client.rbd_mirror
> >>>>> rbd::mirror::PoolReplayer: 0x5575e2a7da20 init_rados: error connecting
> >>>>>   to remote peer uuid: XXXXXXXXXXX cluster: ceph-virt client:
> >>>>>   client.rbd_mirror: (95) Operation not supported
> >>>>> rbd::mirror::ServiceDaemon: 0x5575e29c8d70 add_or_update_callout:
> >>>>>   pool_id=2, callout_id=2, callout_level=error, text=unable to connect
> >>>>>   to remote cluster
> >>>>
> >>>> If it's still broken after fixing your caps above, perhaps increase
> >>>> debugging for "rados", "monc", "auth", and "ms" to see if you can
> >>>> determine the source of the op not supported error.
> >>>>
> >>>>> I already tried storing the ceph.client.rbd_mirror.keyring (i.e. from
> >>>>> the cluster with the live images) on the rbd-mirror machine explicitly
> >>>>> (i.e. not only in the mon config storage), and after doing that:
> >>>>>   rbd -m mon_ip_of_ceph_virt_cluster --id=rbd_mirror ls
> >>>>> works fine. So it's not a connectivity issue. Maybe a permission issue?
> >>>>> Or did I miss something?
> >>>>>
> >>>>> Any idea what "operation not supported" means?
> >>>>> It's unclear to me whether things should work well using Mimic with
> >>>>> Nautilus, and whether enabling pool mirroring but only having journaling
> >>>>> on for two images is a supported case.
> >>>>
> >>>> Yes and yes.
> >>>>
> >>>>> 2) Since there is a performance drawback (about 2x) for journaling, is
> >>>>> it also possible to only mirror snapshots and leave the live volumes
> >>>>> alone?
> >>>>> This would cover the common backup use case before deferred mirroring
> >>>>> is implemented (or is it there already?).
> >>>>
> >>>> This is in-development right now and will hopefully land for the
> >>>> Octopus release.
> >>>
> >>> That would be very cool. Just to clarify: you mean the "real" deferred
> >>> mirroring, not "snapshot only" mirroring?
> >>> Is it already clear whether this will require Octopus (or a later release)
> >>> on both ends, or only on the receiving side?
> >>
> >> I'm not sure what you mean by deferred mirroring. You can delay the replay
> >> of the journal via the "rbd_mirroring_replay_delay" configuration option
> >> so that your DR site can be X seconds behind the primary at a minimum.
> >
> > This is indeed what I was thinking of...
> >
> >> For Octopus we are working on on-demand and scheduled snapshot mirroring
> >> between sites -- no journal is involved.
> >
> > ... and this is what I was dreaming of. We keep snapshots of VMs to be able
> > to roll them back.
> > We'd like to also keep those snapshots in a separate Ceph instance as an
> > additional safety-net (in addition to an offline backup of those snapshots
> > with Benji backup).
> > It is not (yet) clear to me whether we can pay the "2x" price for journaling
> > in the long run, so this would be the way to go in case we can't.
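As an aside on the "rbd_mirroring_replay_delay" option mentioned above: it is again a
plain client-side option read by the rbd-mirror daemon, so a sketch along the following
lines should give the DR site a fixed minimum lag behind the primary. The client section
name and the one-hour value are illustrative assumptions, analogous to the delete-delay
example further up:

-----------------------------------
# ceph.conf on the rbd-mirror host (backup cluster):
# do not replay journal events until they are at least one hour old
[client.rbd_mirror_backup]
    rbd_mirroring_replay_delay = 3600
-----------------------------------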
> >
> >>
> >>> Since I got you personally, I have two bonus questions.
> >>>
> >>> 1) Your talk:
> >>>
> >>> https://events.static.linuxfound.org/sites/events/files/slides/Disaster%20Recovery%20and%20Ceph%20Block%20Storage-%20Introducing%20Multi-Site%20Mirroring.pdf
> >>> mentions "rbd journal object flush age", which I'd liken to something like
> >>> the "commit" mount option on a classical file system - correct?
> >>> I don't find this switch documented anywhere, though - is there experience
> >>> with it / what's the default?
> >>
> >> It's a low-level knob that by default causes the journal to flush its
> >> pending IO events before it allows the corresponding IO to be issued
> >> against the backing image. Setting it to a value greater than zero will
> >> allow that many seconds of IO events to be batched together in a journal
> >> append operation, and it's helpful for high-throughput, small-IO
> >> workloads. Of course, it turned out that a bug had broken that option a
> >> while ago such that events would never batch, so a fix is currently
> >> scheduled for backport to all active releases [1] w/ the goal that no one
> >> should need to tweak it.
> >
> > That's even better - since our setup is growing and we will keep upgrading,
> > I'll then just keep things as they are now (no manual tweaking) and tag
> > along with the development. Thanks!
> >
> >>
> >>> 2) I read I can run more than one rbd-mirror with Mimic/Nautilus. Do they
> >>> load-balance the images, or "only" fail over in case one of them dies?
> >>
> >> Starting with Nautilus, the default configuration for rbd-mirror is to
> >> evenly divide the number of mirrored images between all running daemons.
> >> This does not split the total load evenly, since some images might be
> >> hotter than others, but it at least spreads the load.
> >
> > That's fine enough for our use case. Spreading by "hotness" is a task
> > without a clear answer, and "temperature" may change quickly, so that's all
> > I hoped for.
> >
> > Many thanks again for the very helpful explanations!
> >
> > Cheers,
> >     Oliver
> >
> >>
> >>>
> >>> Cheers and many thanks for the quick and perfect help!
> >>>     Oliver
> >>>
> >>>>
> >>>>> Cheers and thanks in advance,
> >>>>>     Oliver
> >>>>>
> >>>>> _______________________________________________
> >>>>> ceph-users mailing list
> >>>>> ceph-users@lists.ceph.com
> >>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>>>
> >>>> [1] https://docs.ceph.com/docs/master/rbd/rbd-mirroring/#rbd-mirror-daemon
> >>>>
> >>>> --
> >>>> Jason
> >>>>
> >>>
> >>>
> >>
> >> [1] https://github.com/ceph/ceph/pull/28539
> >
> >

[1] https://tracker.ceph.com/issues/41780

--
Jason
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
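As a closing note on the "rbd journal object flush age" knob discussed above: the
corresponding config option should be rbd_journal_object_flush_age (in seconds, with a
default of 0, i.e. flush immediately). Given the point above that the batching fix is
being backported so that nobody should need to tweak it, the sketch below is only for
completeness - the client section and the one-second value are illustrative assumptions:

-----------------------------------
# ceph.conf on a client writing to journaled RBD images:
# allow up to one second of IO events to be batched into a single journal append
[client]
    rbd_journal_object_flush_age = 1
-----------------------------------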