Dear Jason, On 15.09.19 at 00:03, Jason Dillaman wrote: > I was able to repeat this issue locally by restarting the primary OSD > for the "rbd_mirroring" object. It seems that a regression was > introduced with Ceph msgr2 in that upon reconnect, > the connection type for the client switches from ANY to V2 -- but only > for the watcher session and not the status updates. I've opened a > tracker ticket for this issue [1]. > > Thanks.
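As an aside, the acting primary OSD for the "rbd_mirroring" object (and the client currently holding the watch on it) can be looked up directly, which makes it easy to check whether the primary moved after an OSD restart or a rebalance. A minimal sketch, assuming the pool is simply named "rbd":
-----------------------------------------
# which PG serves the object, and which OSD is currently the primary
ceph osd map rbd rbd_mirroring
# which client currently holds the watch on the object
rados -p rbd listwatchers rbd_mirroring
-----------------------------------------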
many thanks to you for the detailed investigation and reproduction! While I did not restart the first 5 OSDs of the test cluster, I added an OSD and rebalanced - so I guess this can also be triggered if the primary OSD for the object changes, which should of course also lead to a reconnection. I can also add the observation that, as long as I do not touch the cluster anymore, everything stays in "up+replaying". Thanks and all the best, Oliver > > On Fri, Sep 13, 2019 at 12:44 PM Oliver Freyermuth > <freyerm...@physik.uni-bonn.de> wrote: >> >> On 13.09.19 at 18:38, Jason Dillaman wrote: >>> On Fri, Sep 13, 2019 at 11:30 AM Oliver Freyermuth >>> <freyerm...@physik.uni-bonn.de> wrote: >>>> >>>> On 13.09.19 at 17:18, Jason Dillaman wrote: >>>>> On Fri, Sep 13, 2019 at 10:41 AM Oliver Freyermuth >>>>> <freyerm...@physik.uni-bonn.de> wrote: >>>>>> >>>>>> On 13.09.19 at 16:30, Jason Dillaman wrote: >>>>>>> On Fri, Sep 13, 2019 at 10:17 AM Jason Dillaman <jdill...@redhat.com> >>>>>>> wrote: >>>>>>>> >>>>>>>> On Fri, Sep 13, 2019 at 10:02 AM Oliver Freyermuth >>>>>>>> <freyerm...@physik.uni-bonn.de> wrote: >>>>>>>>> >>>>>>>>> Dear Jason, >>>>>>>>> >>>>>>>>> thanks for the very detailed explanation! This was very instructive. >>>>>>>>> Sadly, the watchers look correct - see details inline. >>>>>>>>> >>>>>>>>> On 13.09.19 at 15:02, Jason Dillaman wrote: >>>>>>>>>> On Thu, Sep 12, 2019 at 9:55 PM Oliver Freyermuth >>>>>>>>>> <freyerm...@physik.uni-bonn.de> wrote: >>>>>>>>>>> >>>>>>>>>>> Dear Jason, >>>>>>>>>>> >>>>>>>>>>> thanks for taking care and developing a patch so quickly! >>>>>>>>>>> >>>>>>>>>>> I have another strange observation to share. In our test setup, >>>>>>>>>>> only a single RBD mirroring daemon is running for 51 images. >>>>>>>>>>> It works fine with a constant stream of 1-2 MB/s, but at some point >>>>>>>>>>> after roughly 20 hours, _all_ images go to this interesting state: >>>>>>>>>>> ----------------------------------------- >>>>>>>>>>> # rbd mirror image status test-vm.XXXXX-disk2 >>>>>>>>>>> test-vm.XXXXX-disk2: >>>>>>>>>>> global_id: XXXXXXXXXXXXXXX >>>>>>>>>>> state: down+replaying >>>>>>>>>>> description: replaying, master_position=[object_number=14, >>>>>>>>>>> tag_tid=6, entry_tid=6338], mirror_position=[object_number=14, >>>>>>>>>>> tag_tid=6, entry_tid=6338], entries_behind_master=0 >>>>>>>>>>> last_update: 2019-09-13 03:45:43 >>>>>>>>>>> ----------------------------------------- >>>>>>>>>>> Running this command several times, I see entry_tid increasing at >>>>>>>>>>> both ends, so mirroring seems to be working just fine. >>>>>>>>>>> >>>>>>>>>>> However: >>>>>>>>>>> ----------------------------------------- >>>>>>>>>>> # rbd mirror pool status >>>>>>>>>>> health: WARNING >>>>>>>>>>> images: 51 total >>>>>>>>>>> 51 unknown >>>>>>>>>>> ----------------------------------------- >>>>>>>>>>> The health warning is not visible in the dashboard (also not in the >>>>>>>>>>> mirroring menu), the daemon still seems to be running, dropped >>>>>>>>>>> nothing in the logs, >>>>>>>>>>> and claims to be "ok" in the dashboard - it's only that all images >>>>>>>>>>> show up in unknown state even though all seems to be working fine. >>>>>>>>>>> >>>>>>>>>>> Any idea on how to debug this? >>>>>>>>>>> When I restart the rbd-mirror service, all images come back as >>>>>>>>>>> green. I already encountered this twice in 3 days. >>>>>>>>>> >>>>>>>>>> The dashboard relies on the rbd-mirror daemon to provide it with errors >>>>>>>>>> and >>>>>>>>>> warnings.
You can see the status reported by rbd-mirror by running >>>>>>>>>> "ceph service status": >>>>>>>>>> >>>>>>>>>> $ ceph service status >>>>>>>>>> { >>>>>>>>>> "rbd-mirror": { >>>>>>>>>> "4152": { >>>>>>>>>> "status_stamp": "2019-09-13T08:58:41.937491-0400", >>>>>>>>>> "last_beacon": "2019-09-13T08:58:41.937491-0400", >>>>>>>>>> "status": { >>>>>>>>>> "json": >>>>>>>>>> "{\"1\":{\"name\":\"mirror\",\"callouts\":{},\"image_assigned_count\":1,\"image_error_count\":0,\"image_local_count\":1,\"image_remote_count\":1,\"image_warning_count\":0,\"instance_id\":\"4154\",\"leader\":true},\"2\":{\"name\":\"mirror_parent\",\"callouts\":{},\"image_assigned_count\":0,\"image_error_count\":0,\"image_local_count\":0,\"image_remote_count\":0,\"image_warning_count\":0,\"instance_id\":\"4156\",\"leader\":true}}" >>>>>>>>>> } >>>>>>>>>> } >>>>>>>>>> } >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> In your case, it seems like rbd-mirror thinks all is good >>>>>>>>>> with the world, so it's not reporting any errors. >>>>>>>>> >>>>>>>>> This is indeed the case: >>>>>>>>> >>>>>>>>> # ceph service status >>>>>>>>> { >>>>>>>>> "rbd-mirror": { >>>>>>>>> "84243": { >>>>>>>>> "status_stamp": "2019-09-13 15:40:01.149815", >>>>>>>>> "last_beacon": "2019-09-13 15:40:26.151381", >>>>>>>>> "status": { >>>>>>>>> "json": >>>>>>>>> "{\"2\":{\"name\":\"rbd\",\"callouts\":{},\"image_assigned_count\":51,\"image_error_count\":0,\"image_local_count\":51,\"image_remote_count\":51,\"image_warning_count\":0,\"instance_id\":\"84247\",\"leader\":true}}" >>>>>>>>> } >>>>>>>>> } >>>>>>>>> }, >>>>>>>>> "rgw": { >>>>>>>>> ... >>>>>>>>> } >>>>>>>>> } >>>>>>>>> >>>>>>>>>> The "down" state indicates that the rbd-mirror daemon isn't correctly >>>>>>>>>> watching the "rbd_mirroring" object in the pool. You can see who is >>>>>>>>>> watching that object by running the rados "listwatchers" command: >>>>>>>>>> >>>>>>>>>> $ rados -p <pool name> listwatchers rbd_mirroring >>>>>>>>>> watcher=1.2.3.4:0/199388543 client.4154 cookie=94769010788992 >>>>>>>>>> watcher=1.2.3.4:0/199388543 client.4154 cookie=94769061031424 >>>>>>>>>> >>>>>>>>>> In my case, the "4154" from "client.4154" is the unique global id for >>>>>>>>>> my connection to the cluster, which relates back to the "ceph service >>>>>>>>>> status" dump which also shows status by daemon using the unique >>>>>>>>>> global >>>>>>>>>> id. >>>>>>>>> >>>>>>>>> Sadly(?), this looks as expected: >>>>>>>>> >>>>>>>>> # rados -p rbd listwatchers rbd_mirroring >>>>>>>>> watcher=10.160.19.240:0/2922488671 client.84247 cookie=139770046978672 >>>>>>>>> watcher=10.160.19.240:0/2922488671 client.84247 cookie=139771389162560 >>>>>>>> >>>>>>>> Hmm, the unique id is different (84243 vs 84247). I wouldn't have >>>>>>>> expected the global id to have changed. Did you restart the Ceph >>>>>>>> cluster or MONs? Do you see any "peer assigned me a different >>>>>>>> global_id" errors in your rbd-mirror logs? >>>>>>>> >>>>>>>> I'll open a tracker ticket to fix the "ceph service status", though, >>>>>>>> since clearly your global id changed but it wasn't noticed by the >>>>>>>> service daemon status updater. >>>>>>> >>>>>>> ... also, can you please provide the output from the following via a >>>>>>> pastebin link? >>>>>>> >>>>>>> # rados -p rbd listomapvals rbd_mirroring >>>>>> >>>>>> Of course, here you go: >>>>>> https://0x0.st/zy8J.txt >>>>> >>>>> Thanks.
For the case above of global image id >>>>> 1a53fafa-37ef-4edf-9633-c2ba3323ed93, the on-disk status shows that it >>>>> was last updated by client.84247 / nonce 2922488671, which correctly >>>>> matches your watcher so the status should be "up": >>>>> >>>>> status_global_1a53fafa-37ef-4edf-9633-c2ba3323ed93 >>>>> value (232 bytes) : >>>>> 00000000 01 01 2c 00 00 00 08 17 49 01 00 00 00 00 00 01 >>>>> |..,.....I.......| <--- "17 49 01 00 00 00 00 00" (84247) is the >>>>> instance id >>>>> 00000010 01 01 1c 00 00 00 03 00 00 00 5f a3 31 ae 10 00 >>>>> |.........._.1...| <--- "5f a3 31 ae" is the nonce (2922488671) >>>>> 00000020 00 00 02 00 00 00 0a a0 13 f0 00 00 00 00 00 00 >>>>> |................| <--- "0a a0 13 f0" is the IP address >>>>> (10.160.19.240) >>>>> 00000030 00 00 01 01 b0 00 00 00 04 a2 00 00 00 72 65 70 >>>>> |.............rep| >>>>> 00000040 6c 61 79 69 6e 67 2c 20 6d 61 73 74 65 72 5f 70 |laying, >>>>> master_p| >>>>> 00000050 6f 73 69 74 69 6f 6e 3d 5b 6f 62 6a 65 63 74 5f >>>>> |osition=[object_| >>>>> 00000060 6e 75 6d 62 65 72 3d 31 39 2c 20 74 61 67 5f 74 |number=19, >>>>> tag_t| >>>>> 00000070 69 64 3d 36 2c 20 65 6e 74 72 79 5f 74 69 64 3d |id=6, >>>>> entry_tid=| >>>>> 00000080 32 36 34 34 33 5d 2c 20 6d 69 72 72 6f 72 5f 70 |26443], >>>>> mirror_p| >>>>> 00000090 6f 73 69 74 69 6f 6e 3d 5b 6f 62 6a 65 63 74 5f >>>>> |osition=[object_| >>>>> 000000a0 6e 75 6d 62 65 72 3d 31 39 2c 20 74 61 67 5f 74 |number=19, >>>>> tag_t| >>>>> 000000b0 69 64 3d 36 2c 20 65 6e 74 72 79 5f 74 69 64 3d |id=6, >>>>> entry_tid=| >>>>> 000000c0 32 36 34 34 33 5d 2c 20 65 6e 74 72 69 65 73 5f |26443], >>>>> entries_| >>>>> 000000d0 62 65 68 69 6e 64 5f 6d 61 73 74 65 72 3d 30 51 >>>>> |behind_master=0Q| >>>>> 000000e0 aa 7b 5d 1b 5f 4f 33 00 |.{]._O3.| >>>>> 000000e8 >>>>> >>>>> The only thing I can think of is that somehow the watcher entity >>>>> instance has a different encoding and it's failing a comparison. Can >>>>> you restart rbd-mirror such that the statuses list "up+replaying" and >>>>> then run the following? >>>>> >>>>> # rados -p rbd getomapval rbd_mirroring >>>>> status_global_1a53fafa-37ef-4edf-9633-c2ba3323ed93 >>>> >>>> Interesting! Again, thanks for the detailed context - learning a bit more >>>> about the internals is one of the many reasons why we love Ceph so much, >>>> and something which fully proprietary code will usually never deliver :-).
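Jason's by-hand decoding of that header can also be scripted. A rough sketch, using only the offsets annotated above (so not a general decoder for the on-disk format) and assuming "rados getomapval" is given an output file as its last argument:
-----------------------------------------
rados -p rbd getomapval rbd_mirroring \
  status_global_1a53fafa-37ef-4edf-9633-c2ba3323ed93 /tmp/status.bin

# read <length> little-endian bytes at <offset> and print them as a decimal number
le2dec() { echo "$((16#$(xxd -p -s "$1" -l "$2" /tmp/status.bin | fold -w2 | tac | tr -d '\n')))"; }

echo "client id: $(le2dec 0x07 8)"     # 84247 -> client.84247
echo "nonce:     $(le2dec 0x1a 4)"     # 2922488671
ip_hex=$(xxd -p -s 0x26 -l 4 /tmp/status.bin)
printf 'address:   %d.%d.%d.%d\n' "0x${ip_hex:0:2}" "0x${ip_hex:2:2}" "0x${ip_hex:4:2}" "0x${ip_hex:6:2}"
-----------------------------------------
If the values printed here do not match the "rados listwatchers" output, the image ends up reported as "down" even though replay continues.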
>>>> >>>> Here's the output after the restart, image is in up+replaying state: >>>> >>>> # rados -p rbd getomapval rbd_mirroring >>>> status_global_1a53fafa-37ef-4edf-9633-c2ba3323ed93 >>>> value (232 bytes) : >>>> 00000000 01 01 2c 00 00 00 08 ec 50 01 00 00 00 00 00 01 >>>> |..,.....P.......| >>>> 00000010 01 01 1c 00 00 00 03 00 00 00 0b 24 cd a5 10 00 >>>> |...........$....| >>>> 00000020 00 00 02 00 00 00 0a a0 13 f0 00 00 00 00 00 00 >>>> |................| >>>> 00000030 00 00 01 01 b0 00 00 00 04 a2 00 00 00 72 65 70 >>>> |.............rep| >>>> 00000040 6c 61 79 69 6e 67 2c 20 6d 61 73 74 65 72 5f 70 |laying, >>>> master_p| >>>> 00000050 6f 73 69 74 69 6f 6e 3d 5b 6f 62 6a 65 63 74 5f >>>> |osition=[object_| >>>> 00000060 6e 75 6d 62 65 72 3d 31 38 2c 20 74 61 67 5f 74 |number=18, >>>> tag_t| >>>> 00000070 69 64 3d 36 2c 20 65 6e 74 72 79 5f 74 69 64 3d |id=6, >>>> entry_tid=| >>>> 00000080 32 37 36 32 36 5d 2c 20 6d 69 72 72 6f 72 5f 70 |27626], >>>> mirror_p| >>>> 00000090 6f 73 69 74 69 6f 6e 3d 5b 6f 62 6a 65 63 74 5f >>>> |osition=[object_| >>>> 000000a0 6e 75 6d 62 65 72 3d 31 38 2c 20 74 61 67 5f 74 |number=18, >>>> tag_t| >>>> 000000b0 69 64 3d 36 2c 20 65 6e 74 72 79 5f 74 69 64 3d |id=6, >>>> entry_tid=| >>>> 000000c0 32 37 36 32 36 5d 2c 20 65 6e 74 72 69 65 73 5f |27626], >>>> entries_| >>>> 000000d0 62 65 68 69 6e 64 5f 6d 61 73 74 65 72 3d 30 eb >>>> |behind_master=0.| >>>> 000000e0 b3 7b 5d 27 9c d8 31 00 |.{]'..1.| >>>> 000000e8 >>>> >>>> IIUC, this decodes to instance ID 86252, IP address of course stayed the >>>> same. >>>> >>>> Checking the other output: >>>> >>>> # ceph service status >>>> { >>>> "rbd-mirror": { >>>> "86248": { >>>> "status_stamp": "2019-09-13 17:26:15.391048", >>>> "last_beacon": "2019-09-13 17:26:25.391759", >>>> "status": { >>>> "json": >>>> "{\"2\":{\"name\":\"rbd\",\"callouts\":{},\"image_assigned_count\":51,\"image_error_count\":0,\"image_local_count\":51,\"image_remote_count\":51,\"image_warning_count\":0,\"instance_id\":\"86252\",\"leader\":true}}" >>>> } >>>> } >>>> }, >>>> ... >>>> } >>>> >>>> # rados -p rbd listwatchers rbd_mirroring >>>> watcher=10.160.19.240:0/2781684747 client.86252 cookie=140089552292144 >>>> watcher=10.160.19.240:0/2781684747 client.86252 cookie=140090961572928 >>>> >>>> This looks as strange as before again: Global instance ID is 86248, but >>>> instance ID (and what I find in the omap dump) is 86252. >>>> >>>> However, things look okay in the dashboard again and also: >>>> # rbd mirror pool status >>>> health: OK >>>> images: 51 total >>>> 51 replaying >>>> >>>> Cheers, >>>> Oliver >>> >>> Can you also provide the output from "ceph features"? >> >> Here you go: >> ------------------------------------------------------ >> # ceph features >> { >> "mon": [ >> { >> "features": "0x3ffddff8ffacffff", >> "release": "luminous", >> "num": 1 >> } >> ], >> "osd": [ >> { >> "features": "0x3ffddff8ffacffff", >> "release": "luminous", >> "num": 6 >> } >> ], >> "client": [ >> { >> "features": "0x3ffddff8ffacffff", >> "release": "luminous", >> "num": 6 >> } >> ], >> "mgr": [ >> { >> "features": "0x3ffddff8ffacffff", >> "release": "luminous", >> "num": 1 >> } >> ] >> } >> ------------------------------------------------------ >> This is a rather fresh Nautilus cluster, which has not yet seen any version >> upgrade in its lifetime. 
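Since the question here is whether the three ids line up, it can help to pull them out side by side. A quick sketch, assuming the pool is named "rbd", that "ceph service status" emits JSON as shown above, and that jq is available:
-----------------------------------------
# global id the service beacon is registered under (86248 above)
ceph service status | jq '."rbd-mirror" | keys'
# instance_id reported inside the beacon by the pool replayer (86252 above)
ceph service status | jq -r '."rbd-mirror"[].status.json' | jq '.[].instance_id'
# client id actually holding the watch on rbd_mirroring (client.86252 above)
rados -p rbd listwatchers rbd_mirroring
-----------------------------------------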
>> >> Cheers, >> Oliver >> >> >>> >>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>> Cheers, >>>>>> Oliver >>>>>> >>>>>>> >>>>>>>>> However, the dashboard still shows those images in "unknown", and >>>>>>>>> this also shows up via command line: >>>>>>>>> >>>>>>>>> # rbd mirror pool status >>>>>>>>> health: WARNING >>>>>>>>> images: 51 total >>>>>>>>> 51 unknown >>>>>>>>> # rbd mirror image status test-vm.physik.uni-bonn.de-disk1 >>>>>>>>> test-vm.physik.uni-bonn.de-disk2: >>>>>>>>> global_id: 1a53fafa-37ef-4edf-9633-c2ba3323ed93 >>>>>>>>> state: down+replaying >>>>>>>>> description: replaying, master_position=[object_number=18, >>>>>>>>> tag_tid=6, entry_tid=25202], mirror_position=[object_number=18, >>>>>>>>> tag_tid=6, entry_tid=25202], entries_behind_master=0 >>>>>>>>> last_update: 2019-09-13 15:55:15 >>>>>>>>> >>>>>>>>> Any ideas on what else could cause this? >>>>>>>>> >>>>>>>>> Cheers and thanks, >>>>>>>>> Oliver >>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Any idea on this (or how I can extract more information)? >>>>>>>>>>> I fear keeping high-level debug logs active for ~24h is not >>>>>>>>>>> feasible. >>>>>>>>>>> >>>>>>>>>>> Cheers, >>>>>>>>>>> Oliver >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 2019-09-11 19:14, Jason Dillaman wrote: >>>>>>>>>>>> On Wed, Sep 11, 2019 at 12:57 PM Oliver Freyermuth >>>>>>>>>>>> <freyerm...@physik.uni-bonn.de> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Dear Jason, >>>>>>>>>>>>> >>>>>>>>>>>>> I played a bit more with rbd mirroring and learned that deleting >>>>>>>>>>>>> an image at the source (or disabling journaling on it) >>>>>>>>>>>>> immediately moves the image to trash at the target - >>>>>>>>>>>>> but setting rbd_mirroring_delete_delay helps to have some more >>>>>>>>>>>>> grace time to catch human mistakes. >>>>>>>>>>>>> >>>>>>>>>>>>> However, I have issues restoring such an image which has been >>>>>>>>>>>>> moved to trash by the RBD-mirror daemon as user: >>>>>>>>>>>>> ----------------------------------- >>>>>>>>>>>>> [root@mon001 ~]# rbd trash ls -la >>>>>>>>>>>>> ID NAME SOURCE >>>>>>>>>>>>> DELETED_AT STATUS >>>>>>>>>>>>> PARENT >>>>>>>>>>>>> d4fbe8f63905 test-vm-XXXXXXXXXXXXXXXXXX-disk2 MIRRORING Wed Sep >>>>>>>>>>>>> 11 18:43:14 2019 protected until Thu Sep 12 18:43:14 2019 >>>>>>>>>>>>> [root@mon001 ~]# rbd trash restore --image foo-image d4fbe8f63905 >>>>>>>>>>>>> rbd: restore error: 2019-09-11 18:50:15.387 7f5fa9590b00 -1 >>>>>>>>>>>>> librbd::api::Trash: restore: Current trash source: mirroring does >>>>>>>>>>>>> not match expected: user >>>>>>>>>>>>> (22) Invalid argument >>>>>>>>>>>>> ----------------------------------- >>>>>>>>>>>>> This is issued on the mon, which has the client.admin key, so it >>>>>>>>>>>>> should not be a permission issue. >>>>>>>>>>>>> It also fails when I try that in the Dashboard. >>>>>>>>>>>>> >>>>>>>>>>>>> Sadly, the error message is not clear enough for me to figure out >>>>>>>>>>>>> what could be the problem - do you see what I did wrong? >>>>>>>>>>>> >>>>>>>>>>>> Good catch, it looks like we accidentally broke this in Nautilus >>>>>>>>>>>> when >>>>>>>>>>>> image live-migration support was added. I've opened a new tracker >>>>>>>>>>>> ticket to fix this [1]. 
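Until the restore path is fixed, the delete delay at least keeps mirrored images recoverable for a while on the receiving side. A sketch of the knobs discussed in this part of the thread, assuming the rbd-mirror daemon runs as client.rbd_mirror_backup and reads its options from the centralized config (the 24 h value is only an example):
-----------------------------------------
# keep images deleted on the primary in the trash for 24 hours before purging
ceph config set client.rbd_mirror_backup rbd_mirroring_delete_delay 86400
# inspect what is pending deletion; restore by id once the tracker fix lands
rbd trash ls -la
rbd trash restore d4fbe8f63905
-----------------------------------------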
>>>>>>>>>>>> >>>>>>>>>>>>> Cheers and thanks again, >>>>>>>>>>>>> Oliver >>>>>>>>>>>>> >>>>>>>>>>>>> On 2019-09-10 23:17, Oliver Freyermuth wrote: >>>>>>>>>>>>>> Dear Jason, >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 2019-09-10 23:04, Jason Dillaman wrote: >>>>>>>>>>>>>>> On Tue, Sep 10, 2019 at 2:08 PM Oliver Freyermuth >>>>>>>>>>>>>>> <freyerm...@physik.uni-bonn.de> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Dear Jason, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 2019-09-10 18:50, Jason Dillaman wrote: >>>>>>>>>>>>>>>>> On Tue, Sep 10, 2019 at 12:25 PM Oliver Freyermuth >>>>>>>>>>>>>>>>> <freyerm...@physik.uni-bonn.de> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Dear Cephalopodians, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I have two questions about RBD mirroring. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 1) I can not get it to work - my setup is: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> - One cluster holding the live RBD volumes and >>>>>>>>>>>>>>>>>> snapshots, in pool "rbd", cluster name "ceph", >>>>>>>>>>>>>>>>>> running latest Mimic. >>>>>>>>>>>>>>>>>> I ran "rbd mirror pool enable rbd pool" on that >>>>>>>>>>>>>>>>>> cluster and created a cephx user "rbd_mirror" with (is there >>>>>>>>>>>>>>>>>> a better way?): >>>>>>>>>>>>>>>>>> ceph auth get-or-create client.rbd_mirror mon >>>>>>>>>>>>>>>>>> 'allow r' osd 'allow class-read object_prefix rbd_children, >>>>>>>>>>>>>>>>>> allow pool rbd r' -o ceph.client.rbd_mirror.keyring >>>>>>>>>>>>>>>>>> --cluster ceph >>>>>>>>>>>>>>>>>> In that pool, two images have the journaling >>>>>>>>>>>>>>>>>> feature activated, all others have it disabled still (so I >>>>>>>>>>>>>>>>>> would expect these two to be mirrored). >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> You can just use "mon 'profile rbd' osd 'profile rbd'" for >>>>>>>>>>>>>>>>> the caps -- >>>>>>>>>>>>>>>>> but you definitely need more than read-only permissions to >>>>>>>>>>>>>>>>> the remote >>>>>>>>>>>>>>>>> cluster since it needs to be able to create snapshots of >>>>>>>>>>>>>>>>> remote images >>>>>>>>>>>>>>>>> and update/trim the image journals. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> these profiles really make life a lot easier. I should have >>>>>>>>>>>>>>>> thought of them rather than "guessing" a potentially good >>>>>>>>>>>>>>>> configuration... >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> - Another (empty) cluster running latest Nautilus, >>>>>>>>>>>>>>>>>> cluster name "ceph", pool "rbd". >>>>>>>>>>>>>>>>>> I've used the dashboard to activate mirroring for >>>>>>>>>>>>>>>>>> the RBD pool, and then added a peer with cluster name >>>>>>>>>>>>>>>>>> "ceph-virt", cephx-ID "rbd_mirror", filled in the mons and >>>>>>>>>>>>>>>>>> key created above. >>>>>>>>>>>>>>>>>> I've then run: >>>>>>>>>>>>>>>>>> ceph auth get-or-create client.rbd_mirror_backup >>>>>>>>>>>>>>>>>> mon 'allow r' osd 'allow class-read object_prefix >>>>>>>>>>>>>>>>>> rbd_children, allow pool rbd rwx' -o >>>>>>>>>>>>>>>>>> client.rbd_mirror_backup.keyring --cluster ceph >>>>>>>>>>>>>>>>>> and deployed that key on the rbd-mirror machine, >>>>>>>>>>>>>>>>>> and started the service with: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Please use "mon 'profile rbd-mirror' osd 'profile rbd'" for >>>>>>>>>>>>>>>>> your caps [1]. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> That did the trick (in combination with the above)! >>>>>>>>>>>>>>>> Again a case of PEBKAC: I should have read the documentation >>>>>>>>>>>>>>>> until the end, clearly my fault. 
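For reference, the caps that ended up working in this thread, collected in one place (a sketch using the user names from above; adjust names and keyring paths to taste):
-----------------------------------------
# on the primary cluster ("ceph-virt"): the user rbd-mirror connects back with
ceph auth get-or-create client.rbd_mirror \
  mon 'profile rbd' osd 'profile rbd' \
  -o ceph.client.rbd_mirror.keyring
# on the backup cluster: the user the rbd-mirror daemon itself runs as
ceph auth get-or-create client.rbd_mirror_backup \
  mon 'profile rbd-mirror' osd 'profile rbd' \
  -o client.rbd_mirror_backup.keyring
systemctl start ceph-rbd-mirror@rbd_mirror_backup.service
-----------------------------------------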
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> It works well now, even though it seems to run a bit slow (~35 >>>>>>>>>>>>>>>> MB/s for the initial sync when everything is 1 GBit/s), >>>>>>>>>>>>>>>> but that may also be caused by combination of some very >>>>>>>>>>>>>>>> limited hardware on the receiving end (which will be scaled up >>>>>>>>>>>>>>>> in the future). >>>>>>>>>>>>>>>> A single host with 6 disks, replica 3 and a RAID controller >>>>>>>>>>>>>>>> which can only do RAID0 and not JBOD is certainly not ideal, >>>>>>>>>>>>>>>> so commit latency may cause this slow bandwidth. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> You could try increasing "rbd_concurrent_management_ops" from >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> default of 10 ops to something higher to attempt to account for >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> latency. However, I wouldn't expect near-line speed w/ RBD >>>>>>>>>>>>>>> mirroring. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks - I will play with this option once we have more storage >>>>>>>>>>>>>> available in the target pool ;-). >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> systemctl start >>>>>>>>>>>>>>>>>> ceph-rbd-mirror@rbd_mirror_backup.service >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> After this, everything looks fine: >>>>>>>>>>>>>>>>>> # rbd mirror pool info >>>>>>>>>>>>>>>>>> Mode: pool >>>>>>>>>>>>>>>>>> Peers: >>>>>>>>>>>>>>>>>> UUID NAME >>>>>>>>>>>>>>>>>> CLIENT >>>>>>>>>>>>>>>>>> XXXXXXXXXXX ceph-virt >>>>>>>>>>>>>>>>>> client.rbd_mirror >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> The service also seems to start fine, but logs show >>>>>>>>>>>>>>>>>> (debug rbd_mirror=20): >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> rbd::mirror::ClusterWatcher:0x5575e2a7d390 >>>>>>>>>>>>>>>>>> resolve_peer_config_keys: retrieving config-key: pool_id=2, >>>>>>>>>>>>>>>>>> pool_name=rbd, peer_uuid=XXXXXXXXXXX >>>>>>>>>>>>>>>>>> rbd::mirror::Mirror: 0x5575e29c7240 >>>>>>>>>>>>>>>>>> update_pool_replayers: enter >>>>>>>>>>>>>>>>>> rbd::mirror::Mirror: 0x5575e29c7240 >>>>>>>>>>>>>>>>>> update_pool_replayers: restarting failed pool replayer for >>>>>>>>>>>>>>>>>> uuid: XXXXXXXXXXX cluster: ceph-virt client: >>>>>>>>>>>>>>>>>> client.rbd_mirror >>>>>>>>>>>>>>>>>> rbd::mirror::PoolReplayer: 0x5575e2a7da20 init: >>>>>>>>>>>>>>>>>> replaying for uuid: XXXXXXXXXXX cluster: ceph-virt client: >>>>>>>>>>>>>>>>>> client.rbd_mirror >>>>>>>>>>>>>>>>>> rbd::mirror::PoolReplayer: 0x5575e2a7da20 >>>>>>>>>>>>>>>>>> init_rados: error connecting to remote peer uuid: >>>>>>>>>>>>>>>>>> XXXXXXXXXXX cluster: ceph-virt client: client.rbd_mirror: >>>>>>>>>>>>>>>>>> (95) Operation not supported >>>>>>>>>>>>>>>>>> rbd::mirror::ServiceDaemon: 0x5575e29c8d70 >>>>>>>>>>>>>>>>>> add_or_update_callout: pool_id=2, callout_id=2, >>>>>>>>>>>>>>>>>> callout_level=error, text=unable to connect to remote cluster >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> If it's still broken after fixing your caps above, perhaps >>>>>>>>>>>>>>>>> increase >>>>>>>>>>>>>>>>> debugging for "rados", "monc", "auth", and "ms" to see if you >>>>>>>>>>>>>>>>> can >>>>>>>>>>>>>>>>> determine the source of the op not supported error. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I already tried storing the ceph.client.rbd_mirror.keyring >>>>>>>>>>>>>>>>>> (i.e. from the cluster with the live images) on the >>>>>>>>>>>>>>>>>> rbd-mirror machine explicitly (i.e. not only in mon config >>>>>>>>>>>>>>>>>> storage), >>>>>>>>>>>>>>>>>> and after doing that: >>>>>>>>>>>>>>>>>> rbd -m mon_ip_of_ceph_virt_cluster --id=rbd_mirror ls >>>>>>>>>>>>>>>>>> works fine. 
So it's not a connectivity issue. Maybe a >>>>>>>>>>>>>>>>>> permission issue? Or did I miss something? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Any idea what "operation not supported" means? >>>>>>>>>>>>>>>>>> It's unclear to me whether things should work well using >>>>>>>>>>>>>>>>>> Mimic with Nautilus, and enabling pool mirroring but only >>>>>>>>>>>>>>>>>> having journaling on for two images is a supported case. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Yes and yes. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 2) Since there is a performance drawback (about 2x) for >>>>>>>>>>>>>>>>>> journaling, is it also possible to only mirror snapshots, >>>>>>>>>>>>>>>>>> and leave the live volumes alone? >>>>>>>>>>>>>>>>>> This would cover the common backup use case before >>>>>>>>>>>>>>>>>> deferred mirroring is implemented (or is it there already?). >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> This is in development right now and will hopefully land for >>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>> Octopus release. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> That would be very cool. Just to clarify: You mean the "real" >>>>>>>>>>>>>>>> deferred mirroring, not a "snapshot only" mirroring? >>>>>>>>>>>>>>>> Is it already clear if this will require Octopus (or a later >>>>>>>>>>>>>>>> release) on both ends, or only on the receiving side? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I'm not sure what you mean by deferred mirroring. You >>>>>>>>>>>>>>> can delay >>>>>>>>>>>>>>> the replay of the journal via the "rbd_mirroring_replay_delay" >>>>>>>>>>>>>>> configuration option so that your DR site can be X seconds >>>>>>>>>>>>>>> behind the >>>>>>>>>>>>>>> primary at a minimum. >>>>>>>>>>>>>> >>>>>>>>>>>>>> This is indeed what I was thinking of... >>>>>>>>>>>>>> >>>>>>>>>>>>>>> For Octopus we are working on on-demand and >>>>>>>>>>>>>>> scheduled snapshot mirroring between sites -- no journal is >>>>>>>>>>>>>>> involved. >>>>>>>>>>>>>> >>>>>>>>>>>>>> ... and this is what I was dreaming of. We keep snapshots of VMs >>>>>>>>>>>>>> to be able to roll them back. >>>>>>>>>>>>>> We'd like to also keep those snapshots in a separate Ceph >>>>>>>>>>>>>> instance as an additional safety-net (in addition to an offline >>>>>>>>>>>>>> backup of those snapshots with Benji backup). >>>>>>>>>>>>>> It is not (yet) clear to me whether we can pay the "2 x" price >>>>>>>>>>>>>> for journaling in the long run, so this would be the way to go >>>>>>>>>>>>>> in case we can't. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Since I got you personally, I have two bonus questions. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 1) Your talk: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> https://events.static.linuxfound.org/sites/events/files/slides/Disaster%20Recovery%20and%20Ceph%20Block%20Storage-%20Introducing%20Multi-Site%20Mirroring.pdf >>>>>>>>>>>>>>>> mentions "rbd journal object flush age", which I'd >>>>>>>>>>>>>>>> translate with something like the "commit" mount option on a >>>>>>>>>>>>>>>> classical file system - correct? >>>>>>>>>>>>>>>> I don't find this switch documented anywhere, though - >>>>>>>>>>>>>>>> is there experience with it / what's the default? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> It's a low-level knob that by default causes the journal to >>>>>>>>>>>>>>> flush its >>>>>>>>>>>>>>> pending IO events before it allows the corresponding IO to be >>>>>>>>>>>>>>> issued >>>>>>>>>>>>>>> against the backing image.
Setting it to a value greater than >>>>>>>>>>>>>>> zero >>>>>>>>>>>>>>> will allow that many seconds of IO events to be batched >>>>>>>>>>>>>>> together in a >>>>>>>>>>>>>>> journal append operation and it's helpful for high-throughput, >>>>>>>>>>>>>>> small IO >>>>>>>>>>>>>>> operations. Of course it turned out that a bug had broken that >>>>>>>>>>>>>>> option >>>>>>>>>>>>>>> a while ago, such that events would never batch, so a fix is currently >>>>>>>>>>>>>>> scheduled for backport to all active releases [1] w/ the goal >>>>>>>>>>>>>>> that no >>>>>>>>>>>>>>> one should need to tweak it. >>>>>>>>>>>>>> >>>>>>>>>>>>>> That's even better - since our setup is growing and we will keep >>>>>>>>>>>>>> upgrading, I'll then just keep things as they are now (no manual >>>>>>>>>>>>>> tweaking) >>>>>>>>>>>>>> and tag along the development. Thanks! >>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 2) I read I can run more than one rbd-mirror with >>>>>>>>>>>>>>>> Mimic/Nautilus. Do they load-balance the images, or "only" >>>>>>>>>>>>>>>> failover in case one of them dies? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Starting with Nautilus, the default configuration for >>>>>>>>>>>>>>> rbd-mirror is to >>>>>>>>>>>>>>> evenly divide the number of mirrored images between all running >>>>>>>>>>>>>>> daemons. This does not split the total load since some images >>>>>>>>>>>>>>> might be >>>>>>>>>>>>>>> hotter than others, but it at least spreads the load. >>>>>>>>>>>>>> >>>>>>>>>>>>>> That's fine enough for our use case. Spreading by "hotness" is a >>>>>>>>>>>>>> task without a clear answer >>>>>>>>>>>>>> and "temperature" may change quickly, so that's all I hoped for. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Many thanks again for the very helpful explanations! >>>>>>>>>>>>>> >>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>> Oliver >>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Cheers and many thanks for the quick and perfect help! >>>>>>>>>>>>>>>> Oliver >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Cheers and thanks in advance, >>>>>>>>>>>>>>>>>> Oliver >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>>> https://docs.ceph.com/docs/master/rbd/rbd-mirroring/#rbd-mirror-daemon >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> Jason >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [1] https://github.com/ceph/ceph/pull/28539 >>>>>>>>>>>> >>>>>>>>>>>> [1] https://tracker.ceph.com/issues/41780 >>>>>>>> >>>>>>>> -- >>>>>>>> Jason >>>>> >>>>> -- >>>>> Jason > > [1] https://tracker.ceph.com/issues/41833 >
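For completeness, the tuning options mentioned over the course of this thread can be set centrally. A sketch, assuming a Nautilus-style "ceph config" store and the client name used above (the values are only examples, not recommendations):
-----------------------------------------
# keep the DR site at least 5 minutes behind the primary (journal replay delay)
ceph config set client.rbd_mirror_backup rbd_mirroring_replay_delay 300
# allow more parallel ops during the initial image sync if it is latency-bound
ceph config set client.rbd_mirror_backup rbd_concurrent_management_ops 20
# rbd_journal_object_flush_age is best left at its default once the batching fix lands
-----------------------------------------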