Thanks for the details, I appreciate it. Is your cephfs-mirror daemon containerized? That part didn't work for me: the daemon doesn't know how to contact the primary cluster without a mapping of the remote-site.conf and keyring into the container (I will use extra-container-args for that, similar to rbd-mirror). Adding the peer via the cluster spec didn't work either; I also had to create a token and import it manually. So there are a number of issues (I think I have written them all down somewhere) that are not just a documentation problem but probably bugs. In the end it's possible to set it up, but not as easily as it could (or even should?) be.
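For what it's worth, a sketch of the workaround I mean. Everything below is illustrative: the bind-mount paths, file names and host placement are assumptions, not canon; adjust them to wherever you keep the remote cluster's conf and keyring.

```shell
# Hypothetical cephfs-mirror spec that maps the remote cluster's conf and
# keyring into the daemon's container via extra_container_args:
cat <<'EOF' | ceph orch apply -i -
service_type: cephfs-mirror
placement:
  hosts:
    - s1mon
extra_container_args:
  - "-v=/etc/ceph/remote-site.conf:/etc/ceph/remote-site.conf:ro"
  - "-v=/etc/ceph/remote-site.keyring:/etc/ceph/remote-site.keyring:ro"
EOF

# Peer setup via bootstrap token instead of the cluster spec:
# on the TARGET cluster, create the token ...
ceph fs snapshot mirror peer_bootstrap create <fs_name> client.mirror_remote <site-name>
# ... and on the SOURCE cluster, import the token printed above:
ceph fs snapshot mirror peer_bootstrap import <fs_name> <token>
```

These commands need a live cluster, so treat them as a recipe to compare against, not something copy-paste safe.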

Quoting Jan Zeinstra <j...@delectat.nl>:

Hi Eugen,

I've had difficulties; my setup was about 6 to 9 months old.
My difficulties arose from mixing old ideas and outdated documentation with
newer, more accurate information.

As I recall, in the end I used IBM's documentation in conjunction with the
upstream Ceph documentation.

My current (home-brew) documentation clearly states (which helps me) what
happens on the source cluster and what happens on the target cluster, as
well as, of course, the order in which things must be done and how to
verify the success of every step.

Also not entirely clear is why and how onboarding of the target cluster
works: a magic token is produced, and somehow the source cluster knows how
to contact the remote host; issues with DNS and reachability lurk in the
dark. At least, that is what bothered me when setting things up.
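About the "magic token": as far as I can tell it is simply a base64-encoded JSON blob describing the peer, produced on the target and imported on the source; that is how the source cluster learns the target's monitor addresses and credentials without any DNS lookup of the remote host. A small sketch of the idea, using the peer details from this thread; the exact field set is illustrative here (a real token also carries the client key):

```shell
# A hypothetical peer description like the one the bootstrap step emits
# (values borrowed from this thread; field set is illustrative):
PEER='{"filesystem": "prodfs", "user": "client.mirror_remote",
 "site_name": "bk-site",
 "mon_host": "[v2:172.17.16.12:3300/0,v1:172.17.16.12:6789/0]"}'

# The bootstrap token is just this JSON, base64-encoded:
TOKEN=$(printf '%s' "$PEER" | base64 -w0)

# Decoding shows why no DNS/SRV lookup of the target is needed: the
# monitor addresses travel inside the token itself.
printf '%s' "$TOKEN" | base64 -d
```

So if reachability is a worry, it is the IPs/ports inside the token that must be routable from the source cluster, not the remote host's name.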

So: once I had things figured out, it all seemed like a straightforward
process, but suffice it to say that the first time I saw bits moving it
felt like a hard-fought victory.

Ever since then I have been worried about what would happen if something
went wrong (like an outage or unavailability of the target system),
because it was unclear how to fix issues. Exactly that has happened, and
with your help I'm now looking at a chance to reinstate mirroring on the
source system; but troubleshooting or resetting a mirroring solution is
also not exceedingly clearly documented.
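For reference, the reset sequence that eventually worked here boils down to the following (the filesystem name and peer UUID are from my cluster; substitute your own):

```shell
# list leftover peer entries for the filesystem
ceph config-key ls | grep "peer/prodfs"

# remove each stale peer key reported above
ceph config-key rm "cephfs/mirror/peer/prodfs/b308e268-c7f9-4401-a69a-e625955087f2"

# disable, then re-enable, snapshot mirroring to drop the cached peer
ceph fs snapshot mirror disable prodfs
ceph fs snapshot mirror enable prodfs

# verify: should print {} (no peers configured)
ceph fs snapshot mirror peer_list prodfs
```

These commands obviously need a live cluster with the stale peer present.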

Hope this helps in forming ideas on how to improve the documentation,
because [expletive]: what a wonderful solution Ceph is! Not yet
disappointed in its resilience, reliability and versatility!

Regards,

Jan Zeinstra

On Mon, 14 Apr 2025 at 12:11, Eugen Block <ebl...@nde.ag> wrote:

Glad it works for you. May I ask: did you have difficulties setting up
mirroring? I tried half a year ago and failed. Then I tried again last
week and succeeded, but I faced several issues along the way. I would
like to improve the docs (I'm in contact with Zac about that), but if
you didn't encounter any issues and everything was clear, maybe I'm
the issue here. ;-)
Anyway, if you could share your setup experience, I could compare and
figure out which parts of the docs could use some improvement.

Quoting Jan Zeinstra <j...@delectat.nl>:

> Dear Eugen,
>
> This works: so in conclusion, the assumption is that removing the config
> key with:
>   ceph config-key rm "cephfs/mirror/peer/prodfs/b308e268-c7f9-4401-a69a-e625955087f2"
> and disabling the fs with:
>   ceph fs snapshot mirror disable prodfs
>
> yields:
>   ceph fs snapshot mirror peer_list prodfs
> Error EINVAL: filesystem prodfs is not mirrored
>
> Re-enabling prodfs mirroring:
>   ceph fs snapshot mirror enable prodfs
>   ceph fs snapshot mirror peer_list prodfs
>
> yields:
> {}
>
> So: very grateful for your help, and happily proceeding with re-enabling
> the mirroring solution.
> Again:
> Thank you,
>
> Jan Zeinstra
>
>
> On Mon, 14 Apr 2025 at 10:29, Eugen Block <ebl...@nde.ag> wrote:
>
>> I stopped my peer cluster to simulate that it's gone. Removing the peer
>> didn't work (although with a different message than yours), but
>> disabling snapshot mirroring did work:
>>
>> site-a:~ # ceph fs snapshot mirror disable cephfs
>> {}
>> site-a:~ # ceph fs snapshot mirror peer_list cephfs
>> Error EINVAL: filesystem cephfs is not mirrored
>>
>> I assume that this should suffice to start from scratch, but maybe you
>> can confirm.
>>
>> Quoting Eugen Block <ebl...@nde.ag>:
>>
>> > Please don't drop the list from your responses.
>> >
>> > Have you tried disabling mirroring on prodfs after removing the
>> > keys? I haven't had many chances to play around with it yet; no
>> > production cluster here is using it.
>> >
>> > Quoting Jan Zeinstra <j...@delectat.nl>:
>> >
>> >> Thanks for the response,
>> >> I got:
>> >> ceph config-key ls |grep "peer/prodfs"
>> >>    "cephfs/mirror/peer/prodfs/b308e268-c7f9-4401-a69a-e625955087f2",
>> >>    "cephfs/mirror/peer/prodfs/f3ea4e15-6d77-4f28-aacb-9afbfe8cc1c5",
>> >>
>> >> Since there is no mirroring at all going on, I saw no harm in
>> >> deleting them both.
>> >>  ceph config-key rm "cephfs/mirror/peer/prodfs/f3ea4e15-6d77-4f28-aacb-9afbfe8cc1c5"
>> >> key deleted
>> >>  ceph config-key rm "cephfs/mirror/peer/prodfs/b308e268-c7f9-4401-a69a-e625955087f2"
>> >> key deleted
>> >>
>> >> And sure enough:
>> >>  ceph config-key ls |grep "peer/prodfs"
>> >> resulted in nothing found.
>> >>
>> >> However:
>> >>  ceph fs snapshot mirror peer_list prodfs
>> >> gave
>> >> {"f3ea4e15-6d77-4f28-aacb-9afbfe8cc1c5": {"client_name":
>> >> "client.mirror_remote", "site_name": "bk-site", "fs_name": "prodfs"}}
>> >>
>> >> Also after redeploying the daemon.
>> >>
>> >> Searching the config for 'bk-site' and '9afbfe8cc1c5' yielded nothing.
>> >>
>> >> Do you have more suggestions on where to look?
>> >>
>> >> (Mirroring is still stuck trying to find the now nonexistent remote
>> >> cluster:
>> >>  cephadm logs --name cephfs-mirror.s1mon.lvlkwp
>> >>
>> >> Apr 14 08:02:36 s1mon systemd[1]: Started Ceph cephfs-mirror.s1mon.lvlkwp
>> >> for d0ea284a-8a16-11ee-9232-5934f0f00ec2.
>> >> Apr 14 08:02:36 s1mon cephfs-mirror[2097671]: set uid:gid to 167:167 (ceph:ceph)
>> >> Apr 14 08:02:36 s1mon cephfs-mirror[2097671]: ceph version 18.2.4
>> >> (e7ad5345525c7aa95470c26863873b581076945d) reef (stable), process
>> >> cephfs-mirror, pid 2
>> >> Apr 14 08:02:36 s1mon cephfs-mirror[2097671]: pidfile_write: ignore empty
>> >> --pid-file
>> >> Apr 14 08:02:36 s1mon cephfs-mirror[2097671]: mgrc service_daemon_register
>> >> cephfs-mirror.23174196 metadata
>> >> {arch=x86_64,ceph_release=reef,ceph_version=ceph version 18.2.4
>> >> (e7ad5345525c7aa95470c26863873b581076945d) reef
>> >> (stable),ceph_version_short=18.2.4,container_hostna>
>> >> Apr 14 08:02:40 s1mon cephfs-mirror[2097671]: cephfs::mirror::Utils
>> >> connect: error connecting to bk-site: (2) No such file or directory
>> >> Apr 14 08:02:40 s1mon cephfs-mirror[2097671]:
>> >> cephfs::mirror::PeerReplayer(f3ea4e15-6d77-4f28-aacb-9afbfe8cc1c5) init:
>> >> error connecting to remote cluster: (2) No such file or directory
>> >> Apr 14 08:02:40 s1mon conmon[2097665]: unable to get monitor info from DNS
>> >> SRV with service name: ceph-mon
>> >> Apr 14 08:02:40 s1mon conmon[2097665]: 2025-04-14T06:02:40.831+0000
>> >> 7ff4ed4c1640 -1 failed for service _ceph-mon._tcp
>> >> Apr 14 08:02:40 s1mon conmon[2097665]: 2025-04-14T06:02:40.831+0000
>> >> 7ff4ed4c1640 -1 monclient: get_monmap_and_config cannot identify monitors
>> >> to contact
>> >> Apr 14 08:02:40 s1mon conmon[2097665]: 2025-04-14T06:02:40.831+0000
>> >> 7ff4ed4c1640 -1 cephfs::mirror::Utils connect: error connecting to
>> >> bk-site: (2) No such file or directory
>> >> Apr 14 08:02:40 s1mon conmon[2097665]: 2025-04-14T06:02:40.831+0000
>> >> 7ff4ed4c1640 -1
>> >> cephfs::mirror::PeerReplayer(f3ea4e15-6d77-4f28-aacb-9afbfe8cc1c5) init:
>> >> error connecting to remote cluster: (2) No such file or directory
>> >> )
>> >>
>> >> On Fri, 11 Apr 2025 at 14:42, Eugen Block <ebl...@nde.ag> wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> I would expect that you have a similar config-key entry:
>> >>>
>> >>> ceph config-key ls |grep "peer/cephfs"
>> >>>
>> >>>  "cephfs/mirror/peer/cephfs/18c02021-8902-4e3f-bc17-eaf48331cc56",
>> >>>
>> >>> Maybe removing that peer would already suffice?
>> >>>
>> >>>
>> >>> Quoting Jan Zeinstra <j...@delectat.nl>:
>> >>>
>> >>>> Hi,
>> >>>> This is my first post to the forum and I don't know if it's
>> >>>> appropriate, but I'd like to express my gratitude to all the people
>> >>>> working hard on Ceph, because I think it's a fantastic piece of
>> >>>> software.
>> >>>>
>> >>>> The problem I'm having is caused by me; we had a well-working CephFS
>> >>>> mirror solution; let's call it source cluster A and target cluster B.
>> >>>> Source cluster A is a modest cluster consisting of 6 instances: 3 OSD
>> >>>> instances and 3 mon instances. The OSD instances each have 3 disks
>> >>>> (HDDs) and 3 OSD daemons, totalling 9 OSD daemons and 9 HDDs. Target
>> >>>> cluster B is a single-node system having 3 OSD daemons and 3 HDDs.
>> >>>> Both clusters run Ceph 18.2.4 Reef. Both clusters use Ubuntu 22.04 as
>> >>>> the OS throughout. Both systems were installed using cephadm.
>> >>>> I have destroyed cluster B and rebuilt it from the ground up (I made
>> >>>> a mistake in PG sizing in the original cluster).
>> >>>> Now I find I cannot recreate/reinstate the mirroring between the two
>> >>>> CephFS filesystems, and I suspect there is a peer left behind in the
>> >>>> filesystem of the source, pointing to the now non-existent target
>> >>>> cluster.
>> >>>> When I do 'ceph fs snapshot mirror peer_list prodfs', I get:
>> >>>> '{"f3ea4e15-6d77-4f28-aacb-9afbfe8cc1c5": {"client_name":
>> >>>> "client.mirror_remote", "site_name": "bk-site", "fs_name": "prodfs"}}'
>> >>>> When I try to delete it with 'ceph fs snapshot mirror peer_remove
>> >>>> prodfs f3ea4e15-6d77-4f28-aacb-9afbfe8cc1c5', I get: 'Error EACCES:
>> >>>> failed to remove peer: access denied: does your client key have mgr
>> >>>> caps? See
>> >>>> http://docs.ceph.com/en/latest/mgr/administrator/#client-authentication',
>> >>>> but the logging of the daemon points to the more likely reason for
>> >>>> failure:
>> >>>> ----
>> >>>> Apr 08 12:54:26 s1mon systemd[1]: Started Ceph cephfs-mirror.s1mon.lvlkwp
>> >>>> for d0ea284a-8a16-11ee-9232-5934f0f00ec2.
>> >>>> Apr 08 12:54:26 s1mon cephfs-mirror[310088]: set uid:gid to 167:167 (ceph:ceph)
>> >>>> Apr 08 12:54:26 s1mon cephfs-mirror[310088]: ceph version 18.2.4
>> >>>> (e7ad5345525c7aa95470c26863873b581076945d) reef (stable), process
>> >>>> cephfs-mirror, pid 2
>> >>>> Apr 08 12:54:26 s1mon cephfs-mirror[310088]: pidfile_write: ignore empty
>> >>>> --pid-file
>> >>>> Apr 08 12:54:26 s1mon cephfs-mirror[310088]: mgrc service_daemon_register
>> >>>> cephfs-mirror.22849497 metadata
>> >>>> {arch=x86_64,ceph_release=reef,ceph_version=ceph version 18.2.4
>> >>>> (e7ad5345525c7a>
>> >>>> Apr 08 12:54:30 s1mon cephfs-mirror[310088]:
>> >>>> cephfs::mirror::PeerReplayer(f3ea4e15-6d77-4f28-aacb-9afbfe8cc1c5) init:
>> >>>> remote monitor host=[v2:172.17.16.12:3300/0,v1:172.17.16.12:6789/0]
>> >>>> Apr 08 12:54:30 s1mon conmon[310082]: 2025-04-08T10:54:30.365+0000
>> >>>> 7f57c51ba640 -1 monclient(hunting): handle_auth_bad_method server
>> >>>> allowed_methods [2] but i only support [2,1]
>> >>>> Apr 08 12:54:30 s1mon conmon[310082]: 2025-04-08T10:54:30.365+0000
>> >>>> 7f57d81e0640 -1 cephfs::mirror::Utils connect: error connecting to
>> >>>> bk-site: (13) Permission denied
>> >>>> Apr 08 12:54:30 s1mon cephfs-mirror[310088]: cephfs::mirror::Utils
>> >>>> connect: error connecting to bk-site: (13) Permission denied
>> >>>> Apr 08 12:54:30 s1mon conmon[310082]: 2025-04-08T10:54:30.365+0000
>> >>>> 7f57d81e0640 -1
>> >>>> cephfs::mirror::PeerReplayer(f3ea4e15-6d77-4f28-aacb-9afbfe8cc1c5) init:
>> >>>> error connecting to remote cl>
>> >>>> Apr 08 12:54:30 s1mon cephfs-mirror[310088]:
>> >>>> cephfs::mirror::PeerReplayer(f3ea4e15-6d77-4f28-aacb-9afbfe8cc1c5) init:
>> >>>> error connecting to remote cluster: (13) Permission denied
>> >>>> Apr 09 00:00:16 s1mon cephfs-mirror[310088]: received signal: Hangup from
>> >>>> Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
>> >>>> Apr 09 00:00:16 s1mon conmon[310082]: 2025-04-08T22:00:16.362+0000
>> >>>> 7f57d99e3640 -1 received signal: Hangup from Kernel ( Could be generated
>> >>>> by pthread_kill(), raise(), abort(), alarm()>
>> >>>> Apr 09 00:00:16 s1mon conmon[310082]: 2025-04-08T22:00:16.386+0000
>> >>>> 7f57d99e3640 -1 received signal: Hangup from Kernel ( Could be generated
>> >>>> by pthread_kill(), raise(), abort(), alarm()>
>> >>>> Apr 09 00:00:16 s1mon cephfs-mirror[310088]: received signal: Hangup from
>> >>>> Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
>> >>>> Apr 09 00:00:16 s1mon conmon[310082]: 2025-04-08T22:00:16.430+0000
>> >>>> 7f57d99e3640 -1 received signal: Hangup from Kernel ( Could be generated
>> >>>> by pthread_kill(), raise(), abort(), alarm()>
>> >>>> Apr 09 00:00:16 s1mon cephfs-mirror[310088]: received signal: Hangup from
>> >>>> Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
>> >>>> Apr 09 00:00:16 s1mon conmon[310082]: 2025-04-08T22:00:16.466+0000
>> >>>> 7f57d99e3640 -1 received signal: Hangup from Kernel ( Could be generated
>> >>>> by pthread_kill(), raise(), abort(), alarm()>
>> >>>> Apr 09 00:00:16 s1mon cephfs-mirror[310088]: received signal: Hangup from
>> >>>> Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
>> >>>> Apr 10 00:00:01 s1mon cephfs-mirror[310088]: received signal: Hangup from
>> >>>> Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
>> >>>> Apr 10 00:00:01 s1mon conmon[310082]: 2025-04-09T22:00:01.767+0000
>> >>>> 7f57d99e3640 -1 received signal: Hangup from Kernel ( Could be generated
>> >>>> by pthread_kill(), raise(), abort(), alarm()>
>> >>>> Apr 10 00:00:01 s1mon cephfs-mirror[310088]: received signal: Hangup from
>> >>>> Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
>> >>>> Apr 10 00:00:01 s1mon conmon[310082]: 2025-04-09T22:00:01.811+0000
>> >>>> 7f57d99e3640 -1 received signal: Hangup from Kernel ( Could be generated
>> >>>> by pthread_kill(), raise(), abort(), alarm()>
>> >>>> Apr 10 00:00:01 s1mon cephfs-mirror[310088]: received signal: Hangup from
>> >>>> Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
>> >>>> Apr 10 00:00:01 s1mon conmon[310082]: 2025-04-09T22:00:01.851+0000
>> >>>> 7f57d99e3640 -1 received signal: Hangup from Kernel ( Could be generated
>> >>>> by pthread_kill(), raise(), abort(), alarm()>
>> >>>> Apr 10 00:00:01 s1mon conmon[310082]: 2025-04-09T22:00:01.891+0000
>> >>>> 7f57d99e3640 -1 received signal: Hangup from Kernel ( Could be generated
>> >>>> by pthread_kill(), raise(), abort(), alarm()>
>> >>>> Apr 10 00:00:01 s1mon cephfs-mirror[310088]: received signal: Hangup from
>> >>>> Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
>> >>>> ----
>> >>>> Is there any chance I can get the mirroring daemon to forget about
>> >>>> the cluster I lost?
>> >>>>
>> >>>> Best regards, Jan Zeinstra
>> >>>> _______________________________________________
>> >>>> ceph-users mailing list -- ceph-users@ceph.io
>> >>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> >>>
