Thanks for your details, I appreciate it. Is your cephfs-mirror daemon
containerized? That part didn't work for me: the daemon doesn't know
how to contact the primary cluster unless the remote-site.conf and
keyring are mapped into the container (I will use extra-container-args
for that, similar to rbd-mirror). Also, adding the peer via the
cluster spec didn't work; I also had to create a token and import it.
So there are a bunch of issues (I think I have written them all down
somewhere) which are not only a documentation problem, but probably
bugs. In the end, it's possible to set it up, but not as easy as it
could (or even should?) be.
Quoting Jan Zeinstra <j...@delectat.nl>:
> Hi Eugen,
>
> I've had difficulties; my setup was about 6 to 9 months old.
> My difficulties arose from mixing old ideas and documentation with
> newer/more accurate info.
>
> As I recall, in the end I used the IBM documentation in conjunction
> with the upstream Ceph documentation.
>
> My current (home-brew) documentation clearly states what happens on
> the source cluster and what happens on the target cluster, which
> helps me. It also covers, of course, the order in which things are to
> be done, and how to verify the success of every step.
>
> Also not entirely clear is why and how onboarding of the target
> cluster works: a magic token is produced, and somehow the source
> cluster knows how to contact the remote host; issues with DNS and
> reachability lurk in the dark. At least, that is what bothered me
> when setting things up.
>
> So, when I had things figured out, it all seemed like a
> straightforward process, but suffice it to say that the first time I
> saw bits moving it felt like a hard-fought victory.
>
> Ever since then I was worried about something happening (like an
> outage or unavailability of the target system), because it was
> unclear how to fix issues. Exactly that has happened, and with your
> help I'm now looking at a chance to reinstate mirroring on the source
> system; but troubleshooting/resetting a mirroring solution is also
> not very clearly documented.
>
> Hope this helps in forming ideas on how to improve the documentation,
> because [expletive]: what a wonderful solution Ceph is! Not yet
> disappointed in the resilience, reliability and versatility!
>
> Regards,
>
> Jan Zeinstra
>
> On Mon, Apr 14, 2025 at 12:11, Eugen Block <ebl...@nde.ag> wrote:
>
>> Glad it works for you. May I ask, did you have difficulties setting up
>> mirroring? I tried half a year ago and failed. Then I tried again last
>> week and succeeded, but I faced several issues along the way. I would
>> like to improve the docs (I'm in contact with Zac about that), but if
>> you didn't encounter any issues and everything was clear, maybe I'm
>> the issue here. ;-)
>> Anyway, if you could share your setup experience, I could compare and
>> figure out which part of the docs could use some improvement.
>>
>> Quoting Jan Zeinstra <j...@delectat.nl>:
>>
>> > Dear Eugen,
>> >
>> > This works. So, in conclusion, the assumption is that removing the
>> > config with:
>> > ceph config-key rm
>> > "cephfs/mirror/peer/prodfs/b308e268-c7f9-4401-a69a-e625955087f2"
>> > and disabling the fs:
>> > ceph fs snapshot mirror disable prodfs
>> >
>> > yields:
>> > ceph fs snapshot mirror peer_list prodfs
>> > Error EINVAL: filesystem prodfs is not mirrored
>> >
>> > Re-enabling prodfs mirroring:
>> > ceph fs snapshot mirror enable prodfs
>> > ceph fs snapshot mirror peer_list prodfs
>> >
>> > yields:
>> > {}
>> >
>> > So: very grateful for your help, and happily proceeding with
>> > re-enabling the mirroring solution.
>> > Again:
>> > Thank you,
>> >
>> > Jan Zeinstra
>> >
>> >
>> > On Mon, Apr 14, 2025 at 10:29, Eugen Block <ebl...@nde.ag> wrote:
>> >
>> >> I stopped my peer cluster to simulate that it's gone. Removing the
>> >> peer didn't work (although with a different message than yours), but
>> >> disabling snapshot mirroring did work:
>> >>
>> >> site-a:~ # ceph fs snapshot mirror disable cephfs
>> >> {}
>> >> site-a:~ # ceph fs snapshot mirror peer_list cephfs
>> >> Error EINVAL: filesystem cephfs is not mirrored
>> >>
>> >> I assume that this should suffice to start from scratch, but maybe
>> >> you can confirm.
>> >>
>> >> Quoting Eugen Block <ebl...@nde.ag>:
>> >>
>> >> > Please don't drop the list from your responses.
>> >> >
>> >> > Have you tried disabling mirroring on prodfs after removing the
>> >> > keys? I haven't had too many chances to play around yet; there's no
>> >> > prod cluster here using that.
>> >> >
>> >> > Quoting Jan Zeinstra <j...@delectat.nl>:
>> >> >
>> >> >> Thanks for the response,
>> >> >> I got:
>> >> >> ceph config-key ls |grep "peer/prodfs"
>> >> >> "cephfs/mirror/peer/prodfs/b308e268-c7f9-4401-a69a-e625955087f2",
>> >> >> "cephfs/mirror/peer/prodfs/f3ea4e15-6d77-4f28-aacb-9afbfe8cc1c5",
>> >> >>
>> >> >> Since there is no mirroring at all going on, I saw no harm in
>> >> >> deleting them both.
>> >> >> ceph config-key rm
>> >> >> "cephfs/mirror/peer/prodfs/f3ea4e15-6d77-4f28-aacb-9afbfe8cc1c5"
>> >> >> key deleted
>> >> >> ceph config-key rm
>> >> >> "cephfs/mirror/peer/prodfs/b308e268-c7f9-4401-a69a-e625955087f2"
>> >> >> key deleted
>> >> >>
>> >> >> And sure enough:
>> >> >> ceph config-key ls |grep "peer/prodfs"
>> >> >> returned nothing.
>> >> >>
>> >> >> However:
>> >> >> ceph fs snapshot mirror peer_list prodfs
>> >> >> gave:
>> >> >> {"f3ea4e15-6d77-4f28-aacb-9afbfe8cc1c5": {"client_name":
>> >> >> "client.mirror_remote", "site_name": "bk-site", "fs_name": "prodfs"}}
>> >> >>
>> >> >> Also after redeploying the daemon.
>> >> >>
>> >> >> Searching the config for 'bk-site' and '9afbfe8cc1c5' yielded nothing.
>> >> >>
>> >> >> Do you have more suggestions on where to look?
>> >> >>
>> >> >> (Mirroring is still stuck in trying to find the now nonexistent
>> >> >> remote cluster:
>> >> >> cephadm logs --name cephfs-mirror.s1mon.lvlkwp
>> >> >>
>> >> >> Apr 14 08:02:36 s1mon systemd[1]: Started Ceph
>> >> >> cephfs-mirror.s1mon.lvlkwp for d0ea284a-8a16-11ee-9232-5934f0f00ec2.
>> >> >> Apr 14 08:02:36 s1mon cephfs-mirror[2097671]: set uid:gid to
>> >> >> 167:167 (ceph:ceph)
>> >> >> Apr 14 08:02:36 s1mon cephfs-mirror[2097671]: ceph version 18.2.4
>> >> >> (e7ad5345525c7aa95470c26863873b581076945d) reef (stable), process
>> >> >> cephfs-mirror, pid 2
>> >> >> Apr 14 08:02:36 s1mon cephfs-mirror[2097671]: pidfile_write: ignore
>> >> >> empty --pid-file
>> >> >> Apr 14 08:02:36 s1mon cephfs-mirror[2097671]: mgrc
>> >> >> service_daemon_register cephfs-mirror.23174196 metadata
>> >> >> {arch=x86_64,ceph_release=reef,ceph_version=ceph version 18.2.4
>> >> >> (e7ad5345525c7aa95470c26863873b581076945d) reef
>> >> >> (stable),ceph_version_short=18.2.4,container_hostna>
>> >> >> Apr 14 08:02:40 s1mon cephfs-mirror[2097671]: cephfs::mirror::Utils
>> >> >> connect: error connecting to bk-site: (2) No such file or directory
>> >> >> Apr 14 08:02:40 s1mon cephfs-mirror[2097671]:
>> >> >> cephfs::mirror::PeerReplayer(f3ea4e15-6d77-4f28-aacb-9afbfe8cc1c5) init:
>> >> >> error connecting to remote cluster: (2) No such file or directory
>> >> >> Apr 14 08:02:40 s1mon conmon[2097665]: unable to get monitor info
>> >> >> from DNS SRV with service name: ceph-mon
>> >> >> Apr 14 08:02:40 s1mon conmon[2097665]: 2025-04-14T06:02:40.831+0000
>> >> >> 7ff4ed4c1640 -1 failed for service _ceph-mon._tcp
>> >> >> Apr 14 08:02:40 s1mon conmon[2097665]: 2025-04-14T06:02:40.831+0000
>> >> >> 7ff4ed4c1640 -1 monclient: get_monmap_and_config cannot identify
>> >> >> monitors to contact
>> >> >> Apr 14 08:02:40 s1mon conmon[2097665]: 2025-04-14T06:02:40.831+0000
>> >> >> 7ff4ed4c1640 -1 cephfs::mirror::Utils connect: error connecting to
>> >> >> bk-site: (2) No such file or directory
>> >> >> Apr 14 08:02:40 s1mon conmon[2097665]: 2025-04-14T06:02:40.831+0000
>> >> >> 7ff4ed4c1640 -1
>> >> >> cephfs::mirror::PeerReplayer(f3ea4e15-6d77-4f28-aacb-9afbfe8cc1c5) init:
>> >> >> error connecting to remote cluster: (2) No such file or directory
>> >> >> )
>> >> >>
>> >> >> On Fri, Apr 11, 2025 at 14:42, Eugen Block <ebl...@nde.ag> wrote:
>> >> >>
>> >> >>> Hi,
>> >> >>>
>> >> >>> I would expect that you have a similar config-key entry:
>> >> >>>
>> >> >>> ceph config-key ls |grep "peer/cephfs"
>> >> >>> "cephfs/mirror/peer/cephfs/18c02021-8902-4e3f-bc17-eaf48331cc56",
>> >> >>>
>> >> >>> Maybe removing that peer would already suffice?
>> >> >>>
>> >> >>>
>> >> >>> Quoting Jan Zeinstra <j...@delectat.nl>:
>> >> >>>
>> >> >>>> Hi,
>> >> >>>> This is my first post to the forum and I don't know if it's
>> >> >>>> appropriate, but I'd like to express my gratitude to all the people
>> >> >>>> working hard on Ceph, because I think it's a fantastic piece of
>> >> >>>> software.
>> >> >>>>
>> >> >>>> The problem I'm having is caused by me; we had a well-working
>> >> >>>> CephFS mirror solution; let's call it source cluster A and target
>> >> >>>> cluster B. Source cluster A is a modest cluster consisting of 6
>> >> >>>> instances: 3 OSD instances and 3 mon instances. The OSD instances
>> >> >>>> all have 3 disks (HDDs) and 3 OSD daemons, totalling 9 OSD daemons
>> >> >>>> and 9 HDDs. Target cluster B is a single-node system having 3 OSD
>> >> >>>> daemons and 3 HDDs. Both clusters run Ceph 18.2.4 Reef. Both
>> >> >>>> clusters use Ubuntu 22.04 as the OS throughout. Both systems are
>> >> >>>> installed using cephadm.
>> >> >>>> I have destroyed cluster B and built it from the ground up (I made
>> >> >>>> a mistake in PG sizing in the original cluster).
>> >> >>>> Now I find I cannot create/reinstate the mirroring between the two
>> >> >>>> CephFS filesystems, and I suspect there is a peer left behind in
>> >> >>>> the filesystem of the source, pointing to the now non-existent
>> >> >>>> target cluster.
>> >> >>>> When I do 'ceph fs snapshot mirror peer_list prodfs', I get:
>> >> >>>> '{"f3ea4e15-6d77-4f28-aacb-9afbfe8cc1c5": {"client_name":
>> >> >>>> "client.mirror_remote", "site_name": "bk-site", "fs_name": "prodfs"}}'
>> >> >>>> When I try to delete it with 'ceph fs snapshot mirror peer_remove
>> >> >>>> prodfs f3ea4e15-6d77-4f28-aacb-9afbfe8cc1c5', I get: 'Error EACCES:
>> >> >>>> failed to remove peeraccess denied: does your client key have mgr
>> >> >>>> caps? See
>> >> >>>> http://docs.ceph.com/en/latest/mgr/administrator/#client-authentication',
>> >> >>>> but the logging of the daemon points to the more likely reason of
>> >> >>>> failure:
>> >> >>>> ----
>> >> >>>> Apr 08 12:54:26 s1mon systemd[1]: Started Ceph
>> >> >>>> cephfs-mirror.s1mon.lvlkwp for d0ea284a-8a16-11ee-9232-5934f0f00ec2.
>> >> >>>> Apr 08 12:54:26 s1mon cephfs-mirror[310088]: set uid:gid to 167:167
>> >> >>>> (ceph:ceph)
>> >> >>>> Apr 08 12:54:26 s1mon cephfs-mirror[310088]: ceph version 18.2.4
>> >> >>>> (e7ad5345525c7aa95470c26863873b581076945d) reef (stable), process
>> >> >>>> cephfs-mirror, pid 2
>> >> >>>> Apr 08 12:54:26 s1mon cephfs-mirror[310088]: pidfile_write: ignore
>> >> >>>> empty --pid-file
>> >> >>>> Apr 08 12:54:26 s1mon cephfs-mirror[310088]: mgrc
>> >> >>>> service_daemon_register cephfs-mirror.22849497 metadata
>> >> >>>> {arch=x86_64,ceph_release=reef,ceph_version=ceph version 18.2.4
>> >> >>>> (e7ad5345525c7a>
>> >> >>>> Apr 08 12:54:30 s1mon cephfs-mirror[310088]:
>> >> >>>> cephfs::mirror::PeerReplayer(f3ea4e15-6d77-4f28-aacb-9afbfe8cc1c5) init:
>> >> >>>> remote monitor host=[v2:172.17.16.12:3300/0,v1:172.17.16.12:6789/0]
>> >> >>>> Apr 08 12:54:30 s1mon conmon[310082]: 2025-04-08T10:54:30.365+0000
>> >> >>>> 7f57c51ba640 -1 monclient(hunting): handle_auth_bad_method server
>> >> >>>> allowed_methods [2] but i only support [2,1]
>> >> >>>> Apr 08 12:54:30 s1mon conmon[310082]: 2025-04-08T10:54:30.365+0000
>> >> >>>> 7f57d81e0640 -1 cephfs::mirror::Utils connect: error connecting to
>> >> >>>> bk-site: (13) Permission denied
>> >> >>>> Apr 08 12:54:30 s1mon cephfs-mirror[310088]: cephfs::mirror::Utils
>> >> >>>> connect: error connecting to bk-site: (13) Permission denied
>> >> >>>> Apr 08 12:54:30 s1mon conmon[310082]: 2025-04-08T10:54:30.365+0000
>> >> >>>> 7f57d81e0640 -1
>> >> >>>> cephfs::mirror::PeerReplayer(f3ea4e15-6d77-4f28-aacb-9afbfe8cc1c5) init:
>> >> >>>> error connecting to remote cl>
>> >> >>>> Apr 08 12:54:30 s1mon cephfs-mirror[310088]:
>> >> >>>> cephfs::mirror::PeerReplayer(f3ea4e15-6d77-4f28-aacb-9afbfe8cc1c5) init:
>> >> >>>> error connecting to remote cluster: (13) Permission denied
>> >> >>>> Apr 09 00:00:16 s1mon cephfs-mirror[310088]: received signal: Hangup
>> >> >>>> from Kernel ( Could be generated by pthread_kill(), raise(), abort(),
>> >> >>>> alarm() ) UID: 0
>> >> >>>> Apr 09 00:00:16 s1mon conmon[310082]: 2025-04-08T22:00:16.362+0000
>> >> >>>> 7f57d99e3640 -1 received signal: Hangup from Kernel ( Could be
>> >> >>>> generated by pthread_kill(), raise(), abort(), alarm()>
>> >> >>>> Apr 09 00:00:16 s1mon conmon[310082]: 2025-04-08T22:00:16.386+0000
>> >> >>>> 7f57d99e3640 -1 received signal: Hangup from Kernel ( Could be
>> >> >>>> generated by pthread_kill(), raise(), abort(), alarm()>
>> >> >>>> Apr 09 00:00:16 s1mon cephfs-mirror[310088]: received signal: Hangup
>> >> >>>> from Kernel ( Could be generated by pthread_kill(), raise(), abort(),
>> >> >>>> alarm() ) UID: 0
>> >> >>>> Apr 09 00:00:16 s1mon conmon[310082]: 2025-04-08T22:00:16.430+0000
>> >> >>>> 7f57d99e3640 -1 received signal: Hangup from Kernel ( Could be
>> >> >>>> generated by pthread_kill(), raise(), abort(), alarm()>
>> >> >>>> Apr 09 00:00:16 s1mon cephfs-mirror[310088]: received signal: Hangup
>> >> >>>> from Kernel ( Could be generated by pthread_kill(), raise(), abort(),
>> >> >>>> alarm() ) UID: 0
>> >> >>>> Apr 09 00:00:16 s1mon conmon[310082]: 2025-04-08T22:00:16.466+0000
>> >> >>>> 7f57d99e3640 -1 received signal: Hangup from Kernel ( Could be
>> >> >>>> generated by pthread_kill(), raise(), abort(), alarm()>
>> >> >>>> Apr 09 00:00:16 s1mon cephfs-mirror[310088]: received signal: Hangup
>> >> >>>> from Kernel ( Could be generated by pthread_kill(), raise(), abort(),
>> >> >>>> alarm() ) UID: 0
>> >> >>>> Apr 10 00:00:01 s1mon cephfs-mirror[310088]: received signal: Hangup
>> >> >>>> from Kernel ( Could be generated by pthread_kill(), raise(), abort(),
>> >> >>>> alarm() ) UID: 0
>> >> >>>> Apr 10 00:00:01 s1mon conmon[310082]: 2025-04-09T22:00:01.767+0000
>> >> >>>> 7f57d99e3640 -1 received signal: Hangup from Kernel ( Could be
>> >> >>>> generated by pthread_kill(), raise(), abort(), alarm()>
>> >> >>>> Apr 10 00:00:01 s1mon cephfs-mirror[310088]: received signal: Hangup
>> >> >>>> from Kernel ( Could be generated by pthread_kill(), raise(), abort(),
>> >> >>>> alarm() ) UID: 0
>> >> >>>> Apr 10 00:00:01 s1mon conmon[310082]: 2025-04-09T22:00:01.811+0000
>> >> >>>> 7f57d99e3640 -1 received signal: Hangup from Kernel ( Could be
>> >> >>>> generated by pthread_kill(), raise(), abort(), alarm()>
>> >> >>>> Apr 10 00:00:01 s1mon cephfs-mirror[310088]: received signal: Hangup
>> >> >>>> from Kernel ( Could be generated by pthread_kill(), raise(), abort(),
>> >> >>>> alarm() ) UID: 0
>> >> >>>> Apr 10 00:00:01 s1mon conmon[310082]: 2025-04-09T22:00:01.851+0000
>> >> >>>> 7f57d99e3640 -1 received signal: Hangup from Kernel ( Could be
>> >> >>>> generated by pthread_kill(), raise(), abort(), alarm()>
>> >> >>>> Apr 10 00:00:01 s1mon conmon[310082]: 2025-04-09T22:00:01.891+0000
>> >> >>>> 7f57d99e3640 -1 received signal: Hangup from Kernel ( Could be
>> >> >>>> generated by pthread_kill(), raise(), abort(), alarm()>
>> >> >>>> Apr 10 00:00:01 s1mon cephfs-mirror[310088]: received signal: Hangup
>> >> >>>> from Kernel ( Could be generated by pthread_kill(), raise(), abort(),
>> >> >>>> alarm() ) UID: 0
>> >> >>>> ----
>> >> >>>> Is there any chance I can get the mirroring daemon to forget about
>> >> >>>> the cluster I lost?
>> >> >>>>
>> >> >>>> Best regards, Jan Zeinstra
>> >> >>>> _______________________________________________
>> >> >>>> ceph-users mailing list -- ceph-users@ceph.io
>> >> >>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> >> >>>