[ceph-users] Re: CephFS Snapshot Scheduling stops creating Snapshots after a restart of the Manager
Hey Venky, thank you very much for your response! > It would help if you could enable debug log for ceph-mgr, repeat the > steps you mention above and upload the log in the tracker. I have already collected log files after enabling the debug log by `ceph config set mgr mgr/snap_schedule/log_level debug`, and I would be happy to share it. > Could you please file a tracker here: > https://tracker.ceph.com/projects/cephfs/issues/new I signed up for an account, but need to wait for being approved by an administrator. Cheers, Sebastian ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
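A minimal sketch of the log-collection steps discussed in this thread, assuming a cephadm-managed deployment; the mgr daemon name and unit name below are placeholders and should be taken from "ceph orch ps" / "ceph -s" on your own cluster:

  # raise the snap_schedule module log level (as mentioned above)
  ceph config set mgr mgr/snap_schedule/log_level debug
  # optionally raise the general mgr debug level as well
  ceph config set mgr debug_mgr 10

  # reproduce the problem (e.g. fail over the active mgr) ...
  ceph mgr fail

  # ... then collect the active mgr's log, e.g. on a cephadm deployment:
  cephadm logs --name mgr.<host>.<suffix> > mgr-debug.log
  # or, for a package-based deployment:
  journalctl -u ceph-mgr@<id> > mgr-debug.log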
[ceph-users] Re: Limitations of ceph fs snapshot mirror for read-only folders?
Hey Manuel, On Thu, Jan 27, 2022 at 8:57 PM Manuel Holtgrewe wrote: > > OK, reconstructed with another example: > > -- source file system -- > > 0|0[root@gw-1 ~]# find /data/cephfs-2/test/x2 | xargs stat > File: /data/cephfs-2/test/x2 > Size: 1 Blocks: 0 IO Block: 65536 directory > Device: 2ch/44d Inode: 1099840816759 Links: 3 > Access: (2440/dr--r-S---) Uid: (0/root) Gid: (0/root) > Access: 2022-01-27 16:24:15.627783470 +0100 > Modify: 2022-01-27 16:24:22.001750514 +0100 > Change: 2022-01-27 16:24:51.294599055 +0100 > Birth: - > File: /data/cephfs-2/test/x2/y2 > Size: 1 Blocks: 0 IO Block: 65536 directory > Device: 2ch/44d Inode: 1099840816760 Links: 2 > Access: (2440/dr--r-S---) Uid: (0/root) Gid: (0/root) > Access: 2022-01-27 16:24:22.001750514 +0100 > Modify: 2022-01-27 16:24:27.712720985 +0100 > Change: 2022-01-27 16:24:51.307598988 +0100 > Birth: - > File: /data/cephfs-2/test/x2/y2/z > Size: 0 Blocks: 0 IO Block: 4194304 regular empty file > Device: 2ch/44d Inode: 1099840816761 Links: 1 > Access: (0644/-rw-r--r--) Uid: (0/root) Gid: (0/root) > Access: 2022-01-27 16:24:27.713720980 +0100 > Modify: 2022-01-27 16:24:27.713720980 +0100 > Change: 2022-01-27 16:24:27.713720980 +0100 > Birth: - > > -- resulting remote file system -- > > 0|0[root@gw-1 ~]# find /data/cephfs-3/test/x2 | xargs stat > File: /data/cephfs-3/test/x2 > Size: 0 Blocks: 0 IO Block: 65536 directory > Device: 2dh/45d Inode: 1099521812568 Links: 2 > Access: (2440/dr--r-S---) Uid: (0/root) Gid: (0/root) > Access: 2022-01-27 16:24:15.627783470 +0100 > Modify: 2022-01-27 16:24:22.001750514 +0100 > Change: 2022-01-27 16:25:53.638392179 +0100 > Birth: - The mirror daemon requires write access to a directory to update entries (it uses libcephfs with uid/gid 0:0). The mode/ownership changes are applied after creating the entry on the other cluster. There's probably no "quick" workarounds for this, I'm afraid. > > -- log excerpt -- > > debug 2022-01-27T15:25:42.476+ 7fe0ffbf0700 20 > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > register_directory: dir_root=/test > debug 2022-01-27T15:25:42.476+ 7fe0ffbf0700 20 > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > try_lock_directory: dir_root=/test > debug 2022-01-27T15:25:42.477+ 7fe0ffbf0700 10 > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > try_lock_directory: dir_root=/test locked > debug 2022-01-27T15:25:42.477+ 7fe0ffbf0700 5 > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > register_directory: dir_root=/test registered with > replayer=0x56173a70a680 > debug 2022-01-27T15:25:42.477+ 7fe0ffbf0700 20 > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > sync_snaps: dir_root=/test > debug 2022-01-27T15:25:42.477+ 7fe0ffbf0700 20 > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > do_sync_snaps: dir_root=/test > debug 2022-01-27T15:25:42.477+ 7fe0ffbf0700 20 > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > build_snap_map: dir_root=/test, snap_dir=/test/.snap, is_remote=0 > debug 2022-01-27T15:25:42.477+ 7fe0ffbf0700 20 > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > build_snap_map: entry=. > debug 2022-01-27T15:25:42.478+ 7fe0ffbf0700 20 > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > build_snap_map: entry=.. 
> debug 2022-01-27T15:25:42.478+ 7fe0ffbf0700 20 > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > build_snap_map: entry=initial > debug 2022-01-27T15:25:42.478+ 7fe0ffbf0700 20 > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > build_snap_map: entry=second > debug 2022-01-27T15:25:42.478+ 7fe0ffbf0700 10 > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > build_snap_map: local snap_map={1384=initial,1385=second} > debug 2022-01-27T15:25:42.478+ 7fe0ffbf0700 20 > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > build_snap_map: dir_root=/test, snap_dir=/test/.snap, is_remote=1 > debug 2022-01-27T15:25:42.479+ 7fe0ffbf0700 20 > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > build_snap_map: entry=. > debug 2022-01-27T15:25:42.479+ 7fe0ffbf0700 20 > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > build_snap_map: entry=.. > debug 2022-01-27T15:25:42.480+ 7fe0ffbf0700 20 > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > build_snap_map: entry=initial > debug 2022-01-27T15:25:42.480+ 7fe0ffbf0700 20 > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > build_snap_map: snap_path=/test/.snap/initial, > metadata={primary_snap_id=1384} > debug 2022-01-27T15:25:42.480+ 7fe0ffbf0700 10 > cephfs::mirror::
[ceph-users] Re: Limitations of ceph fs snapshot mirror for read-only folders?
Hi, thanks for the reply. Actually, mounting the source and remote fs on linux with kernel driver (Rocky Linux 8.5 default kernel), I can `rsync`. Is this to be expected? Cheers, On Fri, Jan 28, 2022 at 10:44 AM Venky Shankar wrote: > > Hey Manuel, > > On Thu, Jan 27, 2022 at 8:57 PM Manuel Holtgrewe wrote: > > > > OK, reconstructed with another example: > > > > -- source file system -- > > > > 0|0[root@gw-1 ~]# find /data/cephfs-2/test/x2 | xargs stat > > File: /data/cephfs-2/test/x2 > > Size: 1 Blocks: 0 IO Block: 65536 directory > > Device: 2ch/44d Inode: 1099840816759 Links: 3 > > Access: (2440/dr--r-S---) Uid: (0/root) Gid: (0/root) > > Access: 2022-01-27 16:24:15.627783470 +0100 > > Modify: 2022-01-27 16:24:22.001750514 +0100 > > Change: 2022-01-27 16:24:51.294599055 +0100 > > Birth: - > > File: /data/cephfs-2/test/x2/y2 > > Size: 1 Blocks: 0 IO Block: 65536 directory > > Device: 2ch/44d Inode: 1099840816760 Links: 2 > > Access: (2440/dr--r-S---) Uid: (0/root) Gid: (0/root) > > Access: 2022-01-27 16:24:22.001750514 +0100 > > Modify: 2022-01-27 16:24:27.712720985 +0100 > > Change: 2022-01-27 16:24:51.307598988 +0100 > > Birth: - > > File: /data/cephfs-2/test/x2/y2/z > > Size: 0 Blocks: 0 IO Block: 4194304 regular empty > > file > > Device: 2ch/44d Inode: 1099840816761 Links: 1 > > Access: (0644/-rw-r--r--) Uid: (0/root) Gid: (0/root) > > Access: 2022-01-27 16:24:27.713720980 +0100 > > Modify: 2022-01-27 16:24:27.713720980 +0100 > > Change: 2022-01-27 16:24:27.713720980 +0100 > > Birth: - > > > > -- resulting remote file system -- > > > > 0|0[root@gw-1 ~]# find /data/cephfs-3/test/x2 | xargs stat > > File: /data/cephfs-3/test/x2 > > Size: 0 Blocks: 0 IO Block: 65536 directory > > Device: 2dh/45d Inode: 1099521812568 Links: 2 > > Access: (2440/dr--r-S---) Uid: (0/root) Gid: (0/root) > > Access: 2022-01-27 16:24:15.627783470 +0100 > > Modify: 2022-01-27 16:24:22.001750514 +0100 > > Change: 2022-01-27 16:25:53.638392179 +0100 > > Birth: - > > The mirror daemon requires write access to a directory to update > entries (it uses libcephfs with uid/gid 0:0). The mode/ownership > changes are applied after creating the entry on the other cluster. > > There's probably no "quick" workarounds for this, I'm afraid. 
> > > > > -- log excerpt -- > > > > debug 2022-01-27T15:25:42.476+ 7fe0ffbf0700 20 > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > > register_directory: dir_root=/test > > debug 2022-01-27T15:25:42.476+ 7fe0ffbf0700 20 > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > > try_lock_directory: dir_root=/test > > debug 2022-01-27T15:25:42.477+ 7fe0ffbf0700 10 > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > > try_lock_directory: dir_root=/test locked > > debug 2022-01-27T15:25:42.477+ 7fe0ffbf0700 5 > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > > register_directory: dir_root=/test registered with > > replayer=0x56173a70a680 > > debug 2022-01-27T15:25:42.477+ 7fe0ffbf0700 20 > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > > sync_snaps: dir_root=/test > > debug 2022-01-27T15:25:42.477+ 7fe0ffbf0700 20 > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > > do_sync_snaps: dir_root=/test > > debug 2022-01-27T15:25:42.477+ 7fe0ffbf0700 20 > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > > build_snap_map: dir_root=/test, snap_dir=/test/.snap, is_remote=0 > > debug 2022-01-27T15:25:42.477+ 7fe0ffbf0700 20 > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > > build_snap_map: entry=. > > debug 2022-01-27T15:25:42.478+ 7fe0ffbf0700 20 > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > > build_snap_map: entry=.. > > debug 2022-01-27T15:25:42.478+ 7fe0ffbf0700 20 > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > > build_snap_map: entry=initial > > debug 2022-01-27T15:25:42.478+ 7fe0ffbf0700 20 > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > > build_snap_map: entry=second > > debug 2022-01-27T15:25:42.478+ 7fe0ffbf0700 10 > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > > build_snap_map: local snap_map={1384=initial,1385=second} > > debug 2022-01-27T15:25:42.478+ 7fe0ffbf0700 20 > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > > build_snap_map: dir_root=/test, snap_dir=/test/.snap, is_remote=1 > > debug 2022-01-27T15:25:42.479+ 7fe0ffbf0700 20 > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > > build_snap_map: entry=. > > debug 2022-01-27T15:25:42.479+ 7fe0ffbf0700 20 > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > > build_snap_
[ceph-users] Re: Limitations of ceph fs snapshot mirror for read-only folders?
On Fri, Jan 28, 2022 at 3:20 PM Manuel Holtgrewe wrote: > > Hi, > > thanks for the reply. > > Actually, mounting the source and remote fs on linux with kernel > driver (Rocky Linux 8.5 default kernel), I can `rsync`. You are probably running rsync with --no-perms or a custom --chmod (or one of --no-o, --no-g) I guess? > > Is this to be expected? > > Cheers, > > On Fri, Jan 28, 2022 at 10:44 AM Venky Shankar wrote: > > > > Hey Manuel, > > > > On Thu, Jan 27, 2022 at 8:57 PM Manuel Holtgrewe > > wrote: > > > > > > OK, reconstructed with another example: > > > > > > -- source file system -- > > > > > > 0|0[root@gw-1 ~]# find /data/cephfs-2/test/x2 | xargs stat > > > File: /data/cephfs-2/test/x2 > > > Size: 1 Blocks: 0 IO Block: 65536 directory > > > Device: 2ch/44d Inode: 1099840816759 Links: 3 > > > Access: (2440/dr--r-S---) Uid: (0/root) Gid: (0/root) > > > Access: 2022-01-27 16:24:15.627783470 +0100 > > > Modify: 2022-01-27 16:24:22.001750514 +0100 > > > Change: 2022-01-27 16:24:51.294599055 +0100 > > > Birth: - > > > File: /data/cephfs-2/test/x2/y2 > > > Size: 1 Blocks: 0 IO Block: 65536 directory > > > Device: 2ch/44d Inode: 1099840816760 Links: 2 > > > Access: (2440/dr--r-S---) Uid: (0/root) Gid: (0/root) > > > Access: 2022-01-27 16:24:22.001750514 +0100 > > > Modify: 2022-01-27 16:24:27.712720985 +0100 > > > Change: 2022-01-27 16:24:51.307598988 +0100 > > > Birth: - > > > File: /data/cephfs-2/test/x2/y2/z > > > Size: 0 Blocks: 0 IO Block: 4194304 regular empty > > > file > > > Device: 2ch/44d Inode: 1099840816761 Links: 1 > > > Access: (0644/-rw-r--r--) Uid: (0/root) Gid: (0/root) > > > Access: 2022-01-27 16:24:27.713720980 +0100 > > > Modify: 2022-01-27 16:24:27.713720980 +0100 > > > Change: 2022-01-27 16:24:27.713720980 +0100 > > > Birth: - > > > > > > -- resulting remote file system -- > > > > > > 0|0[root@gw-1 ~]# find /data/cephfs-3/test/x2 | xargs stat > > > File: /data/cephfs-3/test/x2 > > > Size: 0 Blocks: 0 IO Block: 65536 directory > > > Device: 2dh/45d Inode: 1099521812568 Links: 2 > > > Access: (2440/dr--r-S---) Uid: (0/root) Gid: (0/root) > > > Access: 2022-01-27 16:24:15.627783470 +0100 > > > Modify: 2022-01-27 16:24:22.001750514 +0100 > > > Change: 2022-01-27 16:25:53.638392179 +0100 > > > Birth: - > > > > The mirror daemon requires write access to a directory to update > > entries (it uses libcephfs with uid/gid 0:0). The mode/ownership > > changes are applied after creating the entry on the other cluster. > > > > There's probably no "quick" workarounds for this, I'm afraid. 
> > > > > > > > -- log excerpt -- > > > > > > debug 2022-01-27T15:25:42.476+ 7fe0ffbf0700 20 > > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > > > register_directory: dir_root=/test > > > debug 2022-01-27T15:25:42.476+ 7fe0ffbf0700 20 > > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > > > try_lock_directory: dir_root=/test > > > debug 2022-01-27T15:25:42.477+ 7fe0ffbf0700 10 > > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > > > try_lock_directory: dir_root=/test locked > > > debug 2022-01-27T15:25:42.477+ 7fe0ffbf0700 5 > > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > > > register_directory: dir_root=/test registered with > > > replayer=0x56173a70a680 > > > debug 2022-01-27T15:25:42.477+ 7fe0ffbf0700 20 > > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > > > sync_snaps: dir_root=/test > > > debug 2022-01-27T15:25:42.477+ 7fe0ffbf0700 20 > > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > > > do_sync_snaps: dir_root=/test > > > debug 2022-01-27T15:25:42.477+ 7fe0ffbf0700 20 > > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > > > build_snap_map: dir_root=/test, snap_dir=/test/.snap, is_remote=0 > > > debug 2022-01-27T15:25:42.477+ 7fe0ffbf0700 20 > > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > > > build_snap_map: entry=. > > > debug 2022-01-27T15:25:42.478+ 7fe0ffbf0700 20 > > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > > > build_snap_map: entry=.. > > > debug 2022-01-27T15:25:42.478+ 7fe0ffbf0700 20 > > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > > > build_snap_map: entry=initial > > > debug 2022-01-27T15:25:42.478+ 7fe0ffbf0700 20 > > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > > > build_snap_map: entry=second > > > debug 2022-01-27T15:25:42.478+ 7fe0ffbf0700 10 > > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955) > > > build_snap_map: local snap_map={1384=initial,1385=second} > > > debug 2022-01-27T15:25:42.478+ 7fe0ffbf0700 20 > > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1
[ceph-users] Re: CephFS Snapshot Scheduling stops creating Snapshots after a restart of the Manager
On Fri, Jan 28, 2022 at 3:03 PM Sebastian Mazza wrote: > > Hey Venky, > > thank you very much for your response! > > > It would help if you could enable debug log for ceph-mgr, repeat the > > steps you mention above and upload the log in the tracker. > > > I have already collected log files after enabling the debug log by `ceph > config set mgr mgr/snap_schedule/log_level debug`, and I would be happy to > share it. > > > Could you please file a tracker here: > > https://tracker.ceph.com/projects/cephfs/issues/new > > I signed up for an account, but need to wait for being approved by an > administrator. Thanks. If you can share the logs, I can create the tracker in the meantime. > > > Cheers, > Sebastian > -- Cheers, Venky ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Limitations of ceph fs snapshot mirror for read-only folders?
I'm running rsync "-Wa", see below for a reproduction from scratch that actually syncs as root when no permissions are given on the directories. -- full mount options -- 172.16.62.10,172.16.62.11,172.16.62.11,172.16.62.12,172.16.62.13,172.16.62.30:/ on /data/cephfs-2 type ceph (rw,noatime,name=samba,secret=,acl) 172.16.62.22,172.16.62.23,172.16.62.23,172.16.62.24,172.16.62.25,172.16.62.32:/ on /data/cephfs-3 type ceph (rw,noatime,name=gateway,secret=,rbytes,acl) -- example -- 0|0[root@gw-1 ~]# mkdir -p /data/cephfs-2/test2/x/y 0|0[root@gw-1 ~]# touch !$z touch /data/cephfs-2/test2/x/yz 0|0[root@gw-1 ~]# chmod a= -R /data/cephfs-2/test2 0|0[root@gw-1 ~]# mkdir /data/cephfs-3/test2 0|0[root@gw-1 ~]# rsync -va /data/cephfs-2/test2/. /data/cephfs-3/test2/. sending incremental file list ./ x/ x/yz x/y/ sent 165 bytes received 50 bytes 430.00 bytes/sec total size is 0 speedup is 0.00 0|0[root@gw-1 ~]# find /data/cephfs-3/test2 | xargs stat File: /data/cephfs-3/test2 Size: 0 Blocks: 0 IO Block: 65536 directory Device: 2dh/45d Inode: 1099522341053 Links: 3 Access: (/d-) Uid: (0/root) Gid: (0/root) Access: 2022-01-28 11:10:31.436380533 +0100 Modify: 2022-01-28 11:09:47.06846 +0100 Change: 2022-01-28 11:10:31.436380533 +0100 Birth: - File: /data/cephfs-3/test2/x Size: 0 Blocks: 0 IO Block: 65536 directory Device: 2dh/45d Inode: 1099522341054 Links: 3 Access: (/d-) Uid: (0/root) Gid: (0/root) Access: 2022-01-28 11:10:31.462380399 +0100 Modify: 2022-01-28 11:09:49.258598614 +0100 Change: 2022-01-28 11:10:31.462380399 +0100 Birth: - File: /data/cephfs-3/test2/x/yz Size: 0 Blocks: 0 IO Block: 4194304 regular empty file Device: 2dh/45d Inode: 1099522341056 Links: 1 Access: (/--) Uid: (0/root) Gid: (0/root) Access: 2022-01-28 11:10:31.447380476 +0100 Modify: 2022-01-28 11:09:49.265598578 +0100 Change: 2022-01-28 11:10:31.447380476 +0100 Birth: - File: /data/cephfs-3/test2/x/y Size: 0 Blocks: 0 IO Block: 65536 directory Device: 2dh/45d Inode: 1099522341055 Links: 2 Access: (/d-) Uid: (0/root) Gid: (0/root) Access: 2022-01-28 11:10:31.439380518 +0100 Modify: 2022-01-28 11:09:47.669606830 +0100 Change: 2022-01-28 11:10:31.439380518 +0100 Birth: - On Fri, Jan 28, 2022 at 11:06 AM Venky Shankar wrote: > > On Fri, Jan 28, 2022 at 3:20 PM Manuel Holtgrewe wrote: > > > > Hi, > > > > thanks for the reply. > > > > Actually, mounting the source and remote fs on linux with kernel > > driver (Rocky Linux 8.5 default kernel), I can `rsync`. > > You are probably running rsync with --no-perms or a custom --chmod (or > one of --no-o, --no-g) I guess? > > > > > Is this to be expected? 
> > > > Cheers, > > > > On Fri, Jan 28, 2022 at 10:44 AM Venky Shankar wrote: > > > > > > Hey Manuel, > > > > > > On Thu, Jan 27, 2022 at 8:57 PM Manuel Holtgrewe > > > wrote: > > > > > > > > OK, reconstructed with another example: > > > > > > > > -- source file system -- > > > > > > > > 0|0[root@gw-1 ~]# find /data/cephfs-2/test/x2 | xargs stat > > > > File: /data/cephfs-2/test/x2 > > > > Size: 1 Blocks: 0 IO Block: 65536 directory > > > > Device: 2ch/44d Inode: 1099840816759 Links: 3 > > > > Access: (2440/dr--r-S---) Uid: (0/root) Gid: (0/root) > > > > Access: 2022-01-27 16:24:15.627783470 +0100 > > > > Modify: 2022-01-27 16:24:22.001750514 +0100 > > > > Change: 2022-01-27 16:24:51.294599055 +0100 > > > > Birth: - > > > > File: /data/cephfs-2/test/x2/y2 > > > > Size: 1 Blocks: 0 IO Block: 65536 directory > > > > Device: 2ch/44d Inode: 1099840816760 Links: 2 > > > > Access: (2440/dr--r-S---) Uid: (0/root) Gid: (0/root) > > > > Access: 2022-01-27 16:24:22.001750514 +0100 > > > > Modify: 2022-01-27 16:24:27.712720985 +0100 > > > > Change: 2022-01-27 16:24:51.307598988 +0100 > > > > Birth: - > > > > File: /data/cephfs-2/test/x2/y2/z > > > > Size: 0 Blocks: 0 IO Block: 4194304 regular > > > > empty file > > > > Device: 2ch/44d Inode: 1099840816761 Links: 1 > > > > Access: (0644/-rw-r--r--) Uid: (0/root) Gid: (0/root) > > > > Access: 2022-01-27 16:24:27.713720980 +0100 > > > > Modify: 2022-01-27 16:24:27.713720980 +0100 > > > > Change: 2022-01-27 16:24:27.713720980 +0100 > > > > Birth: - > > > > > > > > -- resulting remote file system -- > > > > > > > > 0|0[root@gw-1 ~]# find /data/cephfs-3/test/x2 | xargs stat > > > > File: /data/cephfs-3/test/x2 > > > > Size: 0 Blocks: 0 IO Block: 65536 directory > > > > Device: 2dh/45d Inode: 1099521812568 Links: 2 > > > > Access: (2440/dr--r-S---) Uid: (0/root) Gid: (0/root) > > > > Access: 2022-01-27 16:24:15.627783470 +0100 > > > > Modify: 2022-01-27 16:24:22.001750514 +0100 > > > > Chan
[ceph-users] 'cephadm bootstrap' and 'ceph orch' creates daemons with latest / devel container images instead of stable images
Hi All, We are trying to deploy a ceph (16.2.7) cluster in production using cephadm. Unfortunately, we encountered the following situation. Description: The cephadm (v16.2.7) bootstrap by default chooses the container images quay.io/ceph/ceph:v16 and docker.io/ceph/daemon-base:latest-pacific-devel. Since we want to avoid using devel and latest container images in production, we pulled the required images (with static tags) prior to running bootstrap. We also specified the image name and the --skip-pull parameter in the bootstrap command. Still, cephadm uses the image docker.io/ceph/daemon-base:latest-pacific-devel for some of the daemons, and it is still pulling the image even though --skip-pull is given. Due to this, daemons on different hosts are running on different versions of container images. Hence, there is no provision to use a specific image instead of docker.io/ceph/daemon-base:latest-pacific-devel during bootstrap for consistency across all daemons in the cluster. Similarly, the same behaviour exists while creating daemons using ceph orch. Command used to bootstrap the cluster (stable container images were already pulled beforehand): sudo cephadm --image quay.io/ceph/ceph:v16.2.7 bootstrap --skip-monitoring-stack --mon-ip ... --cluster-network ... --ssh-user ceph_user --config /home/ceph_user/ceph_bootstrap/ceph.conf --initial-dashboard-password Q5446UBS3KK9 --dashboard-password-noupdate --no-minimize-config --skip-pull Below are some entries from cephadm.log, which clearly show it is trying to pull the image even though --skip-pull is provided: 2022-01-27 17:11:13,900 7f01b6621b80 INFO Deploying mon service with default placement... 2022-01-27 17:11:14,212 7f211cc85b80 DEBUG cephadm ['--image', 'docker.io/ceph/daemon-base:latest-pacific-devel', 'ls'] 2022-01-27 17:11:14,296 7f211cc85b80 DEBUG /bin/podman: 3.3.1 2022-01-27 17:11:14,660 7f211cc85b80 DEBUG /bin/podman: 4da6ea847240,24.26MB / 134.9GB 2022-01-27 17:11:14,660 7f211cc85b80 DEBUG /bin/podman: 52b12ff050d8,390.7MB / 134.9GB 2022-01-27 17:11:14,660 7f211cc85b80 DEBUG /bin/podman: 5c979c84d182,4.342MB / 134.9GB 2022-01-27 17:11:14,766 7f211cc85b80 DEBUG systemctl: enabled 2022-01-27 17:11:14,778 7f211cc85b80 DEBUG systemctl: active 2022-01-27 17:11:14,912 7f211cc85b80 DEBUG /bin/podman: 52b12ff050d88841131aa6508f7576a1dca8e0004db08384dd13dca6c2d3b725, quay.io/ceph/ceph:v16.2.7,cc266d6139f4d044d28ace2308f7befcdfead3c3e88bc3faed905298cae299ef,2022-01-27 17:10:33.135056074 +0530 IST, 2022-01-27 17:11:15,059 7f211cc85b80 DEBUG /bin/podman: [ quay.io/ceph/ceph@sha256:2f7f0af8663e73a422f797de605e769ae44eb0297f2a79324739404cc1765728 quay.io/ceph/ceph@sha256:bb6a71f7f481985f6d3b358e3b9ef64c6755b3db5aa53198e0aac38be5c8ae54 ] 2022-01-27 17:11:15,456 7f01b6621b80 DEBUG /usr/bin/ceph: Scheduled mon update... 2022-01-27 17:11:15,641 7f211cc85b80 DEBUG /bin/podman: ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable) 2022-01-27 17:11:15,972 7f01b6621b80 INFO Deploying mgr service with default placement...
2022-01-27 17:11:16,127 7f211cc85b80 DEBUG systemctl: enabled 2022-01-27 17:11:16,140 7f211cc85b80 DEBUG systemctl: active 2022-01-27 17:11:16,296 7f211cc85b80 DEBUG /bin/podman: 4da6ea847240bab786f596ddc87160e11056c74aa7004dc38ee12be331a5ea4e, quay.io/ceph/ceph:v16.2.7,cc266d6139f4d044d28ace2308f7befcdfead3c3e88bc3faed905298cae299ef,2022-01-27 17:10:25.830630277 +0530 IST, 2022-01-27 17:11:17,023 7f0b0c505b80 DEBUG cephadm ['--image', 'docker.io/ceph/daemon-base:latest-pacific-devel', 'ceph-volume', '--fsid', 'e3c9bff6-7f65-11ec-bdff-0015171590ba', '--', 'inventory', '--format=json-pretty', '--filter-for-batch'] 2022-01-27 17:11:17,102 7f0b0c505b80 DEBUG /bin/podman: 3.3.1 2022-01-27 17:11:17,275 7f0b0c505b80 DEBUG /bin/podman: 4da6ea847240,24.71MB / 134.9GB 2022-01-27 17:11:17,275 7f0b0c505b80 DEBUG /bin/podman: 52b12ff050d8,390.8MB / 134.9GB 2022-01-27 17:11:17,275 7f0b0c505b80 DEBUG /bin/podman: d242f1fa7a66,28.28MB / 134.9GB 2022-01-27 17:11:17,417 7f0b0c505b80 INFO Inferring config /var/lib/ceph/e3c9bff6-7f65-11ec-bdff-0015171590ba/mon.hcictrl01/config 2022-01-27 17:11:17,417 7f0b0c505b80 DEBUG Using specified fsid: e3c9bff6-7f65-11ec-bdff-0015171590ba 2022-01-27 17:11:17,620 7f01b6621b80 DEBUG /usr/bin/ceph: Scheduled mgr update... 2022-01-27 17:11:17,727 7f0b0c505b80 DEBUG stat: Trying to pull docker.io/ceph/daemon-base:latest-pacific-devel... 2022-01-27 17:11:18,489 7f01b6621b80 INFO Deploying crash service with default placement... 2022-01-27 17:11:18,763 7f3ed21eeb80 DEBUG sestatus: SELinux status: disabled 2022-01-27 17:11:18,768 7f3ed21eeb80 DEBUG sestatus: SELinux status: disabled 2022-01-27 17:11:18,774 7f3ed21eeb80 DEBUG sestatus: SELinux status: disabled 2022-01-27 17:11:18,779 7f3ed21eeb80 DEBUG sestatus: SELinux status: disabled 2022-01-27 17:11:18,784 7f3ed21ee
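One way to keep every daemon on pinned, non-devel images is to set the image options explicitly, assuming the mgr/cephadm container-image settings referenced elsewhere in this thread ("ceph config get mgr mgr/cephadm/container_image_<type>") apply to this release. This is a sketch, not a confirmed fix for the bootstrap-time daemon-base behaviour described above, and the image tags shown are only examples:

  # pin the main Ceph image used for mon/mgr/osd/etc.
  ceph config set global container_image quay.io/ceph/ceph:v16.2.7

  # pin the monitoring-stack images (same keys as queried with
  # "ceph config get mgr mgr/cephadm/container_image_<daemon_type>")
  ceph config set mgr mgr/cephadm/container_image_prometheus quay.io/prometheus/prometheus:v2.18.1
  ceph config set mgr mgr/cephadm/container_image_node_exporter quay.io/prometheus/node-exporter:v0.18.1
  ceph config set mgr mgr/cephadm/container_image_alertmanager quay.io/prometheus/alertmanager:v0.20.0
  ceph config set mgr mgr/cephadm/container_image_grafana quay.io/ceph/ceph-grafana:6.7.4

  # check what the orchestrator will actually use
  ceph config get mgr container_image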
[ceph-users] Re: CephFS Snapshot Scheduling stops creating Snapshots after a restart of the Manager
Hey Venky, I would be happy if you create the issue. Under this link: https://www.filemail.com/d/skgyuyszdlgrkxw you can download the log file and also my description of the problem. The txt also includes the most interesting lines of the log. Cheers, Sebastian > On 28.01.2022, at 11:07, Venky Shankar wrote: > > On Fri, Jan 28, 2022 at 3:03 PM Sebastian Mazza wrote: >> >> Hey Venky, >> >> thank you very much for your response! >> >>> It would help if you could enable debug log for ceph-mgr, repeat the >>> steps you mention above and upload the log in the tracker. >> >> >> I have already collected log files after enabling the debug log by `ceph >> config set mgr mgr/snap_schedule/log_level debug`, and I would be happy to >> share it. >> >>> Could you please file a tracker here: >>> https://tracker.ceph.com/projects/cephfs/issues/new >> >> I signed up for an account, but need to wait for being approved by an >> administrator. > > Thanks. If you can share the logs, I can create the tracker in the meantime. > >> >> >> Cheers, >> Sebastian >> > > > -- > Cheers, > Venky > > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Limitations of ceph fs snapshot mirror for read-only folders?
On Fri, Jan 28, 2022 at 3:42 PM Manuel Holtgrewe wrote: > > I'm running rsync "-Wa", see below for a reproduction from scratch > that actually syncs as root when no permissions are given on the > directories. > > -- full mount options -- > > 172.16.62.10,172.16.62.11,172.16.62.11,172.16.62.12,172.16.62.13,172.16.62.30:/ > on /data/cephfs-2 type ceph > (rw,noatime,name=samba,secret=,acl) > 172.16.62.22,172.16.62.23,172.16.62.23,172.16.62.24,172.16.62.25,172.16.62.32:/ > on /data/cephfs-3 type ceph > (rw,noatime,name=gateway,secret=,rbytes,acl) > > -- example -- > > 0|0[root@gw-1 ~]# mkdir -p /data/cephfs-2/test2/x/y > 0|0[root@gw-1 ~]# touch !$z > touch /data/cephfs-2/test2/x/yz > 0|0[root@gw-1 ~]# chmod a= -R /data/cephfs-2/test2 > 0|0[root@gw-1 ~]# mkdir /data/cephfs-3/test2 > 0|0[root@gw-1 ~]# rsync -va /data/cephfs-2/test2/. /data/cephfs-3/test2/. > sending incremental file list > ./ > x/ > x/yz > x/y/ Try running this from a ceph-fuse mount - it would fail. It's probably related to the way how permission checks are done (we may want to fix that in the user-space driver). Since the mirror daemon uses the user-space library, it would be running into the same permission related constraints as ceph-fuse. > > sent 165 bytes received 50 bytes 430.00 bytes/sec > total size is 0 speedup is 0.00 > 0|0[root@gw-1 ~]# find /data/cephfs-3/test2 | xargs stat > File: /data/cephfs-3/test2 > Size: 0 Blocks: 0 IO Block: 65536 directory > Device: 2dh/45d Inode: 1099522341053 Links: 3 > Access: (/d-) Uid: (0/root) Gid: (0/root) > Access: 2022-01-28 11:10:31.436380533 +0100 > Modify: 2022-01-28 11:09:47.06846 +0100 > Change: 2022-01-28 11:10:31.436380533 +0100 > Birth: - > File: /data/cephfs-3/test2/x > Size: 0 Blocks: 0 IO Block: 65536 directory > Device: 2dh/45d Inode: 1099522341054 Links: 3 > Access: (/d-) Uid: (0/root) Gid: (0/root) > Access: 2022-01-28 11:10:31.462380399 +0100 > Modify: 2022-01-28 11:09:49.258598614 +0100 > Change: 2022-01-28 11:10:31.462380399 +0100 > Birth: - > File: /data/cephfs-3/test2/x/yz > Size: 0 Blocks: 0 IO Block: 4194304 regular empty > file > Device: 2dh/45d Inode: 1099522341056 Links: 1 > Access: (/--) Uid: (0/root) Gid: (0/root) > Access: 2022-01-28 11:10:31.447380476 +0100 > Modify: 2022-01-28 11:09:49.265598578 +0100 > Change: 2022-01-28 11:10:31.447380476 +0100 > Birth: - > File: /data/cephfs-3/test2/x/y > Size: 0 Blocks: 0 IO Block: 65536 directory > Device: 2dh/45d Inode: 1099522341055 Links: 2 > Access: (/d-) Uid: (0/root) Gid: (0/root) > Access: 2022-01-28 11:10:31.439380518 +0100 > Modify: 2022-01-28 11:09:47.669606830 +0100 > Change: 2022-01-28 11:10:31.439380518 +0100 > Birth: - > > On Fri, Jan 28, 2022 at 11:06 AM Venky Shankar wrote: > > > > On Fri, Jan 28, 2022 at 3:20 PM Manuel Holtgrewe > > wrote: > > > > > > Hi, > > > > > > thanks for the reply. > > > > > > Actually, mounting the source and remote fs on linux with kernel > > > driver (Rocky Linux 8.5 default kernel), I can `rsync`. > > > > You are probably running rsync with --no-perms or a custom --chmod (or > > one of --no-o, --no-g) I guess? > > > > > > > > Is this to be expected? 
> > > > > > Cheers, > > > > > > On Fri, Jan 28, 2022 at 10:44 AM Venky Shankar > > > wrote: > > > > > > > > Hey Manuel, > > > > > > > > On Thu, Jan 27, 2022 at 8:57 PM Manuel Holtgrewe > > > > wrote: > > > > > > > > > > OK, reconstructed with another example: > > > > > > > > > > -- source file system -- > > > > > > > > > > 0|0[root@gw-1 ~]# find /data/cephfs-2/test/x2 | xargs stat > > > > > File: /data/cephfs-2/test/x2 > > > > > Size: 1 Blocks: 0 IO Block: 65536 directory > > > > > Device: 2ch/44d Inode: 1099840816759 Links: 3 > > > > > Access: (2440/dr--r-S---) Uid: (0/root) Gid: (0/ > > > > > root) > > > > > Access: 2022-01-27 16:24:15.627783470 +0100 > > > > > Modify: 2022-01-27 16:24:22.001750514 +0100 > > > > > Change: 2022-01-27 16:24:51.294599055 +0100 > > > > > Birth: - > > > > > File: /data/cephfs-2/test/x2/y2 > > > > > Size: 1 Blocks: 0 IO Block: 65536 directory > > > > > Device: 2ch/44d Inode: 1099840816760 Links: 2 > > > > > Access: (2440/dr--r-S---) Uid: (0/root) Gid: (0/ > > > > > root) > > > > > Access: 2022-01-27 16:24:22.001750514 +0100 > > > > > Modify: 2022-01-27 16:24:27.712720985 +0100 > > > > > Change: 2022-01-27 16:24:51.307598988 +0100 > > > > > Birth: - > > > > > File: /data/cephfs-2/test/x2/y2/z > > > > > Size: 0 Blocks: 0 IO Block: 4194304 regular > > > > > empty file > > > > > Device: 2ch/44d Inode: 1099840816761 Links: 1 > > > > > Access: (0644/-rw-r--r--) Uid: (0/root) Gid: (0/ > > > > > root) > > > > > Access: 2022-01-2
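A rough sketch of the comparison suggested above, assuming an admin keyring on the client; the mount point is an example and the monitor address is taken from the kernel mounts quoted earlier in this thread:

  # mount the source file system with the user-space client
  mkdir -p /mnt/cephfs-2-fuse
  ceph-fuse -n client.admin -m 172.16.62.10 /mnt/cephfs-2-fuse

  # repeat the copy from the FUSE mount; with the mode-0 directories above,
  # the expectation is that this fails with "Permission denied",
  # unlike the kernel-client mount
  rsync -va /mnt/cephfs-2-fuse/test2/. /tmp/test2-copy/.

  fusermount -u /mnt/cephfs-2-fuse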
[ceph-users] Re: Limitations of ceph fs snapshot mirror for read-only folders?
OK, so there is a different in semantics of the kernel and the user space driver? Which one would you consider to be desired? >From what I can see, the kernel semantics (apparently: root can do everything) would allow to sync between file systems no matter what. With the current user space semantics, users could `chmod a=` folders in their $HOME and stop the sync from working. Is my interpretation correct? Best wishes, Manuel On Fri, Jan 28, 2022 at 11:43 AM Venky Shankar wrote: > > On Fri, Jan 28, 2022 at 3:42 PM Manuel Holtgrewe wrote: > > > > I'm running rsync "-Wa", see below for a reproduction from scratch > > that actually syncs as root when no permissions are given on the > > directories. > > > > -- full mount options -- > > > > 172.16.62.10,172.16.62.11,172.16.62.11,172.16.62.12,172.16.62.13,172.16.62.30:/ > > on /data/cephfs-2 type ceph > > (rw,noatime,name=samba,secret=,acl) > > 172.16.62.22,172.16.62.23,172.16.62.23,172.16.62.24,172.16.62.25,172.16.62.32:/ > > on /data/cephfs-3 type ceph > > (rw,noatime,name=gateway,secret=,rbytes,acl) > > > > -- example -- > > > > 0|0[root@gw-1 ~]# mkdir -p /data/cephfs-2/test2/x/y > > 0|0[root@gw-1 ~]# touch !$z > > touch /data/cephfs-2/test2/x/yz > > 0|0[root@gw-1 ~]# chmod a= -R /data/cephfs-2/test2 > > 0|0[root@gw-1 ~]# mkdir /data/cephfs-3/test2 > > 0|0[root@gw-1 ~]# rsync -va /data/cephfs-2/test2/. /data/cephfs-3/test2/. > > sending incremental file list > > ./ > > x/ > > x/yz > > x/y/ > > Try running this from a ceph-fuse mount - it would fail. It's probably > related to the way how permission checks are done (we may want to fix > that in the user-space driver). > > Since the mirror daemon uses the user-space library, it would be > running into the same permission related constraints as ceph-fuse. > > > > > sent 165 bytes received 50 bytes 430.00 bytes/sec > > total size is 0 speedup is 0.00 > > 0|0[root@gw-1 ~]# find /data/cephfs-3/test2 | xargs stat > > File: /data/cephfs-3/test2 > > Size: 0 Blocks: 0 IO Block: 65536 directory > > Device: 2dh/45d Inode: 1099522341053 Links: 3 > > Access: (/d-) Uid: (0/root) Gid: (0/root) > > Access: 2022-01-28 11:10:31.436380533 +0100 > > Modify: 2022-01-28 11:09:47.06846 +0100 > > Change: 2022-01-28 11:10:31.436380533 +0100 > > Birth: - > > File: /data/cephfs-3/test2/x > > Size: 0 Blocks: 0 IO Block: 65536 directory > > Device: 2dh/45d Inode: 1099522341054 Links: 3 > > Access: (/d-) Uid: (0/root) Gid: (0/root) > > Access: 2022-01-28 11:10:31.462380399 +0100 > > Modify: 2022-01-28 11:09:49.258598614 +0100 > > Change: 2022-01-28 11:10:31.462380399 +0100 > > Birth: - > > File: /data/cephfs-3/test2/x/yz > > Size: 0 Blocks: 0 IO Block: 4194304 regular empty > > file > > Device: 2dh/45d Inode: 1099522341056 Links: 1 > > Access: (/--) Uid: (0/root) Gid: (0/root) > > Access: 2022-01-28 11:10:31.447380476 +0100 > > Modify: 2022-01-28 11:09:49.265598578 +0100 > > Change: 2022-01-28 11:10:31.447380476 +0100 > > Birth: - > > File: /data/cephfs-3/test2/x/y > > Size: 0 Blocks: 0 IO Block: 65536 directory > > Device: 2dh/45d Inode: 1099522341055 Links: 2 > > Access: (/d-) Uid: (0/root) Gid: (0/root) > > Access: 2022-01-28 11:10:31.439380518 +0100 > > Modify: 2022-01-28 11:09:47.669606830 +0100 > > Change: 2022-01-28 11:10:31.439380518 +0100 > > Birth: - > > > > On Fri, Jan 28, 2022 at 11:06 AM Venky Shankar wrote: > > > > > > On Fri, Jan 28, 2022 at 3:20 PM Manuel Holtgrewe > > > wrote: > > > > > > > > Hi, > > > > > > > > thanks for the reply. 
> > > > > > > > Actually, mounting the source and remote fs on linux with kernel > > > > driver (Rocky Linux 8.5 default kernel), I can `rsync`. > > > > > > You are probably running rsync with --no-perms or a custom --chmod (or > > > one of --no-o, --no-g) I guess? > > > > > > > > > > > Is this to be expected? > > > > > > > > Cheers, > > > > > > > > On Fri, Jan 28, 2022 at 10:44 AM Venky Shankar > > > > wrote: > > > > > > > > > > Hey Manuel, > > > > > > > > > > On Thu, Jan 27, 2022 at 8:57 PM Manuel Holtgrewe > > > > > wrote: > > > > > > > > > > > > OK, reconstructed with another example: > > > > > > > > > > > > -- source file system -- > > > > > > > > > > > > 0|0[root@gw-1 ~]# find /data/cephfs-2/test/x2 | xargs stat > > > > > > File: /data/cephfs-2/test/x2 > > > > > > Size: 1 Blocks: 0 IO Block: 65536 directory > > > > > > Device: 2ch/44d Inode: 1099840816759 Links: 3 > > > > > > Access: (2440/dr--r-S---) Uid: (0/root) Gid: (0/ > > > > > > root) > > > > > > Access: 2022-01-27 16:24:15.627783470 +0100 > > > > > > Modify: 2022-01-27 16:24:22.001750514 +0100 > > > > > > Change: 2022-01-27 16:24:51.294599055 +0100 > > > > > > Birth: - > > > > > > File: /data/cephfs-2/test/x2/y2 > > > > > >
[ceph-users] Re: Limitations of ceph fs snapshot mirror for read-only folders?
On Fri, Jan 28, 2022 at 4:22 PM Manuel Holtgrewe wrote: > > OK, so there is a different in semantics of the kernel and the user > space driver? Right. > > Which one would you consider to be desired? The kernel driver is probably doing the right thing. > > From what I can see, the kernel semantics (apparently: root can do > everything) would allow to sync between file systems no matter what. > With the current user space semantics, users could `chmod a=` folders > in their $HOME and stop the sync from working. Is my interpretation > correct? Correct. I haven't root caused the issue with the user space driver yet. This blocks using the cephfs-mirror daemon with read-only source directories. I'll file a tracker for this. Thanks for your help. > > Best wishes, > Manuel > > On Fri, Jan 28, 2022 at 11:43 AM Venky Shankar wrote: > > > > On Fri, Jan 28, 2022 at 3:42 PM Manuel Holtgrewe > > wrote: > > > > > > I'm running rsync "-Wa", see below for a reproduction from scratch > > > that actually syncs as root when no permissions are given on the > > > directories. > > > > > > -- full mount options -- > > > > > > 172.16.62.10,172.16.62.11,172.16.62.11,172.16.62.12,172.16.62.13,172.16.62.30:/ > > > on /data/cephfs-2 type ceph > > > (rw,noatime,name=samba,secret=,acl) > > > 172.16.62.22,172.16.62.23,172.16.62.23,172.16.62.24,172.16.62.25,172.16.62.32:/ > > > on /data/cephfs-3 type ceph > > > (rw,noatime,name=gateway,secret=,rbytes,acl) > > > > > > -- example -- > > > > > > 0|0[root@gw-1 ~]# mkdir -p /data/cephfs-2/test2/x/y > > > 0|0[root@gw-1 ~]# touch !$z > > > touch /data/cephfs-2/test2/x/yz > > > 0|0[root@gw-1 ~]# chmod a= -R /data/cephfs-2/test2 > > > 0|0[root@gw-1 ~]# mkdir /data/cephfs-3/test2 > > > 0|0[root@gw-1 ~]# rsync -va /data/cephfs-2/test2/. /data/cephfs-3/test2/. > > > sending incremental file list > > > ./ > > > x/ > > > x/yz > > > x/y/ > > > > Try running this from a ceph-fuse mount - it would fail. It's probably > > related to the way how permission checks are done (we may want to fix > > that in the user-space driver). > > > > Since the mirror daemon uses the user-space library, it would be > > running into the same permission related constraints as ceph-fuse. 
> > > > > > > > sent 165 bytes received 50 bytes 430.00 bytes/sec > > > total size is 0 speedup is 0.00 > > > 0|0[root@gw-1 ~]# find /data/cephfs-3/test2 | xargs stat > > > File: /data/cephfs-3/test2 > > > Size: 0 Blocks: 0 IO Block: 65536 directory > > > Device: 2dh/45d Inode: 1099522341053 Links: 3 > > > Access: (/d-) Uid: (0/root) Gid: (0/root) > > > Access: 2022-01-28 11:10:31.436380533 +0100 > > > Modify: 2022-01-28 11:09:47.06846 +0100 > > > Change: 2022-01-28 11:10:31.436380533 +0100 > > > Birth: - > > > File: /data/cephfs-3/test2/x > > > Size: 0 Blocks: 0 IO Block: 65536 directory > > > Device: 2dh/45d Inode: 1099522341054 Links: 3 > > > Access: (/d-) Uid: (0/root) Gid: (0/root) > > > Access: 2022-01-28 11:10:31.462380399 +0100 > > > Modify: 2022-01-28 11:09:49.258598614 +0100 > > > Change: 2022-01-28 11:10:31.462380399 +0100 > > > Birth: - > > > File: /data/cephfs-3/test2/x/yz > > > Size: 0 Blocks: 0 IO Block: 4194304 regular > > > empty file > > > Device: 2dh/45d Inode: 1099522341056 Links: 1 > > > Access: (/--) Uid: (0/root) Gid: (0/root) > > > Access: 2022-01-28 11:10:31.447380476 +0100 > > > Modify: 2022-01-28 11:09:49.265598578 +0100 > > > Change: 2022-01-28 11:10:31.447380476 +0100 > > > Birth: - > > > File: /data/cephfs-3/test2/x/y > > > Size: 0 Blocks: 0 IO Block: 65536 directory > > > Device: 2dh/45d Inode: 1099522341055 Links: 2 > > > Access: (/d-) Uid: (0/root) Gid: (0/root) > > > Access: 2022-01-28 11:10:31.439380518 +0100 > > > Modify: 2022-01-28 11:09:47.669606830 +0100 > > > Change: 2022-01-28 11:10:31.439380518 +0100 > > > Birth: - > > > > > > On Fri, Jan 28, 2022 at 11:06 AM Venky Shankar > > > wrote: > > > > > > > > On Fri, Jan 28, 2022 at 3:20 PM Manuel Holtgrewe > > > > wrote: > > > > > > > > > > Hi, > > > > > > > > > > thanks for the reply. > > > > > > > > > > Actually, mounting the source and remote fs on linux with kernel > > > > > driver (Rocky Linux 8.5 default kernel), I can `rsync`. > > > > > > > > You are probably running rsync with --no-perms or a custom --chmod (or > > > > one of --no-o, --no-g) I guess? > > > > > > > > > > > > > > Is this to be expected? > > > > > > > > > > Cheers, > > > > > > > > > > On Fri, Jan 28, 2022 at 10:44 AM Venky Shankar > > > > > wrote: > > > > > > > > > > > > Hey Manuel, > > > > > > > > > > > > On Thu, Jan 27, 2022 at 8:57 PM Manuel Holtgrewe > > > > > > wrote: > > > > > > > > > > > > > > OK, reconstructed with another example: > > > > > > > > > > > > > > -- source file system -- > > > > > > > > > > > > > > 0|
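Until that is fixed, affected directories can at least be spotted from the mirror daemon's status output. The commands below are a sketch from the mirroring documentation and may differ by release; the admin-socket path, filesystem name/id and peer UUID are placeholders, so check the socket's "help" output first:

  # high-level view of registered mirror daemons and peers
  ceph fs snapshot mirror daemon status <fs-name>

  # per-directory sync state via the cephfs-mirror admin socket
  ceph --admin-daemon /var/run/ceph/<cephfs-mirror>.asok help
  ceph --admin-daemon /var/run/ceph/<cephfs-mirror>.asok fs mirror peer status <fs-name>@<fs-id> <peer-uuid>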
[ceph-users] Re: Limitations of ceph fs snapshot mirror for read-only folders?
Great. No, thank *you* for such excellent software! On Fri, Jan 28, 2022 at 1:20 PM Venky Shankar wrote: > > On Fri, Jan 28, 2022 at 4:22 PM Manuel Holtgrewe wrote: > > > > OK, so there is a different in semantics of the kernel and the user > > space driver? > > Right. > > > > > Which one would you consider to be desired? > > The kernel driver is probably doing the right thing. > > > > > From what I can see, the kernel semantics (apparently: root can do > > everything) would allow to sync between file systems no matter what. > > With the current user space semantics, users could `chmod a=` folders > > in their $HOME and stop the sync from working. Is my interpretation > > correct? > > Correct. > > I haven't root caused the issue with the user space driver yet. This > blocks using the cephfs-mirror daemon with read-only source > directories. > > I'll file a tracker for this. Thanks for your help. > > > > > Best wishes, > > Manuel > > > > On Fri, Jan 28, 2022 at 11:43 AM Venky Shankar wrote: > > > > > > On Fri, Jan 28, 2022 at 3:42 PM Manuel Holtgrewe > > > wrote: > > > > > > > > I'm running rsync "-Wa", see below for a reproduction from scratch > > > > that actually syncs as root when no permissions are given on the > > > > directories. > > > > > > > > -- full mount options -- > > > > > > > > 172.16.62.10,172.16.62.11,172.16.62.11,172.16.62.12,172.16.62.13,172.16.62.30:/ > > > > on /data/cephfs-2 type ceph > > > > (rw,noatime,name=samba,secret=,acl) > > > > 172.16.62.22,172.16.62.23,172.16.62.23,172.16.62.24,172.16.62.25,172.16.62.32:/ > > > > on /data/cephfs-3 type ceph > > > > (rw,noatime,name=gateway,secret=,rbytes,acl) > > > > > > > > -- example -- > > > > > > > > 0|0[root@gw-1 ~]# mkdir -p /data/cephfs-2/test2/x/y > > > > 0|0[root@gw-1 ~]# touch !$z > > > > touch /data/cephfs-2/test2/x/yz > > > > 0|0[root@gw-1 ~]# chmod a= -R /data/cephfs-2/test2 > > > > 0|0[root@gw-1 ~]# mkdir /data/cephfs-3/test2 > > > > 0|0[root@gw-1 ~]# rsync -va /data/cephfs-2/test2/. > > > > /data/cephfs-3/test2/. > > > > sending incremental file list > > > > ./ > > > > x/ > > > > x/yz > > > > x/y/ > > > > > > Try running this from a ceph-fuse mount - it would fail. It's probably > > > related to the way how permission checks are done (we may want to fix > > > that in the user-space driver). > > > > > > Since the mirror daemon uses the user-space library, it would be > > > running into the same permission related constraints as ceph-fuse. 
> > > > > > > > > > > sent 165 bytes received 50 bytes 430.00 bytes/sec > > > > total size is 0 speedup is 0.00 > > > > 0|0[root@gw-1 ~]# find /data/cephfs-3/test2 | xargs stat > > > > File: /data/cephfs-3/test2 > > > > Size: 0 Blocks: 0 IO Block: 65536 directory > > > > Device: 2dh/45d Inode: 1099522341053 Links: 3 > > > > Access: (/d-) Uid: (0/root) Gid: (0/root) > > > > Access: 2022-01-28 11:10:31.436380533 +0100 > > > > Modify: 2022-01-28 11:09:47.06846 +0100 > > > > Change: 2022-01-28 11:10:31.436380533 +0100 > > > > Birth: - > > > > File: /data/cephfs-3/test2/x > > > > Size: 0 Blocks: 0 IO Block: 65536 directory > > > > Device: 2dh/45d Inode: 1099522341054 Links: 3 > > > > Access: (/d-) Uid: (0/root) Gid: (0/root) > > > > Access: 2022-01-28 11:10:31.462380399 +0100 > > > > Modify: 2022-01-28 11:09:49.258598614 +0100 > > > > Change: 2022-01-28 11:10:31.462380399 +0100 > > > > Birth: - > > > > File: /data/cephfs-3/test2/x/yz > > > > Size: 0 Blocks: 0 IO Block: 4194304 regular > > > > empty file > > > > Device: 2dh/45d Inode: 1099522341056 Links: 1 > > > > Access: (/--) Uid: (0/root) Gid: (0/root) > > > > Access: 2022-01-28 11:10:31.447380476 +0100 > > > > Modify: 2022-01-28 11:09:49.265598578 +0100 > > > > Change: 2022-01-28 11:10:31.447380476 +0100 > > > > Birth: - > > > > File: /data/cephfs-3/test2/x/y > > > > Size: 0 Blocks: 0 IO Block: 65536 directory > > > > Device: 2dh/45d Inode: 1099522341055 Links: 2 > > > > Access: (/d-) Uid: (0/root) Gid: (0/root) > > > > Access: 2022-01-28 11:10:31.439380518 +0100 > > > > Modify: 2022-01-28 11:09:47.669606830 +0100 > > > > Change: 2022-01-28 11:10:31.439380518 +0100 > > > > Birth: - > > > > > > > > On Fri, Jan 28, 2022 at 11:06 AM Venky Shankar > > > > wrote: > > > > > > > > > > On Fri, Jan 28, 2022 at 3:20 PM Manuel Holtgrewe > > > > > wrote: > > > > > > > > > > > > Hi, > > > > > > > > > > > > thanks for the reply. > > > > > > > > > > > > Actually, mounting the source and remote fs on linux with kernel > > > > > > driver (Rocky Linux 8.5 default kernel), I can `rsync`. > > > > > > > > > > You are probably running rsync with --no-perms or a custom --chmod (or > > > > > one of --no-o, --no-g) I guess? > > > > > > > > > > > > > > > > > Is this to be expected? > > > > > > > > > > > > Cheers,
[ceph-users] Support for additional bind-mounts to specific container types
Hey folks - We’ve been using a hack to get bind mounts into our manager containers for various reasons. We’ve realized that this quickly breaks down when our “hacks” don’t exist inside “cephadm” in the manager container and we execute a “ceph orch upgrade”. Is there an official way to add a bind mount to a manager container? Our use case: We’re using zabbix_sender + Zabbix to monitor Ceph; however, we use a certificate to encrypt monitoring traffic that we need the ability to rotate. If the certificate is mapped in via a bind mount it can much more easily be rotated in the event it is ever compromised. The same approach is used for other custom code we have running as a manager plugin. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
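One form such a hack typically takes (an assumption about the approach, not necessarily what is done here) is editing the container invocation that cephadm generates for the daemon, which is exactly what gets regenerated, and therefore lost, on redeploy or upgrade. A sketch, with the fsid and mgr name as placeholders:

  # cephadm writes the podman/docker run command here:
  #   /var/lib/ceph/<fsid>/mgr.<host>.<suffix>/unit.run
  # the hack: append an extra bind mount to the run arguments in that file, e.g.
  #   ... -v /etc/zabbix/certs:/etc/zabbix/certs:ro ...
  # then restart the daemon so the new mount takes effect
  systemctl restart ceph-<fsid>@mgr.<host>.<suffix>.service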
[ceph-users] Re: Support for additional bind-mounts to specific container types
> > Hey folks - We’ve been using a hack to get bind mounts into our manager > containers for various reasons. We’ve realized that this quickly breaks > down when our “hacks” don’t exist inside “cephadm” in the manager > container and we execute a “ceph orch upgrade”. Is there an official way > to add a bind mount to a manager container? I am not really an expert on the use of cephadm or containers, but are these things not wrong in your 'hack' thinking? 1. that would imply that you always have to run this as eeehhh root? 2. afaik it is best practice that your oc supplies volumes to your container. > Our use case: We’re using zabbix_sender + Zabbix to monitor Ceph however > we use a certificate to encrypt monitoring traffic that we need the > ability to rotate. Generate long-term certificates from your own ca. OT: stop hacking ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
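A minimal sketch of the "own CA" suggestion using plain openssl; the file names, subjects, key sizes and validity periods are examples and a matter of local policy:

  # one-off: create the CA key and a long-lived CA certificate
  openssl req -x509 -new -nodes -newkey rsa:4096 -keyout ca.key -out ca.crt \
      -days 3650 -subj "/CN=internal-monitoring-ca"

  # per client: key and CSR for the zabbix_sender side
  openssl req -new -nodes -newkey rsa:4096 -keyout zabbix.key -out zabbix.csr \
      -subj "/CN=ceph-mgr-zabbix"

  # sign it with the CA for as long as policy allows
  openssl x509 -req -in zabbix.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
      -out zabbix.crt -days 1825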
[ceph-users] Re: Support for additional bind-mounts to specific container types
Point 1 (Why are we running as root?): All Ceph containers are instantiated as root (Privileged - for "reasons"), but the daemons inside the container run as user 167 (the "ceph" user). I don't understand your second point; if you're saying that the "container" is what specifies mount points, that's incorrect. It's the "docker run" instantiation of the container that specifies what mount points are passed to the container, and that is controlled by "cephadm" today. The length of validity of a mutual TLS certificate means nothing if a hacker compromises the key. On 1/28/22, 8:35 AM, "Marc" wrote: > > Hey folks - We’ve been using a hack to get bind mounts into our manager > containers for various reasons. We’ve realized that this quickly breaks > down when our “hacks” don’t exist inside “cephadm” in the manager > container and we execute a “ceph orch upgrade”. Is there an official way > to add a bind mount to a manager container? I am not really an expert on the use of cephadm or containers, but are these things not wrong in your 'hack' thinking? 1. that would imply that you always have to run this as eeehhh root? 2. afaik it is best practice that your oc supplies volumes to your container. > Our use case: We’re using zabbix_sender + Zabbix to monitor Ceph however > we use a certificate to encrypt monitoring traffic that we need the > ability to rotate. Generate long-term certificates from your own ca. OT: stop hacking ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: 3 OSDs can not be started after a server reboot - rocksdb Corruption
On 1/26/2022 1:18 AM, Sebastian Mazza wrote: Hey Igor, thank you for your response! Do you suggest to disable the HDD write-caching and / or the bluefs_buffered_io for productive clusters? Generally upstream recommendation is to disable disk write caching, there were multiple complains it might negatively impact the performance in some setups. As for bluefs_buffered_io - please keep it on, the disablmement is known to cause performance drop. Thanks for the explanation. For the enabled disk write cache you only mentioned possible performance problem, but can the enabled disk write cache also lead to data corruption? Or make a problem more likely than with a disabled disk cache? Definitely it can, particularly if cache isn't protected from power loss or the implementation isn't so good ;) When rebooting a node - did you perform it by regular OS command (reboot or poweroff) or by a power switch? I never did a hard reset or used the power switch. I used `init 6` for performing a reboot. Each server has redundant power supplies with one connected to a battery backup and the other to the grid. Therefore, I do think that none of the servers ever faced a non clean shutdown or reboot. So the original reboot which caused the failures was made in the same manner, right? Yes, Exactly. And the OSD logs confirms that: OSD 4: 2021-12-12T21:33:07.780+0100 7f464a944700 -1 received signal: Terminated from /sbin/init (PID: 1) UID: 0 2021-12-12T21:33:07.780+0100 7f464a944700 -1 osd.4 2606 *** Got signal Terminated *** 2021-12-12T21:33:07.780+0100 7f464a944700 -1 osd.4 2606 *** Immediate shutdown (osd_fast_shutdown=true) *** 2021-12-12T21:35:29.918+0100 7ffa5ce42f00 0 set uid:gid to 64045:64045 (ceph:ceph) 2021-12-12T21:35:29.918+0100 7ffa5ce42f00 0 ceph version 16.2.6 (1a6b9a05546f335eeeddb460fdc89caadf80ac7a) pacific (stable), process ceph-osd, pid 1608 :... 2021-12-12T21:35:32.509+0100 7ffa5ce42f00 -1 rocksdb: Corruption: Bad table magic number: expected 9863518390377041911, found 0 in db/002145.sst 2021-12-12T21:35:32.509+0100 7ffa5ce42f00 -1 bluestore(/var/lib/ceph/osd/ceph-4) _open_db erroring opening db: OSD 7: 2021-12-12T21:20:11.141+0100 7f9714894700 -1 received signal: Terminated from /sbin/init (PID: 1) UID: 0 2021-12-12T21:20:11.141+0100 7f9714894700 -1 osd.7 2591 *** Got signal Terminated *** 2021-12-12T21:20:11.141+0100 7f9714894700 -1 osd.7 2591 *** Immediate shutdown (osd_fast_shutdown=true) *** 2021-12-12T21:21:41.881+0100 7f63c6557f00 0 set uid:gid to 64045:64045 (ceph:ceph) 2021-12-12T21:21:41.881+0100 7f63c6557f00 0 ceph version 16.2.6 (1a6b9a05546f335eeeddb460fdc89caadf80ac7a) pacific (stable), process ceph-osd, pid 1937 :... 2021-12-12T21:21:44.557+0100 7f63c6557f00 -1 rocksdb: Corruption: Bad table magic number: expected 9863518390377041911, found 0 in db/002182.sst 2021-12-12T21:21:44.557+0100 7f63c6557f00 -1 bluestore(/var/lib/ceph/osd/ceph-7) _open_db erroring opening db: OSD 8: 2021-12-12T21:20:11.141+0100 7fd1ccf01700 -1 received signal: Terminated from /sbin/init (PID: 1) UID: 0 2021-12-12T21:20:11.141+0100 7fd1ccf01700 -1 osd.8 2591 *** Got signal Terminated *** 2021-12-12T21:20:11.141+0100 7fd1ccf01700 -1 osd.8 2591 *** Immediate shutdown (osd_fast_shutdown=true) *** 2021-12-12T21:21:41.881+0100 7f6d18d2bf00 0 set uid:gid to 64045:64045 (ceph:ceph) 2021-12-12T21:21:41.881+0100 7f6d18d2bf00 0 ceph version 16.2.6 (1a6b9a05546f335eeeddb460fdc89caadf80ac7a) pacific (stable), process ceph-osd, pid 1938 :... 
2021-12-12T21:21:44.577+0100 7f6d18d2bf00 -1 rocksdb: Corruption: Bad table magic number: expected 9863518390377041911, found 0 in db/002182.sst 2021-12-12T21:21:44.577+0100 7f6d18d2bf00 -1 bluestore(/var/lib/ceph/osd/ceph-8) _open_db erroring opening db: Best regards, Sebastian -- Igor Fedotov Ceph Lead Developer Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH, Freseniusstr. 31h, 81247 Munich CEO: Martin Verges - VAT-ID: DE310638492 Com. register: Amtsgericht Munich HRB 231263 Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
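For reference, a short sketch of how the volatile on-disk write cache can be inspected and disabled with standard tools; device names are examples, and whether the setting survives a power cycle depends on the drive, so re-check after reboots or use the --save variant where supported:

  # SATA drives
  hdparm -W /dev/sdX          # query the write cache state
  hdparm -W 0 /dev/sdX        # disable it

  # SAS/SCSI drives
  sdparm --get WCE /dev/sdX
  sdparm --set WCE=0 --save /dev/sdX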
[ceph-users] Re: Removed daemons listed as stray
Hello Vlad, Just some insight into how CEPHADM_STRAY_DAEMON works: This health warning is specifically designed to point out daemons in the cluster that cephadm is not aware of/in control of. It does this by comparing the daemons it has cached info on (this cached info is what you see in "ceph orch ps") with the return value of a core mgr function designed to list the servers in the cluster and what daemons are on them. This function, from cephadm's point of view, is a bit of a black box (by design, as it is meant to find daemons cephadm is not aware of/in control of). If you'd like to see a rough estimate of what that looks like, I'd check the output of "ceph node ls" (you may see your non-existent osds listed there). This means a daemon that does not exist but that cephadm is falsely reporting as a stray daemon cannot typically be resolved through "ceph orch . . ." commands. In the past I've found that sometimes just doing a mgr failover ("ceph mgr fail") will clear this in the case of false reports, so that's what I'd try first. If that doesn't work, I'd maybe try checking if the osd is still listed in the crush map and, if so, remove it (first step in https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/#removing-the-osd I think). It's possible that the reason the daemon rm commands hung is that one of the cleanup operations cephadm was trying to run under the hood when removing the osd hung, and so the osd is still believed to be present by the cluster. - Adam On Fri, Jan 28, 2022 at 11:28 AM Vladimir Brik < vladimir.b...@icecube.wisc.edu> wrote: > Hello > > I needed to permanently remove two drives from my pool so I > ran "ceph orch daemon rm XXX". The command hung for both > OSDs, but the daemons were removed. I then purged the two OSDs. > > Now ceph status is complaining about them with > CEPHADM_STRAY_DAEMON, but the daemons aren't running and are > not showing up in ceph orch ps. If I try to "daemon rm" > again I get Error EINVAL: Unable to find daemon(s). > > Anybody have an idea about what could have happened or how > to stop ceph status from listing the non-existing daemons as > stray? > > > Thanks, > > Vlad > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io > > ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
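A condensed sketch of the checks described above; the OSD ids are placeholders, and entries should only be removed once you are sure they no longer exist:

  ceph node ls                 # what the mgr itself still thinks exists
  ceph mgr fail                # fail over the mgr to drop stale cached state

  # if the purged OSDs still show up in the CRUSH/OSD maps, finish the removal
  ceph osd crush remove osd.<id>
  ceph auth del osd.<id>
  ceph osd rm <id>
  # (on recent releases, "ceph osd purge <id> --yes-i-really-mean-it" covers all three)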
[ceph-users] Re: cephadm trouble
Hmm, I'm not seeing anything that could be a cause in any of that output. I did notice, however, from your "ceph orch ls" output that none of your services have been refreshed since the 24th. Cephadm typically tries to refresh these things every 10 minutes so that signals something is quite wrong. Could you try running "ceph mgr fail" and if nothing seems to be resolved could you post "ceph log last 200 debug cephadm". Maybe we can see if something gets stuck again after the mgr restarts. Thanks, - Adam King On Thu, Jan 27, 2022 at 7:06 PM Fyodor Ustinov wrote: > Hi! > > I think this happened after I tried to recreate the osd with the command > "ceph orch daemon add osd s-8-2-1:/dev/bcache0" > > > > It looks like cephadm believes "s-8-2-1:/dev/bcache0" is a container > image > > for some daemon. Can you provide the output of "ceph orch ls --format > > yaml", > > https://pastebin.com/CStBf4J0 > > > "ceph orch upgrade status", > root@s-26-9-19-mon-m1:~# ceph orch upgrade status > { > "target_image": null, > "in_progress": false, > "services_complete": [], > "progress": null, > "message": "" > } > > > > "ceph config get mgr container_image", > root@s-26-9-19-mon-m1:~# ceph config get mgr container_image > > quay.io/ceph/ceph@sha256:2f7f0af8663e73a422f797de605e769ae44eb0297f2a79324739404cc1765728 > > > > and the values for monitoring stack container images (format is "ceph > > config get mgr mgr/cephadm/container_image_" where daemon > type > > is one of "prometheus", "node_exporter", "alertmanager", "grafana", > > "haproxy", "keepalived"). > quay.io/prometheus/prometheus:v2.18.1 > quay.io/prometheus/node-exporter:v0.18.1 > quay.io/prometheus/alertmanager:v0.20.0 > quay.io/ceph/ceph-grafana:6.7.4 > docker.io/library/haproxy:2.3 > docker.io/arcts/keepalived > > > > > Thanks, > > > > - Adam King > > Thanks a lot! > > WBR, > Fyodor. > > > > > On Thu, Jan 27, 2022 at 9:10 AM Fyodor Ustinov wrote: > > > >> Hi! > >> > >> I rebooted the nodes with mgr and now I see the following in the > >> cephadm.log: > >> > >> As I understand it - cephadm is trying to execute some unsuccessful > >> command of mine (I wonder which one), it does not succeed, but it keeps > >> trying and trying. How do I stop it from trying? > >> > >> 2022-01-27 16:02:58,123 7fca7beca740 DEBUG > >> > > >> cephadm ['--image', 's-8-2-1:/dev/bcache0', 'pull'] > >> 2022-01-27 16:02:58,147 7fca7beca740 DEBUG /usr/bin/podman: 3.3.1 > >> 2022-01-27 16:02:58,249 7fca7beca740 INFO Pulling container image > >> s-8-2-1:/dev/bcache0... > >> 2022-01-27 16:02:58,278 7fca7beca740 DEBUG /usr/bin/podman: Error: > invalid > >> reference format > >> 2022-01-27 16:02:58,279 7fca7beca740 INFO Non-zero exit code 125 from > >> /usr/bin/podman pull s-8-2-1:/dev/bcache0 > >> 2022-01-27 16:02:58,279 7fca7beca740 INFO /usr/bin/podman: stderr Error: > >> invalid reference format > >> 2022-01-27 16:02:58,279 7fca7beca740 ERROR ERROR: Failed command: > >> /usr/bin/podman pull s-8-2-1:/dev/bcache0 > >> 2022-01-27 16:03:58,420 7f897a7a6740 DEBUG > >> > > >> cephadm ['--image', 's-8-2-1:/dev/bcache0', 'pull'] > >> 2022-01-27 16:03:58,443 7f897a7a6740 DEBUG /usr/bin/podman: 3.3.1 > >> 2022-01-27 16:03:58,547 7f897a7a6740 INFO Pulling container image > >> s-8-2-1:/dev/bcache0... 
> >> 2022-01-27 16:03:58,575 7f897a7a6740 DEBUG /usr/bin/podman: Error: > invalid > >> reference format > >> 2022-01-27 16:03:58,577 7f897a7a6740 INFO Non-zero exit code 125 from > >> /usr/bin/podman pull s-8-2-1:/dev/bcache0 > >> 2022-01-27 16:03:58,577 7f897a7a6740 INFO /usr/bin/podman: stderr Error: > >> invalid reference format > >> 2022-01-27 16:03:58,577 7f897a7a6740 ERROR ERROR: Failed command: > >> /usr/bin/podman pull s-8-2-1:/dev/bcache0 > >> > >> WBR, > >> Fyodor. > >> ___ > >> ceph-users mailing list -- ceph-users@ceph.io > >> To unsubscribe send an email to ceph-users-le...@ceph.io > >> > > ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
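The underlying failure in the log above is simply that "s-8-2-1:/dev/bcache0" is not a valid container image reference, so podman rejects it before ever contacting a registry. For comparison (the ceph image tag below is just an example of a well-formed reference):

podman pull s-8-2-1:/dev/bcache0
# -> Error: invalid reference format ("host:/dev/device" does not parse as registry/repository[:tag])
podman pull quay.io/ceph/ceph:v16.2.7
# -> parses as registry quay.io, repository ceph/ceph, tag v16.2.7, and pulls normally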
[ceph-users] Reinstalling OSD node managed by cephadm
Dear all, Recently, there were some very specific questions regarding reinstalling an OSD node while keeping the disks intact. The discussion went around corner cases. I think that I have a very easy case - vanilla cluster setup with ansible playbooks - adopted by cephadm - latest pacific 16.2.7 What is the overall process of reinstalling (e.g., for going from enterprise linux 7 to 8) and getting my OSDs back afterwards. - reinstall operating system on system disk - install cephadm binary - ... now what? ;-) Best wishes, Manuel ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
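A rough sketch of the usual sequence for a cephadm-managed OSD host whose OS is reinstalled while the data disks stay intact (osd-host-1 is a placeholder hostname, and the last command assumes your Pacific build ships the "ceph cephadm osd activate" wrapper):

# on the reinstalled host: install podman (or docker), chrony and the cephadm package first
ceph cephadm get-pub-key > ceph.pub            # run on an existing cluster node
ssh-copy-id -f -i ceph.pub root@osd-host-1     # restore the orchestrator's SSH access to the host
ceph orch host add osd-host-1                  # re-add the host to the inventory if it was removed
ceph cephadm osd activate osd-host-1           # bring the existing OSDs on that host back up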
[ceph-users] Re: cephadm trouble
Hi! > Hmm, I'm not seeing anything that could be a cause in any of that output. I > did notice, however, from your "ceph orch ls" output that none of your > services have been refreshed since the 24th. Cephadm typically tries to > refresh these things every 10 minutes so that signals something is quite > wrong. From what I see in /var/log/ceph/cephadm.log it tries to run the same command once a minute and does nothing else. That's why the status has not been updated for 5 days. > Could you try running "ceph mgr fail" and if nothing seems to be > resolved could you post "ceph log last 200 debug cephadm". Maybe we can see > if something gets stuck again after the mgr restarts. "ceph mgr fail" did not help. "ceph log last 200 debug cephadm" shows again and again and again:

2022-01-28T20:57:12.792090+ mgr.s-26-9-24-mon-m2.nhltmq (mgr.129738166) 349 : cephadm [ERR] cephadm exited with an error code: 1, stderr:Pulling container image s-8-2-1:/dev/bcache0...
Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
/usr/bin/podman: stderr Error: invalid reference format
ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1363, in _remote_connection
    yield (conn, connr)
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1256, in _run_cephadm
    code, '\n'.join(err)))
orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1, stderr:Pulling container image s-8-2-1:/dev/bcache0...
Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
/usr/bin/podman: stderr Error: invalid reference format
ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0

2022-01-28T20:58:13.092996+ mgr.s-26-9-24-mon-m2.nhltmq (mgr.129738166) 392 : cephadm [ERR] cephadm exited with an error code: 1, stderr:Pulling container image s-8-2-1:/dev/bcache0...
Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
/usr/bin/podman: stderr Error: invalid reference format
ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1363, in _remote_connection
    yield (conn, connr)
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1256, in _run_cephadm
    code, '\n'.join(err)))
orchestrator._interface.OrchestratorError: cephadm exited with an error code: 1, stderr:Pulling container image s-8-2-1:/dev/bcache0...
Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
/usr/bin/podman: stderr Error: invalid reference format
ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0

> > Thanks, > > - Adam King > > On Thu, Jan 27, 2022 at 7:06 PM Fyodor Ustinov wrote: > >> Hi! >> >> I think this happened after I tried to recreate the osd with the command >> "ceph orch daemon add osd s-8-2-1:/dev/bcache0" >> >> >> > It looks like cephadm believes "s-8-2-1:/dev/bcache0" is a container >> image >> > for some daemon.
Can you provide the output of "ceph orch ls --format >> > yaml", >> >> https://pastebin.com/CStBf4J0 >> >> > "ceph orch upgrade status", >> root@s-26-9-19-mon-m1:~# ceph orch upgrade status >> { >> "target_image": null, >> "in_progress": false, >> "services_complete": [], >> "progress": null, >> "message": "" >> } >> >> >> > "ceph config get mgr container_image", >> root@s-26-9-19-mon-m1:~# ceph config get mgr container_image >> >> quay.io/ceph/ceph@sha256:2f7f0af8663e73a422f797de605e769ae44eb0297f2a79324739404cc1765728 >> >> >> > and the values for monitoring stack container images (format is "ceph >> > config get mgr mgr/cephadm/container_image_" where daemon >> type >> > is one of "prometheus", "node_exporter", "alertmanager", "grafana", >> > "haproxy", "keepalived"). >> quay.io/prometheus/prometheus:v2.18.1 >> quay.io/prometheus/node-exporter:v0.18.1 >> quay.io/prometheus/alertmanager:v0.20.0 >> quay.io/ceph/ceph-grafana:6.7.4 >> docker.io/library/haproxy:2.3 >> docker.io/arcts/keepalived >> >> > >> > Thanks, >> > >> > - Adam King >> >> Thanks a lot! >> >> WBR, >> Fyodor. >> >> > >> > On Thu, Jan 27, 2022 at 9:10 AM Fyodor Ustinov wrote: >> > >> >> Hi! >> >> >> >> I rebooted the nodes with mgr and now I see the following in the >> >> cephadm.log: >> >> >> >> As I understand it - cephadm is trying to execute some unsuccessful >> >> command of mine (I wonder which one), it does not succeed, but it keeps >> >> trying and trying. How do I stop it from trying? >> >> >> >> 2022-01-27 16:02:58,123 7fca7beca740 DEBUG >> >> >> >> >> cephadm ['--image', 's-8-2-1:/dev/bcache0', 'pull'] >> >> 2022-01-27 16:02:58,147 7fca7beca740 DEBUG /usr/bin/podman: 3.3.1 >> >> 2022-01-27 16:02:58,249 7fca7beca740 INFO Pulling container image >> >> s-8-2-1:/dev/bcache0... >> >> 2022-01-27 16:02:58,278 7fca7beca740 DEBUG /usr/bin/podman: Error: >> invali
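One way to tell whether the cephadm serve loop has actually resumed after a mgr failover is to watch the refresh timestamps and the cephadm logs (command names as they appear earlier in this thread; the log count is arbitrary):

ceph orch ls                        # the REFRESHED column should drop back to a few minutes
ceph log last 50 debug cephadm      # recent cephadm events from the cluster log
tail -f /var/log/ceph/cephadm.log   # on the host, the once-a-minute failed pull should stop repeating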
[ceph-users] Re: Removed daemons listed as stray
I had a situation like this, and the only operation that solved it was a full reboot of the cluster (it was due to a watchdog alarm), but when the cluster came back, the stray osds were gone. On Fri, 28 Jan 2022, 19:32 Adam King, wrote: > Hello Vlad, > > Just some insight into how CEPHADM_STRAY_DAEMON works: This health warning > is specifically designed to point out daemons in the cluster that cephadm > is not aware of/in control of. It does this by comparing the daemons it has > cached info on (this cached info is what you see in "ceph orch ps") with > the return value of a core mgr function designed to list the servers in the > cluster and what daemons are on them. This function, from cephadm's point > of view, is a bit of a black box (by design, as it is meant to find > daemons cephadm is not aware of/in control of). If you'd like to see a > rough estimate of what that looks like, I'd check the output of "ceph node > ls" (you may see your non-existent osds listed there). This means a daemon > that does not exist, but that cephadm is falsely reporting as stray, cannot > typically be resolved through "ceph orch . . ." commands. In the > past I've found that sometimes just doing a mgr failover ("ceph mgr fail") will > clear this in the case of false reports, so that's what I'd try first. If > that doesn't help, I'd try checking whether the osd is still listed in the crush > map and, if so, remove it (first step in > > https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/#removing-the-osd > I think). It's possible the daemon rm commands hung because one > of the cleanup operations cephadm runs under the hood when > removing an osd got stuck, and so the osd is still believed to be present by the > cluster. > > - Adam > > On Fri, Jan 28, 2022 at 11:28 AM Vladimir Brik < > vladimir.b...@icecube.wisc.edu> wrote: > > > Hello > > > > I needed to permanently remove two drives from my pool so I > > ran "ceph orch daemon rm XXX". The command hung for both > > OSDs, but the daemons were removed. I then purged the two OSDs. > > > > Now ceph status is complaining about them with > > CEPHADM_STRAY_DAEMON, but the daemons aren't running and are > > not showing up in ceph orch ps. If I try to "daemon rm" > > again I get Error EINVAL: Unable to find daemon(s). > > > > Anybody have an idea about what could have happened or how > > to stop ceph status from listing the non-existing daemons as > > stray? > > > > > > Thanks, > > > > Vlad > > ___ > > ceph-users mailing list -- ceph-users@ceph.io > > To unsubscribe send an email to ceph-users-le...@ceph.io > > > > > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io > ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
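Whichever route clears it (a full reboot or just a mgr failover), "ceph health detail" is the quickest confirmation afterwards, since it lists the daemon names the cluster still considers stray:

ceph health detail | grep -A3 CEPHADM_STRAY_DAEMON   # prints nothing once the stale entries are gone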