[ceph-users] Re: CephFS Snapshot Scheduling stops creating Snapshots after a restart of the Manager

2022-01-28 Thread Sebastian Mazza
Hey Venky,

thank you very much for your response!

> It would help if you could enable debug log for ceph-mgr, repeat the
> steps you mention above and upload the log in the tracker.


I have already collected log files after enabling the debug log with `ceph config 
set mgr mgr/snap_schedule/log_level debug`, and I would be happy to share them.
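For reference, the rough sequence I am talking about looks like this (the path and
interval below are only placeholders, not my actual schedule):

# enable debug logging for the snap_schedule manager module
ceph config set mgr mgr/snap_schedule/log_level debug
# create a test schedule and check that snapshots appear
ceph fs snap-schedule add /test 1h
ceph fs snap-schedule status /test
# restart the active manager - after this, no new snapshots get created
ceph mgr fail
ceph fs snap-schedule status /test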

> Could you please file a tracker here:
> https://tracker.ceph.com/projects/cephfs/issues/new

I signed up for an account, but I need to wait to be approved by an 
administrator.


Cheers,
Sebastian

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Limitations of ceph fs snapshot mirror for read-only folders?

2022-01-28 Thread Venky Shankar
Hey Manuel,

On Thu, Jan 27, 2022 at 8:57 PM Manuel Holtgrewe  wrote:
>
> OK, reconstructed with another example:
>
> -- source file system --
>
> 0|0[root@gw-1 ~]# find /data/cephfs-2/test/x2 | xargs stat
>  File: /data/cephfs-2/test/x2
>  Size: 1   Blocks: 0  IO Block: 65536  directory
> Device: 2ch/44d Inode: 1099840816759  Links: 3
> Access: (2440/dr--r-S---)  Uid: (0/root)   Gid: (0/root)
> Access: 2022-01-27 16:24:15.627783470 +0100
> Modify: 2022-01-27 16:24:22.001750514 +0100
> Change: 2022-01-27 16:24:51.294599055 +0100
> Birth: -
>  File: /data/cephfs-2/test/x2/y2
>  Size: 1   Blocks: 0  IO Block: 65536  directory
> Device: 2ch/44d Inode: 1099840816760  Links: 2
> Access: (2440/dr--r-S---)  Uid: (0/root)   Gid: (0/root)
> Access: 2022-01-27 16:24:22.001750514 +0100
> Modify: 2022-01-27 16:24:27.712720985 +0100
> Change: 2022-01-27 16:24:51.307598988 +0100
> Birth: -
>  File: /data/cephfs-2/test/x2/y2/z
>  Size: 0   Blocks: 0  IO Block: 4194304 regular empty file
> Device: 2ch/44d Inode: 1099840816761  Links: 1
> Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
> Access: 2022-01-27 16:24:27.713720980 +0100
> Modify: 2022-01-27 16:24:27.713720980 +0100
> Change: 2022-01-27 16:24:27.713720980 +0100
> Birth: -
>
> -- resulting remote file system --
>
> 0|0[root@gw-1 ~]# find /data/cephfs-3/test/x2 | xargs stat
>  File: /data/cephfs-3/test/x2
>  Size: 0   Blocks: 0  IO Block: 65536  directory
> Device: 2dh/45d Inode: 1099521812568  Links: 2
> Access: (2440/dr--r-S---)  Uid: (0/root)   Gid: (0/root)
> Access: 2022-01-27 16:24:15.627783470 +0100
> Modify: 2022-01-27 16:24:22.001750514 +0100
> Change: 2022-01-27 16:25:53.638392179 +0100
> Birth: -

The mirror daemon requires write access to a directory to update
entries (it uses libcephfs with uid/gid 0:0). The mode/ownership
changes are applied after creating the entry on the other cluster.

There's probably no "quick" workaround for this, I'm afraid.

>
> -- log excerpt --
>
> debug 2022-01-27T15:25:42.476+ 7fe0ffbf0700 20
> cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> register_directory: dir_root=/test
> debug 2022-01-27T15:25:42.476+ 7fe0ffbf0700 20
> cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> try_lock_directory: dir_root=/test
> debug 2022-01-27T15:25:42.477+ 7fe0ffbf0700 10
> cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> try_lock_directory: dir_root=/test locked
> debug 2022-01-27T15:25:42.477+ 7fe0ffbf0700  5
> cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> register_directory: dir_root=/test registered with
> replayer=0x56173a70a680
> debug 2022-01-27T15:25:42.477+ 7fe0ffbf0700 20
> cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> sync_snaps: dir_root=/test
> debug 2022-01-27T15:25:42.477+ 7fe0ffbf0700 20
> cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> do_sync_snaps: dir_root=/test
> debug 2022-01-27T15:25:42.477+ 7fe0ffbf0700 20
> cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> build_snap_map: dir_root=/test, snap_dir=/test/.snap, is_remote=0
> debug 2022-01-27T15:25:42.477+ 7fe0ffbf0700 20
> cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> build_snap_map: entry=.
> debug 2022-01-27T15:25:42.478+ 7fe0ffbf0700 20
> cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> build_snap_map: entry=..
> debug 2022-01-27T15:25:42.478+ 7fe0ffbf0700 20
> cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> build_snap_map: entry=initial
> debug 2022-01-27T15:25:42.478+ 7fe0ffbf0700 20
> cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> build_snap_map: entry=second
> debug 2022-01-27T15:25:42.478+ 7fe0ffbf0700 10
> cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> build_snap_map: local snap_map={1384=initial,1385=second}
> debug 2022-01-27T15:25:42.478+ 7fe0ffbf0700 20
> cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> build_snap_map: dir_root=/test, snap_dir=/test/.snap, is_remote=1
> debug 2022-01-27T15:25:42.479+ 7fe0ffbf0700 20
> cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> build_snap_map: entry=.
> debug 2022-01-27T15:25:42.479+ 7fe0ffbf0700 20
> cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> build_snap_map: entry=..
> debug 2022-01-27T15:25:42.480+ 7fe0ffbf0700 20
> cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> build_snap_map: entry=initial
> debug 2022-01-27T15:25:42.480+ 7fe0ffbf0700 20
> cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> build_snap_map: snap_path=/test/.snap/initial,
> metadata={primary_snap_id=1384}
> debug 2022-01-27T15:25:42.480+ 7fe0ffbf0700 10
> cephfs::mirror::

[ceph-users] Re: Limitations of ceph fs snapshot mirror for read-only folders?

2022-01-28 Thread Manuel Holtgrewe
Hi,

thanks for the reply.

Actually, when mounting the source and remote fs on Linux with the kernel
driver (Rocky Linux 8.5 default kernel), I can `rsync`.

Is this to be expected?

Cheers,

On Fri, Jan 28, 2022 at 10:44 AM Venky Shankar  wrote:
>
> Hey Manuel,
>
> On Thu, Jan 27, 2022 at 8:57 PM Manuel Holtgrewe  wrote:
> >
> > OK, reconstructed with another example:
> >
> > -- source file system --
> >
> > 0|0[root@gw-1 ~]# find /data/cephfs-2/test/x2 | xargs stat
> >  File: /data/cephfs-2/test/x2
> >  Size: 1   Blocks: 0  IO Block: 65536  directory
> > Device: 2ch/44d Inode: 1099840816759  Links: 3
> > Access: (2440/dr--r-S---)  Uid: (0/root)   Gid: (0/root)
> > Access: 2022-01-27 16:24:15.627783470 +0100
> > Modify: 2022-01-27 16:24:22.001750514 +0100
> > Change: 2022-01-27 16:24:51.294599055 +0100
> > Birth: -
> >  File: /data/cephfs-2/test/x2/y2
> >  Size: 1   Blocks: 0  IO Block: 65536  directory
> > Device: 2ch/44d Inode: 1099840816760  Links: 2
> > Access: (2440/dr--r-S---)  Uid: (0/root)   Gid: (0/root)
> > Access: 2022-01-27 16:24:22.001750514 +0100
> > Modify: 2022-01-27 16:24:27.712720985 +0100
> > Change: 2022-01-27 16:24:51.307598988 +0100
> > Birth: -
> >  File: /data/cephfs-2/test/x2/y2/z
> >  Size: 0   Blocks: 0  IO Block: 4194304 regular empty 
> > file
> > Device: 2ch/44d Inode: 1099840816761  Links: 1
> > Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
> > Access: 2022-01-27 16:24:27.713720980 +0100
> > Modify: 2022-01-27 16:24:27.713720980 +0100
> > Change: 2022-01-27 16:24:27.713720980 +0100
> > Birth: -
> >
> > -- resulting remote file system --
> >
> > 0|0[root@gw-1 ~]# find /data/cephfs-3/test/x2 | xargs stat
> >  File: /data/cephfs-3/test/x2
> >  Size: 0   Blocks: 0  IO Block: 65536  directory
> > Device: 2dh/45d Inode: 1099521812568  Links: 2
> > Access: (2440/dr--r-S---)  Uid: (0/root)   Gid: (0/root)
> > Access: 2022-01-27 16:24:15.627783470 +0100
> > Modify: 2022-01-27 16:24:22.001750514 +0100
> > Change: 2022-01-27 16:25:53.638392179 +0100
> > Birth: -
>
> The mirror daemon requires write access to a directory to update
> entries (it uses libcephfs with uid/gid 0:0). The mode/ownership
> changes are applied after creating the entry on the other cluster.
>
> There's probably no "quick" workarounds for this, I'm afraid.
>
> >
> > -- log excerpt --
> >
> > debug 2022-01-27T15:25:42.476+ 7fe0ffbf0700 20
> > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> > register_directory: dir_root=/test
> > debug 2022-01-27T15:25:42.476+ 7fe0ffbf0700 20
> > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> > try_lock_directory: dir_root=/test
> > debug 2022-01-27T15:25:42.477+ 7fe0ffbf0700 10
> > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> > try_lock_directory: dir_root=/test locked
> > debug 2022-01-27T15:25:42.477+ 7fe0ffbf0700  5
> > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> > register_directory: dir_root=/test registered with
> > replayer=0x56173a70a680
> > debug 2022-01-27T15:25:42.477+ 7fe0ffbf0700 20
> > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> > sync_snaps: dir_root=/test
> > debug 2022-01-27T15:25:42.477+ 7fe0ffbf0700 20
> > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> > do_sync_snaps: dir_root=/test
> > debug 2022-01-27T15:25:42.477+ 7fe0ffbf0700 20
> > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> > build_snap_map: dir_root=/test, snap_dir=/test/.snap, is_remote=0
> > debug 2022-01-27T15:25:42.477+ 7fe0ffbf0700 20
> > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> > build_snap_map: entry=.
> > debug 2022-01-27T15:25:42.478+ 7fe0ffbf0700 20
> > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> > build_snap_map: entry=..
> > debug 2022-01-27T15:25:42.478+ 7fe0ffbf0700 20
> > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> > build_snap_map: entry=initial
> > debug 2022-01-27T15:25:42.478+ 7fe0ffbf0700 20
> > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> > build_snap_map: entry=second
> > debug 2022-01-27T15:25:42.478+ 7fe0ffbf0700 10
> > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> > build_snap_map: local snap_map={1384=initial,1385=second}
> > debug 2022-01-27T15:25:42.478+ 7fe0ffbf0700 20
> > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> > build_snap_map: dir_root=/test, snap_dir=/test/.snap, is_remote=1
> > debug 2022-01-27T15:25:42.479+ 7fe0ffbf0700 20
> > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> > build_snap_map: entry=.
> > debug 2022-01-27T15:25:42.479+ 7fe0ffbf0700 20
> > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> > build_snap_

[ceph-users] Re: Limitations of ceph fs snapshot mirror for read-only folders?

2022-01-28 Thread Venky Shankar
On Fri, Jan 28, 2022 at 3:20 PM Manuel Holtgrewe  wrote:
>
> Hi,
>
> thanks for the reply.
>
> Actually, mounting the source and remote fs on linux with kernel
> driver (Rocky Linux 8.5 default kernel), I can `rsync`.

You are probably running rsync with --no-perms or a custom --chmod (or
one of --no-o, --no-g) I guess?

>
> Is this to be expected?
>
> Cheers,
>
> On Fri, Jan 28, 2022 at 10:44 AM Venky Shankar  wrote:
> >
> > Hey Manuel,
> >
> > On Thu, Jan 27, 2022 at 8:57 PM Manuel Holtgrewe  
> > wrote:
> > >
> > > OK, reconstructed with another example:
> > >
> > > -- source file system --
> > >
> > > 0|0[root@gw-1 ~]# find /data/cephfs-2/test/x2 | xargs stat
> > >  File: /data/cephfs-2/test/x2
> > >  Size: 1   Blocks: 0  IO Block: 65536  directory
> > > Device: 2ch/44d Inode: 1099840816759  Links: 3
> > > Access: (2440/dr--r-S---)  Uid: (0/root)   Gid: (0/root)
> > > Access: 2022-01-27 16:24:15.627783470 +0100
> > > Modify: 2022-01-27 16:24:22.001750514 +0100
> > > Change: 2022-01-27 16:24:51.294599055 +0100
> > > Birth: -
> > >  File: /data/cephfs-2/test/x2/y2
> > >  Size: 1   Blocks: 0  IO Block: 65536  directory
> > > Device: 2ch/44d Inode: 1099840816760  Links: 2
> > > Access: (2440/dr--r-S---)  Uid: (0/root)   Gid: (0/root)
> > > Access: 2022-01-27 16:24:22.001750514 +0100
> > > Modify: 2022-01-27 16:24:27.712720985 +0100
> > > Change: 2022-01-27 16:24:51.307598988 +0100
> > > Birth: -
> > >  File: /data/cephfs-2/test/x2/y2/z
> > >  Size: 0   Blocks: 0  IO Block: 4194304 regular empty 
> > > file
> > > Device: 2ch/44d Inode: 1099840816761  Links: 1
> > > Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
> > > Access: 2022-01-27 16:24:27.713720980 +0100
> > > Modify: 2022-01-27 16:24:27.713720980 +0100
> > > Change: 2022-01-27 16:24:27.713720980 +0100
> > > Birth: -
> > >
> > > -- resulting remote file system --
> > >
> > > 0|0[root@gw-1 ~]# find /data/cephfs-3/test/x2 | xargs stat
> > >  File: /data/cephfs-3/test/x2
> > >  Size: 0   Blocks: 0  IO Block: 65536  directory
> > > Device: 2dh/45d Inode: 1099521812568  Links: 2
> > > Access: (2440/dr--r-S---)  Uid: (0/root)   Gid: (0/root)
> > > Access: 2022-01-27 16:24:15.627783470 +0100
> > > Modify: 2022-01-27 16:24:22.001750514 +0100
> > > Change: 2022-01-27 16:25:53.638392179 +0100
> > > Birth: -
> >
> > The mirror daemon requires write access to a directory to update
> > entries (it uses libcephfs with uid/gid 0:0). The mode/ownership
> > changes are applied after creating the entry on the other cluster.
> >
> > There's probably no "quick" workarounds for this, I'm afraid.
> >
> > >
> > > -- log excerpt --
> > >
> > > debug 2022-01-27T15:25:42.476+ 7fe0ffbf0700 20
> > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> > > register_directory: dir_root=/test
> > > debug 2022-01-27T15:25:42.476+ 7fe0ffbf0700 20
> > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> > > try_lock_directory: dir_root=/test
> > > debug 2022-01-27T15:25:42.477+ 7fe0ffbf0700 10
> > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> > > try_lock_directory: dir_root=/test locked
> > > debug 2022-01-27T15:25:42.477+ 7fe0ffbf0700  5
> > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> > > register_directory: dir_root=/test registered with
> > > replayer=0x56173a70a680
> > > debug 2022-01-27T15:25:42.477+ 7fe0ffbf0700 20
> > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> > > sync_snaps: dir_root=/test
> > > debug 2022-01-27T15:25:42.477+ 7fe0ffbf0700 20
> > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> > > do_sync_snaps: dir_root=/test
> > > debug 2022-01-27T15:25:42.477+ 7fe0ffbf0700 20
> > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> > > build_snap_map: dir_root=/test, snap_dir=/test/.snap, is_remote=0
> > > debug 2022-01-27T15:25:42.477+ 7fe0ffbf0700 20
> > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> > > build_snap_map: entry=.
> > > debug 2022-01-27T15:25:42.478+ 7fe0ffbf0700 20
> > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> > > build_snap_map: entry=..
> > > debug 2022-01-27T15:25:42.478+ 7fe0ffbf0700 20
> > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> > > build_snap_map: entry=initial
> > > debug 2022-01-27T15:25:42.478+ 7fe0ffbf0700 20
> > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> > > build_snap_map: entry=second
> > > debug 2022-01-27T15:25:42.478+ 7fe0ffbf0700 10
> > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1-a41df7b58955)
> > > build_snap_map: local snap_map={1384=initial,1385=second}
> > > debug 2022-01-27T15:25:42.478+ 7fe0ffbf0700 20
> > > cephfs::mirror::PeerReplayer(f477cfed-6270-4beb-aaa1

[ceph-users] Re: CephFS Snapshot Scheduling stops creating Snapshots after a restart of the Manager

2022-01-28 Thread Venky Shankar
On Fri, Jan 28, 2022 at 3:03 PM Sebastian Mazza  wrote:
>
> Hey Venky,
>
> thank you very much for your response!
>
> > It would help if you could enable debug log for ceph-mgr, repeat the
> > steps you mention above and upload the log in the tracker.
>
>
> I have already collected log files after enabling the debug log by `ceph 
> config set mgr mgr/snap_schedule/log_level debug`, and I would be happy to 
> share it.
>
> > Could you please file a tracker here:
> > https://tracker.ceph.com/projects/cephfs/issues/new
>
> I signed up for an account, but need to wait for being approved by an 
> administrator.

Thanks. If you can share the logs, I can create the tracker in the meantime.

>
>
> Cheers,
> Sebastian
>


-- 
Cheers,
Venky

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Limitations of ceph fs snapshot mirror for read-only folders?

2022-01-28 Thread Manuel Holtgrewe
I'm running rsync with "-Wa"; see below for a reproduction from scratch
that actually syncs as root even when no permissions are set on the
directories.

-- full mount options --

172.16.62.10,172.16.62.11,172.16.62.11,172.16.62.12,172.16.62.13,172.16.62.30:/
on /data/cephfs-2 type ceph
(rw,noatime,name=samba,secret=,acl)
172.16.62.22,172.16.62.23,172.16.62.23,172.16.62.24,172.16.62.25,172.16.62.32:/
on /data/cephfs-3 type ceph
(rw,noatime,name=gateway,secret=,rbytes,acl)

-- example --

0|0[root@gw-1 ~]# mkdir -p /data/cephfs-2/test2/x/y
0|0[root@gw-1 ~]# touch !$z
touch /data/cephfs-2/test2/x/yz
0|0[root@gw-1 ~]# chmod a= -R /data/cephfs-2/test2
0|0[root@gw-1 ~]# mkdir /data/cephfs-3/test2
0|0[root@gw-1 ~]# rsync -va /data/cephfs-2/test2/. /data/cephfs-3/test2/.
sending incremental file list
./
x/
x/yz
x/y/

sent 165 bytes  received 50 bytes  430.00 bytes/sec
total size is 0  speedup is 0.00
0|0[root@gw-1 ~]# find /data/cephfs-3/test2 | xargs stat
  File: /data/cephfs-3/test2
  Size: 0   Blocks: 0  IO Block: 65536  directory
Device: 2dh/45d Inode: 1099522341053  Links: 3
Access: (/d-)  Uid: (0/root)   Gid: (0/root)
Access: 2022-01-28 11:10:31.436380533 +0100
Modify: 2022-01-28 11:09:47.06846 +0100
Change: 2022-01-28 11:10:31.436380533 +0100
 Birth: -
  File: /data/cephfs-3/test2/x
  Size: 0   Blocks: 0  IO Block: 65536  directory
Device: 2dh/45d Inode: 1099522341054  Links: 3
Access: (/d-)  Uid: (0/root)   Gid: (0/root)
Access: 2022-01-28 11:10:31.462380399 +0100
Modify: 2022-01-28 11:09:49.258598614 +0100
Change: 2022-01-28 11:10:31.462380399 +0100
 Birth: -
  File: /data/cephfs-3/test2/x/yz
  Size: 0   Blocks: 0  IO Block: 4194304 regular empty file
Device: 2dh/45d Inode: 1099522341056  Links: 1
Access: (/--)  Uid: (0/root)   Gid: (0/root)
Access: 2022-01-28 11:10:31.447380476 +0100
Modify: 2022-01-28 11:09:49.265598578 +0100
Change: 2022-01-28 11:10:31.447380476 +0100
 Birth: -
  File: /data/cephfs-3/test2/x/y
  Size: 0   Blocks: 0  IO Block: 65536  directory
Device: 2dh/45d Inode: 1099522341055  Links: 2
Access: (/d-)  Uid: (0/root)   Gid: (0/root)
Access: 2022-01-28 11:10:31.439380518 +0100
Modify: 2022-01-28 11:09:47.669606830 +0100
Change: 2022-01-28 11:10:31.439380518 +0100
 Birth: -

On Fri, Jan 28, 2022 at 11:06 AM Venky Shankar  wrote:
>
> On Fri, Jan 28, 2022 at 3:20 PM Manuel Holtgrewe  wrote:
> >
> > Hi,
> >
> > thanks for the reply.
> >
> > Actually, mounting the source and remote fs on linux with kernel
> > driver (Rocky Linux 8.5 default kernel), I can `rsync`.
>
> You are probably running rsync with --no-perms or a custom --chmod (or
> one of --no-o, --no-g) I guess?
>
> >
> > Is this to be expected?
> >
> > Cheers,
> >
> > On Fri, Jan 28, 2022 at 10:44 AM Venky Shankar  wrote:
> > >
> > > Hey Manuel,
> > >
> > > On Thu, Jan 27, 2022 at 8:57 PM Manuel Holtgrewe  
> > > wrote:
> > > >
> > > > OK, reconstructed with another example:
> > > >
> > > > -- source file system --
> > > >
> > > > 0|0[root@gw-1 ~]# find /data/cephfs-2/test/x2 | xargs stat
> > > >  File: /data/cephfs-2/test/x2
> > > >  Size: 1   Blocks: 0  IO Block: 65536  directory
> > > > Device: 2ch/44d Inode: 1099840816759  Links: 3
> > > > Access: (2440/dr--r-S---)  Uid: (0/root)   Gid: (0/root)
> > > > Access: 2022-01-27 16:24:15.627783470 +0100
> > > > Modify: 2022-01-27 16:24:22.001750514 +0100
> > > > Change: 2022-01-27 16:24:51.294599055 +0100
> > > > Birth: -
> > > >  File: /data/cephfs-2/test/x2/y2
> > > >  Size: 1   Blocks: 0  IO Block: 65536  directory
> > > > Device: 2ch/44d Inode: 1099840816760  Links: 2
> > > > Access: (2440/dr--r-S---)  Uid: (0/root)   Gid: (0/root)
> > > > Access: 2022-01-27 16:24:22.001750514 +0100
> > > > Modify: 2022-01-27 16:24:27.712720985 +0100
> > > > Change: 2022-01-27 16:24:51.307598988 +0100
> > > > Birth: -
> > > >  File: /data/cephfs-2/test/x2/y2/z
> > > >  Size: 0   Blocks: 0  IO Block: 4194304 regular 
> > > > empty file
> > > > Device: 2ch/44d Inode: 1099840816761  Links: 1
> > > > Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
> > > > Access: 2022-01-27 16:24:27.713720980 +0100
> > > > Modify: 2022-01-27 16:24:27.713720980 +0100
> > > > Change: 2022-01-27 16:24:27.713720980 +0100
> > > > Birth: -
> > > >
> > > > -- resulting remote file system --
> > > >
> > > > 0|0[root@gw-1 ~]# find /data/cephfs-3/test/x2 | xargs stat
> > > >  File: /data/cephfs-3/test/x2
> > > >  Size: 0   Blocks: 0  IO Block: 65536  directory
> > > > Device: 2dh/45d Inode: 1099521812568  Links: 2
> > > > Access: (2440/dr--r-S---)  Uid: (0/root)   Gid: (0/root)
> > > > Access: 2022-01-27 16:24:15.627783470 +0100
> > > > Modify: 2022-01-27 16:24:22.001750514 +0100
> > > > Chan

[ceph-users] 'cephadm bootstrap' and 'ceph orch' creates daemons with latest / devel container images instead of stable images

2022-01-28 Thread Arun Vinod
Hi All,

We are trying to deploy a Ceph (16.2.7) cluster in production using
cephadm. Unfortunately, we encountered the following situation.

Description

The cephadm (v16.2.7) bootstrap by default chooses the container images
quay.io/ceph/ceph:v16 and docker.io/ceph/daemon-base:latest-pacific-devel.
Since we want to avoid using devel and latest container images in
production, we pulled the required images (with static tags) prior to
running bootstrap. We also passed the image name and the --skip-pull
parameter to the bootstrap command.
Still, cephadm uses the image docker.io/ceph/daemon-base:latest-pacific-devel
for some of the daemons, and it keeps pulling that image even though
--skip-pull was given.

Due to this, daemons on different hosts are running on different versions
of the container images.

Hence, there seems to be no provision to use a specific image instead of
docker.io/ceph/daemon-base:latest-pacific-devel during bootstrap, which would
give consistency across all daemons in the cluster. The same behaviour
exists when creating daemons with ceph orch.
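One thing we are experimenting with (not sure whether this is the intended way) is
pinning the images explicitly in the cluster configuration right after bootstrap,
roughly as below; whether this also stops cephadm from reaching for
daemon-base:latest-pacific-devel is exactly what we would like to confirm:

# pin the main Ceph image used for newly created daemons
ceph config set global container_image quay.io/ceph/ceph:v16.2.7
# pin the monitoring stack images used by the cephadm mgr module
ceph config set mgr mgr/cephadm/container_image_prometheus quay.io/prometheus/prometheus:v2.18.1
ceph config set mgr mgr/cephadm/container_image_node_exporter quay.io/prometheus/node-exporter:v0.18.1
ceph config set mgr mgr/cephadm/container_image_alertmanager quay.io/prometheus/alertmanager:v0.20.0
ceph config set mgr mgr/cephadm/container_image_grafana quay.io/ceph/ceph-grafana:6.7.4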


Command used to bootstrap the cluster (the stable container images were
already pulled beforehand):

sudo cephadm --image quay.io/ceph/ceph:v16.2.7 bootstrap
--skip-monitoring-stack --mon-ip ... --cluster-network ... --ssh-user
ceph_user --config /home/ceph_user/ceph_bootstrap/ceph.conf
--initial-dashboard-password Q5446UBS3KK9 --dashboard-password-noupdate
--no-minimize-config --skip-pull


Below are some entries from cephadm.log, which clearly show that it is trying to
pull the image even though --skip-pull is provided:

2022-01-27 17:11:13,900 7f01b6621b80 INFO Deploying mon service with
default placement...
2022-01-27 17:11:14,212 7f211cc85b80 DEBUG

cephadm ['--image', 'docker.io/ceph/daemon-base:latest-pacific-devel', 'ls']
2022-01-27 17:11:14,296 7f211cc85b80 DEBUG /bin/podman: 3.3.1
2022-01-27 17:11:14,660 7f211cc85b80 DEBUG /bin/podman:
4da6ea847240,24.26MB / 134.9GB
2022-01-27 17:11:14,660 7f211cc85b80 DEBUG /bin/podman:
52b12ff050d8,390.7MB / 134.9GB
2022-01-27 17:11:14,660 7f211cc85b80 DEBUG /bin/podman:
5c979c84d182,4.342MB / 134.9GB
2022-01-27 17:11:14,766 7f211cc85b80 DEBUG systemctl: enabled
2022-01-27 17:11:14,778 7f211cc85b80 DEBUG systemctl: active
2022-01-27 17:11:14,912 7f211cc85b80 DEBUG /bin/podman:
52b12ff050d88841131aa6508f7576a1dca8e0004db08384dd13dca6c2d3b725,
quay.io/ceph/ceph:v16.2.7,cc266d6139f4d044d28ace2308f7befcdfead3c3e88bc3faed905298cae299ef,2022-01-27
17:10:33.135056074 +0530 IST,
2022-01-27 17:11:15,059 7f211cc85b80 DEBUG /bin/podman: [
quay.io/ceph/ceph@sha256:2f7f0af8663e73a422f797de605e769ae44eb0297f2a79324739404cc1765728
quay.io/ceph/ceph@sha256:bb6a71f7f481985f6d3b358e3b9ef64c6755b3db5aa53198e0aac38be5c8ae54
]
2022-01-27 17:11:15,456 7f01b6621b80 DEBUG /usr/bin/ceph: Scheduled mon
update...
2022-01-27 17:11:15,641 7f211cc85b80 DEBUG /bin/podman: ceph version 16.2.7
(dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)

2022-01-27 17:11:15,972 7f01b6621b80 INFO Deploying mgr service with
default placement...
2022-01-27 17:11:16,127 7f211cc85b80 DEBUG systemctl: enabled
2022-01-27 17:11:16,140 7f211cc85b80 DEBUG systemctl: active
2022-01-27 17:11:16,296 7f211cc85b80 DEBUG /bin/podman:
4da6ea847240bab786f596ddc87160e11056c74aa7004dc38ee12be331a5ea4e,
quay.io/ceph/ceph:v16.2.7,cc266d6139f4d044d28ace2308f7befcdfead3c3e88bc3faed905298cae299ef,2022-01-27
17:10:25.830630277 +0530 IST,
2022-01-27 17:11:17,023 7f0b0c505b80 DEBUG

cephadm ['--image', 'docker.io/ceph/daemon-base:latest-pacific-devel',
'ceph-volume', '--fsid', 'e3c9bff6-7f65-11ec-bdff-0015171590ba', '--',
'inventory', '--format=json-pretty', '--filter-for-batch']
2022-01-27 17:11:17,102 7f0b0c505b80 DEBUG /bin/podman: 3.3.1
2022-01-27 17:11:17,275 7f0b0c505b80 DEBUG /bin/podman:
4da6ea847240,24.71MB / 134.9GB
2022-01-27 17:11:17,275 7f0b0c505b80 DEBUG /bin/podman:
52b12ff050d8,390.8MB / 134.9GB
2022-01-27 17:11:17,275 7f0b0c505b80 DEBUG /bin/podman:
d242f1fa7a66,28.28MB / 134.9GB
2022-01-27 17:11:17,417 7f0b0c505b80 INFO Inferring config
/var/lib/ceph/e3c9bff6-7f65-11ec-bdff-0015171590ba/mon.hcictrl01/config
2022-01-27 17:11:17,417 7f0b0c505b80 DEBUG Using specified fsid:
e3c9bff6-7f65-11ec-bdff-0015171590ba
2022-01-27 17:11:17,620 7f01b6621b80 DEBUG /usr/bin/ceph: Scheduled mgr
update...
2022-01-27 17:11:17,727 7f0b0c505b80 DEBUG stat: Trying to pull
docker.io/ceph/daemon-base:latest-pacific-devel...

2022-01-27 17:11:18,489 7f01b6621b80 INFO Deploying crash service with
default placement...
2022-01-27 17:11:18,763 7f3ed21eeb80 DEBUG sestatus: SELinux status:
disabled
2022-01-27 17:11:18,768 7f3ed21eeb80 DEBUG sestatus: SELinux status:
disabled
2022-01-27 17:11:18,774 7f3ed21eeb80 DEBUG sestatus: SELinux status:
disabled
2022-01-27 17:11:18,779 7f3ed21eeb80 DEBUG sestatus: SELinux status:
disabled
2022-01-27 17:11:18,784 7f3ed21ee

[ceph-users] Re: CephFS Snapshot Scheduling stops creating Snapshots after a restart of the Manager

2022-01-28 Thread Sebastian Mazza
Hey Venky,

I would be happy if you create the issue.
Under this link: https://www.filemail.com/d/skgyuyszdlgrkxw
you can download the log file and also my description of the problem. The txt 
also includes the most interesting lines of the log.

Cheers,
Sebastian



> On 28.01.2022, at 11:07, Venky Shankar  wrote:
> 
> On Fri, Jan 28, 2022 at 3:03 PM Sebastian Mazza  wrote:
>> 
>> Hey Venky,
>> 
>> thank you very much for your response!
>> 
>>> It would help if you could enable debug log for ceph-mgr, repeat the
>>> steps you mention above and upload the log in the tracker.
>> 
>> 
>> I have already collected log files after enabling the debug log by `ceph 
>> config set mgr mgr/snap_schedule/log_level debug`, and I would be happy to 
>> share it.
>> 
>>> Could you please file a tracker here:
>>> https://tracker.ceph.com/projects/cephfs/issues/new
>> 
>> I signed up for an account, but need to wait for being approved by an 
>> administrator.
> 
> Thanks. If you can share the logs, I can create the tracker in the meantime.
> 
>> 
>> 
>> Cheers,
>> Sebastian
>> 
> 
> 
> -- 
> Cheers,
> Venky
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Limitations of ceph fs snapshot mirror for read-only folders?

2022-01-28 Thread Venky Shankar
On Fri, Jan 28, 2022 at 3:42 PM Manuel Holtgrewe  wrote:
>
> I'm running rsync "-Wa", see below for a reproduction from scratch
> that actually syncs as root when no permissions are given on the
> directories.
>
> -- full mount options --
>
> 172.16.62.10,172.16.62.11,172.16.62.11,172.16.62.12,172.16.62.13,172.16.62.30:/
> on /data/cephfs-2 type ceph
> (rw,noatime,name=samba,secret=,acl)
> 172.16.62.22,172.16.62.23,172.16.62.23,172.16.62.24,172.16.62.25,172.16.62.32:/
> on /data/cephfs-3 type ceph
> (rw,noatime,name=gateway,secret=,rbytes,acl)
>
> -- example --
>
> 0|0[root@gw-1 ~]# mkdir -p /data/cephfs-2/test2/x/y
> 0|0[root@gw-1 ~]# touch !$z
> touch /data/cephfs-2/test2/x/yz
> 0|0[root@gw-1 ~]# chmod a= -R /data/cephfs-2/test2
> 0|0[root@gw-1 ~]# mkdir /data/cephfs-3/test2
> 0|0[root@gw-1 ~]# rsync -va /data/cephfs-2/test2/. /data/cephfs-3/test2/.
> sending incremental file list
> ./
> x/
> x/yz
> x/y/

Try running this from a ceph-fuse mount - it would fail. It's probably
related to the way permission checks are done (we may want to fix
that in the user-space driver).

Since the mirror daemon uses the user-space library, it would be
running into the same permission related constraints as ceph-fuse.
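A quick way to check, if you want (the monitor address, client name and mount
point below are placeholders - adjust to your setup):

# mount the source file system with the user-space client instead of the kernel client
mkdir -p /mnt/cephfs-2-fuse
ceph-fuse --id samba -m 172.16.62.10:6789 /mnt/cephfs-2-fuse
# repeat the same copy - with the user-space client this should hit a permission error
rsync -va /mnt/cephfs-2-fuse/test2/. /data/cephfs-3/test2/.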

>
> sent 165 bytes  received 50 bytes  430.00 bytes/sec
> total size is 0  speedup is 0.00
> 0|0[root@gw-1 ~]# find /data/cephfs-3/test2 | xargs stat
>   File: /data/cephfs-3/test2
>   Size: 0   Blocks: 0  IO Block: 65536  directory
> Device: 2dh/45d Inode: 1099522341053  Links: 3
> Access: (/d-)  Uid: (0/root)   Gid: (0/root)
> Access: 2022-01-28 11:10:31.436380533 +0100
> Modify: 2022-01-28 11:09:47.06846 +0100
> Change: 2022-01-28 11:10:31.436380533 +0100
>  Birth: -
>   File: /data/cephfs-3/test2/x
>   Size: 0   Blocks: 0  IO Block: 65536  directory
> Device: 2dh/45d Inode: 1099522341054  Links: 3
> Access: (/d-)  Uid: (0/root)   Gid: (0/root)
> Access: 2022-01-28 11:10:31.462380399 +0100
> Modify: 2022-01-28 11:09:49.258598614 +0100
> Change: 2022-01-28 11:10:31.462380399 +0100
>  Birth: -
>   File: /data/cephfs-3/test2/x/yz
>   Size: 0   Blocks: 0  IO Block: 4194304 regular empty 
> file
> Device: 2dh/45d Inode: 1099522341056  Links: 1
> Access: (/--)  Uid: (0/root)   Gid: (0/root)
> Access: 2022-01-28 11:10:31.447380476 +0100
> Modify: 2022-01-28 11:09:49.265598578 +0100
> Change: 2022-01-28 11:10:31.447380476 +0100
>  Birth: -
>   File: /data/cephfs-3/test2/x/y
>   Size: 0   Blocks: 0  IO Block: 65536  directory
> Device: 2dh/45d Inode: 1099522341055  Links: 2
> Access: (/d-)  Uid: (0/root)   Gid: (0/root)
> Access: 2022-01-28 11:10:31.439380518 +0100
> Modify: 2022-01-28 11:09:47.669606830 +0100
> Change: 2022-01-28 11:10:31.439380518 +0100
>  Birth: -
>
> On Fri, Jan 28, 2022 at 11:06 AM Venky Shankar  wrote:
> >
> > On Fri, Jan 28, 2022 at 3:20 PM Manuel Holtgrewe  
> > wrote:
> > >
> > > Hi,
> > >
> > > thanks for the reply.
> > >
> > > Actually, mounting the source and remote fs on linux with kernel
> > > driver (Rocky Linux 8.5 default kernel), I can `rsync`.
> >
> > You are probably running rsync with --no-perms or a custom --chmod (or
> > one of --no-o, --no-g) I guess?
> >
> > >
> > > Is this to be expected?
> > >
> > > Cheers,
> > >
> > > On Fri, Jan 28, 2022 at 10:44 AM Venky Shankar  
> > > wrote:
> > > >
> > > > Hey Manuel,
> > > >
> > > > On Thu, Jan 27, 2022 at 8:57 PM Manuel Holtgrewe  
> > > > wrote:
> > > > >
> > > > > OK, reconstructed with another example:
> > > > >
> > > > > -- source file system --
> > > > >
> > > > > 0|0[root@gw-1 ~]# find /data/cephfs-2/test/x2 | xargs stat
> > > > >  File: /data/cephfs-2/test/x2
> > > > >  Size: 1   Blocks: 0  IO Block: 65536  directory
> > > > > Device: 2ch/44d Inode: 1099840816759  Links: 3
> > > > > Access: (2440/dr--r-S---)  Uid: (0/root)   Gid: (0/
> > > > > root)
> > > > > Access: 2022-01-27 16:24:15.627783470 +0100
> > > > > Modify: 2022-01-27 16:24:22.001750514 +0100
> > > > > Change: 2022-01-27 16:24:51.294599055 +0100
> > > > > Birth: -
> > > > >  File: /data/cephfs-2/test/x2/y2
> > > > >  Size: 1   Blocks: 0  IO Block: 65536  directory
> > > > > Device: 2ch/44d Inode: 1099840816760  Links: 2
> > > > > Access: (2440/dr--r-S---)  Uid: (0/root)   Gid: (0/
> > > > > root)
> > > > > Access: 2022-01-27 16:24:22.001750514 +0100
> > > > > Modify: 2022-01-27 16:24:27.712720985 +0100
> > > > > Change: 2022-01-27 16:24:51.307598988 +0100
> > > > > Birth: -
> > > > >  File: /data/cephfs-2/test/x2/y2/z
> > > > >  Size: 0   Blocks: 0  IO Block: 4194304 regular 
> > > > > empty file
> > > > > Device: 2ch/44d Inode: 1099840816761  Links: 1
> > > > > Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/
> > > > > root)
> > > > > Access: 2022-01-2

[ceph-users] Re: Limitations of ceph fs snapshot mirror for read-only folders?

2022-01-28 Thread Manuel Holtgrewe
OK, so there is a difference in semantics between the kernel and the user
space driver?

Which one would you consider to be desired?

From what I can see, the kernel semantics (apparently: root can do
everything) would allow syncing between file systems no matter what.
With the current user space semantics, users could `chmod a=` folders
in their $HOME and stop the sync from working. Is my interpretation
correct?

Best wishes,
Manuel

On Fri, Jan 28, 2022 at 11:43 AM Venky Shankar  wrote:
>
> On Fri, Jan 28, 2022 at 3:42 PM Manuel Holtgrewe  wrote:
> >
> > I'm running rsync "-Wa", see below for a reproduction from scratch
> > that actually syncs as root when no permissions are given on the
> > directories.
> >
> > -- full mount options --
> >
> > 172.16.62.10,172.16.62.11,172.16.62.11,172.16.62.12,172.16.62.13,172.16.62.30:/
> > on /data/cephfs-2 type ceph
> > (rw,noatime,name=samba,secret=,acl)
> > 172.16.62.22,172.16.62.23,172.16.62.23,172.16.62.24,172.16.62.25,172.16.62.32:/
> > on /data/cephfs-3 type ceph
> > (rw,noatime,name=gateway,secret=,rbytes,acl)
> >
> > -- example --
> >
> > 0|0[root@gw-1 ~]# mkdir -p /data/cephfs-2/test2/x/y
> > 0|0[root@gw-1 ~]# touch !$z
> > touch /data/cephfs-2/test2/x/yz
> > 0|0[root@gw-1 ~]# chmod a= -R /data/cephfs-2/test2
> > 0|0[root@gw-1 ~]# mkdir /data/cephfs-3/test2
> > 0|0[root@gw-1 ~]# rsync -va /data/cephfs-2/test2/. /data/cephfs-3/test2/.
> > sending incremental file list
> > ./
> > x/
> > x/yz
> > x/y/
>
> Try running this from a ceph-fuse mount - it would fail. It's probably
> related to the way how permission checks are done (we may want to fix
> that in the user-space driver).
>
> Since the mirror daemon uses the user-space library, it would be
> running into the same permission related constraints as ceph-fuse.
>
> >
> > sent 165 bytes  received 50 bytes  430.00 bytes/sec
> > total size is 0  speedup is 0.00
> > 0|0[root@gw-1 ~]# find /data/cephfs-3/test2 | xargs stat
> >   File: /data/cephfs-3/test2
> >   Size: 0   Blocks: 0  IO Block: 65536  directory
> > Device: 2dh/45d Inode: 1099522341053  Links: 3
> > Access: (/d-)  Uid: (0/root)   Gid: (0/root)
> > Access: 2022-01-28 11:10:31.436380533 +0100
> > Modify: 2022-01-28 11:09:47.06846 +0100
> > Change: 2022-01-28 11:10:31.436380533 +0100
> >  Birth: -
> >   File: /data/cephfs-3/test2/x
> >   Size: 0   Blocks: 0  IO Block: 65536  directory
> > Device: 2dh/45d Inode: 1099522341054  Links: 3
> > Access: (/d-)  Uid: (0/root)   Gid: (0/root)
> > Access: 2022-01-28 11:10:31.462380399 +0100
> > Modify: 2022-01-28 11:09:49.258598614 +0100
> > Change: 2022-01-28 11:10:31.462380399 +0100
> >  Birth: -
> >   File: /data/cephfs-3/test2/x/yz
> >   Size: 0   Blocks: 0  IO Block: 4194304 regular empty 
> > file
> > Device: 2dh/45d Inode: 1099522341056  Links: 1
> > Access: (/--)  Uid: (0/root)   Gid: (0/root)
> > Access: 2022-01-28 11:10:31.447380476 +0100
> > Modify: 2022-01-28 11:09:49.265598578 +0100
> > Change: 2022-01-28 11:10:31.447380476 +0100
> >  Birth: -
> >   File: /data/cephfs-3/test2/x/y
> >   Size: 0   Blocks: 0  IO Block: 65536  directory
> > Device: 2dh/45d Inode: 1099522341055  Links: 2
> > Access: (/d-)  Uid: (0/root)   Gid: (0/root)
> > Access: 2022-01-28 11:10:31.439380518 +0100
> > Modify: 2022-01-28 11:09:47.669606830 +0100
> > Change: 2022-01-28 11:10:31.439380518 +0100
> >  Birth: -
> >
> > On Fri, Jan 28, 2022 at 11:06 AM Venky Shankar  wrote:
> > >
> > > On Fri, Jan 28, 2022 at 3:20 PM Manuel Holtgrewe  
> > > wrote:
> > > >
> > > > Hi,
> > > >
> > > > thanks for the reply.
> > > >
> > > > Actually, mounting the source and remote fs on linux with kernel
> > > > driver (Rocky Linux 8.5 default kernel), I can `rsync`.
> > >
> > > You are probably running rsync with --no-perms or a custom --chmod (or
> > > one of --no-o, --no-g) I guess?
> > >
> > > >
> > > > Is this to be expected?
> > > >
> > > > Cheers,
> > > >
> > > > On Fri, Jan 28, 2022 at 10:44 AM Venky Shankar  
> > > > wrote:
> > > > >
> > > > > Hey Manuel,
> > > > >
> > > > > On Thu, Jan 27, 2022 at 8:57 PM Manuel Holtgrewe 
> > > > >  wrote:
> > > > > >
> > > > > > OK, reconstructed with another example:
> > > > > >
> > > > > > -- source file system --
> > > > > >
> > > > > > 0|0[root@gw-1 ~]# find /data/cephfs-2/test/x2 | xargs stat
> > > > > >  File: /data/cephfs-2/test/x2
> > > > > >  Size: 1   Blocks: 0  IO Block: 65536  directory
> > > > > > Device: 2ch/44d Inode: 1099840816759  Links: 3
> > > > > > Access: (2440/dr--r-S---)  Uid: (0/root)   Gid: (0/
> > > > > > root)
> > > > > > Access: 2022-01-27 16:24:15.627783470 +0100
> > > > > > Modify: 2022-01-27 16:24:22.001750514 +0100
> > > > > > Change: 2022-01-27 16:24:51.294599055 +0100
> > > > > > Birth: -
> > > > > >  File: /data/cephfs-2/test/x2/y2
> > > > > > 

[ceph-users] Re: Limitations of ceph fs snapshot mirror for read-only folders?

2022-01-28 Thread Venky Shankar
On Fri, Jan 28, 2022 at 4:22 PM Manuel Holtgrewe  wrote:
>
> OK, so there is a different in semantics of the kernel and the user
> space driver?

Right.

>
> Which one would you consider to be desired?

The kernel driver is probably doing the right thing.

>
> From what I can see, the kernel semantics (apparently: root can do
> everything) would allow to sync between file systems no matter what.
> With the current user space semantics, users could `chmod a=` folders
> in their $HOME and stop the sync from working. Is my interpretation
> correct?

Correct.

I haven't root caused the issue with the user space driver yet. This
blocks using the cephfs-mirror daemon with read-only source
directories.

I'll file a tracker for this. Thanks for your help.

>
> Best wishes,
> Manuel
>
> On Fri, Jan 28, 2022 at 11:43 AM Venky Shankar  wrote:
> >
> > On Fri, Jan 28, 2022 at 3:42 PM Manuel Holtgrewe  
> > wrote:
> > >
> > > I'm running rsync "-Wa", see below for a reproduction from scratch
> > > that actually syncs as root when no permissions are given on the
> > > directories.
> > >
> > > -- full mount options --
> > >
> > > 172.16.62.10,172.16.62.11,172.16.62.11,172.16.62.12,172.16.62.13,172.16.62.30:/
> > > on /data/cephfs-2 type ceph
> > > (rw,noatime,name=samba,secret=,acl)
> > > 172.16.62.22,172.16.62.23,172.16.62.23,172.16.62.24,172.16.62.25,172.16.62.32:/
> > > on /data/cephfs-3 type ceph
> > > (rw,noatime,name=gateway,secret=,rbytes,acl)
> > >
> > > -- example --
> > >
> > > 0|0[root@gw-1 ~]# mkdir -p /data/cephfs-2/test2/x/y
> > > 0|0[root@gw-1 ~]# touch !$z
> > > touch /data/cephfs-2/test2/x/yz
> > > 0|0[root@gw-1 ~]# chmod a= -R /data/cephfs-2/test2
> > > 0|0[root@gw-1 ~]# mkdir /data/cephfs-3/test2
> > > 0|0[root@gw-1 ~]# rsync -va /data/cephfs-2/test2/. /data/cephfs-3/test2/.
> > > sending incremental file list
> > > ./
> > > x/
> > > x/yz
> > > x/y/
> >
> > Try running this from a ceph-fuse mount - it would fail. It's probably
> > related to the way how permission checks are done (we may want to fix
> > that in the user-space driver).
> >
> > Since the mirror daemon uses the user-space library, it would be
> > running into the same permission related constraints as ceph-fuse.
> >
> > >
> > > sent 165 bytes  received 50 bytes  430.00 bytes/sec
> > > total size is 0  speedup is 0.00
> > > 0|0[root@gw-1 ~]# find /data/cephfs-3/test2 | xargs stat
> > >   File: /data/cephfs-3/test2
> > >   Size: 0   Blocks: 0  IO Block: 65536  directory
> > > Device: 2dh/45d Inode: 1099522341053  Links: 3
> > > Access: (/d-)  Uid: (0/root)   Gid: (0/root)
> > > Access: 2022-01-28 11:10:31.436380533 +0100
> > > Modify: 2022-01-28 11:09:47.06846 +0100
> > > Change: 2022-01-28 11:10:31.436380533 +0100
> > >  Birth: -
> > >   File: /data/cephfs-3/test2/x
> > >   Size: 0   Blocks: 0  IO Block: 65536  directory
> > > Device: 2dh/45d Inode: 1099522341054  Links: 3
> > > Access: (/d-)  Uid: (0/root)   Gid: (0/root)
> > > Access: 2022-01-28 11:10:31.462380399 +0100
> > > Modify: 2022-01-28 11:09:49.258598614 +0100
> > > Change: 2022-01-28 11:10:31.462380399 +0100
> > >  Birth: -
> > >   File: /data/cephfs-3/test2/x/yz
> > >   Size: 0   Blocks: 0  IO Block: 4194304 regular 
> > > empty file
> > > Device: 2dh/45d Inode: 1099522341056  Links: 1
> > > Access: (/--)  Uid: (0/root)   Gid: (0/root)
> > > Access: 2022-01-28 11:10:31.447380476 +0100
> > > Modify: 2022-01-28 11:09:49.265598578 +0100
> > > Change: 2022-01-28 11:10:31.447380476 +0100
> > >  Birth: -
> > >   File: /data/cephfs-3/test2/x/y
> > >   Size: 0   Blocks: 0  IO Block: 65536  directory
> > > Device: 2dh/45d Inode: 1099522341055  Links: 2
> > > Access: (/d-)  Uid: (0/root)   Gid: (0/root)
> > > Access: 2022-01-28 11:10:31.439380518 +0100
> > > Modify: 2022-01-28 11:09:47.669606830 +0100
> > > Change: 2022-01-28 11:10:31.439380518 +0100
> > >  Birth: -
> > >
> > > On Fri, Jan 28, 2022 at 11:06 AM Venky Shankar  
> > > wrote:
> > > >
> > > > On Fri, Jan 28, 2022 at 3:20 PM Manuel Holtgrewe  
> > > > wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > thanks for the reply.
> > > > >
> > > > > Actually, mounting the source and remote fs on linux with kernel
> > > > > driver (Rocky Linux 8.5 default kernel), I can `rsync`.
> > > >
> > > > You are probably running rsync with --no-perms or a custom --chmod (or
> > > > one of --no-o, --no-g) I guess?
> > > >
> > > > >
> > > > > Is this to be expected?
> > > > >
> > > > > Cheers,
> > > > >
> > > > > On Fri, Jan 28, 2022 at 10:44 AM Venky Shankar  
> > > > > wrote:
> > > > > >
> > > > > > Hey Manuel,
> > > > > >
> > > > > > On Thu, Jan 27, 2022 at 8:57 PM Manuel Holtgrewe 
> > > > > >  wrote:
> > > > > > >
> > > > > > > OK, reconstructed with another example:
> > > > > > >
> > > > > > > -- source file system --
> > > > > > >
> > > > > > > 0|

[ceph-users] Re: Limitations of ceph fs snapshot mirror for read-only folders?

2022-01-28 Thread Manuel Holtgrewe
Great.

No, thank *you* for such excellent software!

On Fri, Jan 28, 2022 at 1:20 PM Venky Shankar  wrote:
>
> On Fri, Jan 28, 2022 at 4:22 PM Manuel Holtgrewe  wrote:
> >
> > OK, so there is a different in semantics of the kernel and the user
> > space driver?
>
> Right.
>
> >
> > Which one would you consider to be desired?
>
> The kernel driver is probably doing the right thing.
>
> >
> > From what I can see, the kernel semantics (apparently: root can do
> > everything) would allow to sync between file systems no matter what.
> > With the current user space semantics, users could `chmod a=` folders
> > in their $HOME and stop the sync from working. Is my interpretation
> > correct?
>
> Correct.
>
> I haven't root caused the issue with the user space driver yet. This
> blocks using the cephfs-mirror daemon with read-only source
> directories.
>
> I'll file a tracker for this. Thanks for your help.
>
> >
> > Best wishes,
> > Manuel
> >
> > On Fri, Jan 28, 2022 at 11:43 AM Venky Shankar  wrote:
> > >
> > > On Fri, Jan 28, 2022 at 3:42 PM Manuel Holtgrewe  
> > > wrote:
> > > >
> > > > I'm running rsync "-Wa", see below for a reproduction from scratch
> > > > that actually syncs as root when no permissions are given on the
> > > > directories.
> > > >
> > > > -- full mount options --
> > > >
> > > > 172.16.62.10,172.16.62.11,172.16.62.11,172.16.62.12,172.16.62.13,172.16.62.30:/
> > > > on /data/cephfs-2 type ceph
> > > > (rw,noatime,name=samba,secret=,acl)
> > > > 172.16.62.22,172.16.62.23,172.16.62.23,172.16.62.24,172.16.62.25,172.16.62.32:/
> > > > on /data/cephfs-3 type ceph
> > > > (rw,noatime,name=gateway,secret=,rbytes,acl)
> > > >
> > > > -- example --
> > > >
> > > > 0|0[root@gw-1 ~]# mkdir -p /data/cephfs-2/test2/x/y
> > > > 0|0[root@gw-1 ~]# touch !$z
> > > > touch /data/cephfs-2/test2/x/yz
> > > > 0|0[root@gw-1 ~]# chmod a= -R /data/cephfs-2/test2
> > > > 0|0[root@gw-1 ~]# mkdir /data/cephfs-3/test2
> > > > 0|0[root@gw-1 ~]# rsync -va /data/cephfs-2/test2/. 
> > > > /data/cephfs-3/test2/.
> > > > sending incremental file list
> > > > ./
> > > > x/
> > > > x/yz
> > > > x/y/
> > >
> > > Try running this from a ceph-fuse mount - it would fail. It's probably
> > > related to the way how permission checks are done (we may want to fix
> > > that in the user-space driver).
> > >
> > > Since the mirror daemon uses the user-space library, it would be
> > > running into the same permission related constraints as ceph-fuse.
> > >
> > > >
> > > > sent 165 bytes  received 50 bytes  430.00 bytes/sec
> > > > total size is 0  speedup is 0.00
> > > > 0|0[root@gw-1 ~]# find /data/cephfs-3/test2 | xargs stat
> > > >   File: /data/cephfs-3/test2
> > > >   Size: 0   Blocks: 0  IO Block: 65536  directory
> > > > Device: 2dh/45d Inode: 1099522341053  Links: 3
> > > > Access: (/d-)  Uid: (0/root)   Gid: (0/root)
> > > > Access: 2022-01-28 11:10:31.436380533 +0100
> > > > Modify: 2022-01-28 11:09:47.06846 +0100
> > > > Change: 2022-01-28 11:10:31.436380533 +0100
> > > >  Birth: -
> > > >   File: /data/cephfs-3/test2/x
> > > >   Size: 0   Blocks: 0  IO Block: 65536  directory
> > > > Device: 2dh/45d Inode: 1099522341054  Links: 3
> > > > Access: (/d-)  Uid: (0/root)   Gid: (0/root)
> > > > Access: 2022-01-28 11:10:31.462380399 +0100
> > > > Modify: 2022-01-28 11:09:49.258598614 +0100
> > > > Change: 2022-01-28 11:10:31.462380399 +0100
> > > >  Birth: -
> > > >   File: /data/cephfs-3/test2/x/yz
> > > >   Size: 0   Blocks: 0  IO Block: 4194304 regular 
> > > > empty file
> > > > Device: 2dh/45d Inode: 1099522341056  Links: 1
> > > > Access: (/--)  Uid: (0/root)   Gid: (0/root)
> > > > Access: 2022-01-28 11:10:31.447380476 +0100
> > > > Modify: 2022-01-28 11:09:49.265598578 +0100
> > > > Change: 2022-01-28 11:10:31.447380476 +0100
> > > >  Birth: -
> > > >   File: /data/cephfs-3/test2/x/y
> > > >   Size: 0   Blocks: 0  IO Block: 65536  directory
> > > > Device: 2dh/45d Inode: 1099522341055  Links: 2
> > > > Access: (/d-)  Uid: (0/root)   Gid: (0/root)
> > > > Access: 2022-01-28 11:10:31.439380518 +0100
> > > > Modify: 2022-01-28 11:09:47.669606830 +0100
> > > > Change: 2022-01-28 11:10:31.439380518 +0100
> > > >  Birth: -
> > > >
> > > > On Fri, Jan 28, 2022 at 11:06 AM Venky Shankar  
> > > > wrote:
> > > > >
> > > > > On Fri, Jan 28, 2022 at 3:20 PM Manuel Holtgrewe 
> > > > >  wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > thanks for the reply.
> > > > > >
> > > > > > Actually, mounting the source and remote fs on linux with kernel
> > > > > > driver (Rocky Linux 8.5 default kernel), I can `rsync`.
> > > > >
> > > > > You are probably running rsync with --no-perms or a custom --chmod (or
> > > > > one of --no-o, --no-g) I guess?
> > > > >
> > > > > >
> > > > > > Is this to be expected?
> > > > > >
> > > > > > Cheers,

[ceph-users] Support for additional bind-mounts to specific container types

2022-01-28 Thread Stephen Smith6
Hey folks - We’ve been using a hack to get bind mounts into our manager 
containers for various reasons. We’ve realized that this quickly breaks down 
because our “hacks” aren’t known to “cephadm” inside the manager container, so 
they are lost when we execute a “ceph orch upgrade”. Is there an official way to 
add a bind mount to a manager container?

Our use case: we’re using zabbix_sender + Zabbix to monitor Ceph; however, we use 
a certificate to encrypt the monitoring traffic, and we need the ability to rotate 
it. If the certificate is mapped in via a bind mount, it can be rotated much more 
easily in the event it is ever compromised.
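For context, the kind of invocation involved looks roughly like this (server name,
host name, item key and paths are placeholders), which is why the certificate and
key files have to be visible inside the mgr container:

# send a value to Zabbix over mutual TLS; the cert/key files must be readable in the container
zabbix_sender -z zabbix.example.com -s ceph-cluster -k ceph.health -o HEALTH_OK \
  --tls-connect cert \
  --tls-ca-file /etc/zabbix/certs/ca.crt \
  --tls-cert-file /etc/zabbix/certs/client.crt \
  --tls-key-file /etc/zabbix/certs/client.key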

This same use case is used for other custom code we have running as a manager 
plugin.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Support for additional bind-mounts to specific container types

2022-01-28 Thread Marc
> 
> Hey folks - We’ve been using a hack to get bind mounts into our manager
> containers for various reasons. We’ve realized that this quickly breaks
> down when our “hacks” don’t exist inside “cephadm” in the manager
> container and we execute a “ceph orch upgrade”. Is there an official way
> to add a bind mount to a manager container?

I am not really an expert on the use of cephadm or containers, but aren't these 
things wrong in your 'hack' thinking?

1. That would imply that you always have to run this as, eeehhh, root?
2. AFAIK it is best practice that your oc (orchestrator) supplies volumes to your container.

> Our use case: We’re using zabbix_sender + Zabbix to monitor Ceph however
> we use a certificate to encrypt monitoring traffic that we need the
> ability to rotate. 

Generate long-term certificates from your own CA.

OT: stop hacking
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Support for additional bind-mounts to specific container types

2022-01-28 Thread Stephen Smith6
Point 1 (why are we running as root?):
All Ceph containers are instantiated as root (privileged - for "reasons"), but the 
daemons inside the container run as user 167 (the "ceph" user).

I don't understand your second point. If you're saying that the "container" is 
what specifies mount points, that's incorrect: it's the "docker run" 
invocation of the container that specifies which mount points are passed to 
the container, and that is controlled by "cephadm" today.
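For what it's worth, you can see exactly which bind mounts cephadm passes today by
looking at the run command it generates for the daemon, e.g. (the fsid and daemon
name below are placeholders for our actual values):

# the podman run command cephadm generated, including every -v bind mount
cat /var/lib/ceph/<fsid>/mgr.<name>/unit.run
# or inspect the running mgr container directly
podman ps
podman inspect --format '{{ json .Mounts }}' <mgr container id>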

The length of validity of a mutual TLS certificate means nothing if a hacker 
compromises the key.

On 1/28/22, 8:35 AM, "Marc"  wrote:

> 
> Hey folks - We’ve been using a hack to get bind mounts into our manager
> containers for various reasons. We’ve realized that this quickly breaks
> down when our “hacks” don’t exist inside “cephadm” in the manager
> container and we execute a “ceph orch upgrade”. Is there an official way
> to add a bind mount to a manager container?

I am not really an expert on the use of cephadm or containers but. Are 
these things not wrong in your 'hack' thinking.

1. that would imply that you always have to run this as eeehhh root?
2. afaik is best practice that your oc supplies volumes to your container.

> Our use case: We’re using zabbix_sender + Zabbix to monitor Ceph however
> we use a certificate to encrypt monitoring traffic that we need the
> ability to rotate. 

Generate long term certificates from your own ca.

OT: stop hacking

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 3 OSDs can not be started after a server reboot - rocksdb Corruption

2022-01-28 Thread Igor Fedotov


On 1/26/2022 1:18 AM, Sebastian Mazza wrote:

Hey Igor,

thank you for your response!


Do you suggest disabling the HDD write cache and/or bluefs_buffered_io for 
production clusters?


Generally, the upstream recommendation is to disable disk write caching; there were 
multiple complaints that it might negatively impact performance in some setups.

As for bluefs_buffered_io - please keep it on; disabling it is known to 
cause a performance drop.
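If you want to check or change the on-disk cache, a rough sketch for a SATA drive
would be the following (the device name is a placeholder; SAS drives need sdparm or
vendor tools instead, and the setting may not survive a power cycle on every drive):

# show the current volatile write cache state of the drive
hdparm -W /dev/sdX
# disable the volatile write cache
hdparm -W 0 /dev/sdX
# and make sure bluefs_buffered_io is left enabled
ceph config get osd bluefs_buffered_io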

Thanks for the explanation. For the enabled disk write cache you only mentioned 
possible performance problems, but can an enabled disk write cache also lead to 
data corruption, or make a problem more likely than with a disabled disk cache?


Definitely it can, particularly if the cache isn't protected from power 
loss or the implementation isn't so good ;)




When rebooting a node - did you perform it with a regular OS command (reboot or 
poweroff) or with a power switch?

I never did a hard reset or used the power switch. I used `init 6` to perform a 
reboot. Each server has redundant power supplies, with one connected to a battery 
backup and the other to the grid. Therefore, I do think that none of the servers 
ever faced a non-clean shutdown or reboot.


So the original reboot which caused the failures was made in the same manner, 
right?

Yes, exactly.
And the OSD logs confirm that:

OSD 4:
2021-12-12T21:33:07.780+0100 7f464a944700 -1 received  signal: Terminated from 
/sbin/init  (PID: 1) UID: 0
2021-12-12T21:33:07.780+0100 7f464a944700 -1 osd.4 2606 *** Got signal 
Terminated ***
2021-12-12T21:33:07.780+0100 7f464a944700 -1 osd.4 2606 *** Immediate shutdown 
(osd_fast_shutdown=true) ***
2021-12-12T21:35:29.918+0100 7ffa5ce42f00  0 set uid:gid to 64045:64045 
(ceph:ceph)
2021-12-12T21:35:29.918+0100 7ffa5ce42f00  0 ceph version 16.2.6 
(1a6b9a05546f335eeeddb460fdc89caadf80ac7a) pacific (stable), process ceph-osd, 
pid 1608
:...
2021-12-12T21:35:32.509+0100 7ffa5ce42f00 -1 rocksdb: Corruption: Bad table 
magic number: expected 9863518390377041911, found 0 in db/002145.sst
2021-12-12T21:35:32.509+0100 7ffa5ce42f00 -1 
bluestore(/var/lib/ceph/osd/ceph-4) _open_db erroring opening db:


OSD 7:
2021-12-12T21:20:11.141+0100 7f9714894700 -1 received  signal: Terminated from 
/sbin/init  (PID: 1) UID: 0
2021-12-12T21:20:11.141+0100 7f9714894700 -1 osd.7 2591 *** Got signal 
Terminated ***
2021-12-12T21:20:11.141+0100 7f9714894700 -1 osd.7 2591 *** Immediate shutdown 
(osd_fast_shutdown=true) ***
2021-12-12T21:21:41.881+0100 7f63c6557f00  0 set uid:gid to 64045:64045 
(ceph:ceph)
2021-12-12T21:21:41.881+0100 7f63c6557f00  0 ceph version 16.2.6 
(1a6b9a05546f335eeeddb460fdc89caadf80ac7a) pacific (stable), process ceph-osd, 
pid 1937
:...
2021-12-12T21:21:44.557+0100 7f63c6557f00 -1 rocksdb: Corruption: Bad table 
magic number: expected 9863518390377041911, found 0 in db/002182.sst
2021-12-12T21:21:44.557+0100 7f63c6557f00 -1 
bluestore(/var/lib/ceph/osd/ceph-7) _open_db erroring opening db:


OSD 8:
2021-12-12T21:20:11.141+0100 7fd1ccf01700 -1 received  signal: Terminated from 
/sbin/init  (PID: 1) UID: 0
2021-12-12T21:20:11.141+0100 7fd1ccf01700 -1 osd.8 2591 *** Got signal 
Terminated ***
2021-12-12T21:20:11.141+0100 7fd1ccf01700 -1 osd.8 2591 *** Immediate shutdown 
(osd_fast_shutdown=true) ***
2021-12-12T21:21:41.881+0100 7f6d18d2bf00  0 set uid:gid to 64045:64045 
(ceph:ceph)
2021-12-12T21:21:41.881+0100 7f6d18d2bf00  0 ceph version 16.2.6 
(1a6b9a05546f335eeeddb460fdc89caadf80ac7a) pacific (stable), process ceph-osd, 
pid 1938
:...
2021-12-12T21:21:44.577+0100 7f6d18d2bf00 -1 rocksdb: Corruption: Bad table 
magic number: expected 9863518390377041911, found 0 in db/002182.sst
2021-12-12T21:21:44.577+0100 7f6d18d2bf00 -1 
bluestore(/var/lib/ceph/osd/ceph-8) _open_db erroring opening db:



Best regards,
Sebastian



--
Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Removed daemons listed as stray

2022-01-28 Thread Adam King
Hello Vlad,

Just some insight into how CEPHADM_STRAY_DAEMON works: This health warning
is specifically designed to point out daemons in the cluster that cephadm
is not aware of/in control of. It does this by comparing the daemons it has
cached info on (this cached info is what you see in "ceph orch ps") with
the return value of a core mgr function designed to list the servers in the
cluster and what daemons are on them. This function, from cephadm's point
of view, is a bit of a black box (by design, as it is meant  to find
daemons cephadm is not aware of/in control of). If you'd like to see a
rough estimate of what that looks like I'd check the output of "ceph node
ls" (you may see your non-existent osds listed there). This means, a daemon
that does not exist that cephadm is falsely reporting as a stray daemon
cannot typically be resolved through "ceph orch . . ." commands. In the
past I've found sometimes just doing a mgr failover ("ceph mgr fail") will
clear this in the case of false reports so that's what I'd try first. If
that doesn't help, I'd maybe try checking if the osd is still listed in the crush
map and if so, remove it (first step in
https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/#removing-the-osd
I think). It's possible that the daemon rm commands hung because one of the
cleanup operations cephadm was trying to run under the hood when removing the
osd got stuck, and so the osd is still believed to be present by the
cluster.
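In other words, something along these lines (the OSD id is a placeholder):

# see the host/daemon view that the health check compares against
ceph node ls
# restart the active mgr; this is often enough to clear a stale report
ceph mgr fail
# if the removed OSD is still present in the CRUSH map, take it out
ceph osd crush remove osd.<id>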

- Adam

On Fri, Jan 28, 2022 at 11:28 AM Vladimir Brik <
vladimir.b...@icecube.wisc.edu> wrote:

> Hello
>
> I needed to permanently remove two drives from my pool so I
> ran "ceph orch daemon rm XXX". The command hung for both
> OSDs, but the daemons were removed. I then purged the two OSDs.
>
> Now ceph status is complaining about them with
> CEPHADM_STRAY_DAEMON, but the daemons aren't running and are
> not showing up in ceph orch ps. If I try to "daemon rm"
> again I get Error EINVAL: Unable to find daemon(s).
>
> Anybody have an idea about what could have happened or how
> to stop ceph status from listing the non-existing daemons as
> stray?
>
>
> Thanks,
>
> Vlad
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm trouble

2022-01-28 Thread Adam King
Hmm, I'm not seeing anything that could be a cause in any of that output. I
did notice, however, from your "ceph orch ls" output that none of your
services have been refreshed since the 24th. Cephadm typically tries to
refresh these things every 10 minutes, so that signals something is quite
wrong. Could you try running "ceph mgr fail", and if nothing seems to be
resolved, could you post "ceph log last 200 debug cephadm"? Maybe we can see
if something gets stuck again after the mgr restarts.
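Something like this, in that order (the "200" is just an arbitrary number of
log lines to pull back):

    ceph mgr fail
    # give the new mgr a minute or so to start its serve loop, then:
    ceph log last 200 debug cephadm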

Thanks,

 - Adam King

On Thu, Jan 27, 2022 at 7:06 PM Fyodor Ustinov  wrote:

> Hi!
>
> I think this happened after I tried to recreate the osd with the command
> "ceph orch daemon add osd s-8-2-1:/dev/bcache0"
>
>
> > It looks like cephadm believes "s-8-2-1:/dev/bcache0" is a container
> image
> > for some daemon. Can you provide the output of "ceph orch ls --format
> > yaml",
>
> https://pastebin.com/CStBf4J0
>
> > "ceph orch upgrade status",
> root@s-26-9-19-mon-m1:~# ceph orch upgrade status
> {
>     "target_image": null,
>     "in_progress": false,
>     "services_complete": [],
>     "progress": null,
>     "message": ""
> }
>
>
> > "ceph config get mgr container_image",
> root@s-26-9-19-mon-m1:~# ceph config get mgr container_image
>
> quay.io/ceph/ceph@sha256:2f7f0af8663e73a422f797de605e769ae44eb0297f2a79324739404cc1765728
>
>
> > and the values for monitoring stack container images (format is "ceph
> > config get mgr mgr/cephadm/container_image_<daemon type>" where daemon
> > type is one of "prometheus", "node_exporter", "alertmanager", "grafana",
> > "haproxy", "keepalived").
> quay.io/prometheus/prometheus:v2.18.1
> quay.io/prometheus/node-exporter:v0.18.1
> quay.io/prometheus/alertmanager:v0.20.0
> quay.io/ceph/ceph-grafana:6.7.4
> docker.io/library/haproxy:2.3
> docker.io/arcts/keepalived
>
> >
> > Thanks,
> >
> > - Adam King
>
> Thanks a lot!
>
> WBR,
> Fyodor.
>
> >
> > On Thu, Jan 27, 2022 at 9:10 AM Fyodor Ustinov  wrote:
> >
> >> Hi!
> >>
> >> I rebooted the nodes with mgr and now I see the following in the
> >> cephadm.log:
> >>
> >> As I understand it - cephadm is trying to execute some unsuccessful
> >> command of mine (I wonder which one), it does not succeed, but it keeps
> >> trying and trying. How do I stop it from trying?
> >>
> >> 2022-01-27 16:02:58,123 7fca7beca740 DEBUG
> >>
> 
> >> cephadm ['--image', 's-8-2-1:/dev/bcache0', 'pull']
> >> 2022-01-27 16:02:58,147 7fca7beca740 DEBUG /usr/bin/podman: 3.3.1
> >> 2022-01-27 16:02:58,249 7fca7beca740 INFO Pulling container image
> >> s-8-2-1:/dev/bcache0...
> >> 2022-01-27 16:02:58,278 7fca7beca740 DEBUG /usr/bin/podman: Error:
> invalid
> >> reference format
> >> 2022-01-27 16:02:58,279 7fca7beca740 INFO Non-zero exit code 125 from
> >> /usr/bin/podman pull s-8-2-1:/dev/bcache0
> >> 2022-01-27 16:02:58,279 7fca7beca740 INFO /usr/bin/podman: stderr Error:
> >> invalid reference format
> >> 2022-01-27 16:02:58,279 7fca7beca740 ERROR ERROR: Failed command:
> >> /usr/bin/podman pull s-8-2-1:/dev/bcache0
> >> 2022-01-27 16:03:58,420 7f897a7a6740 DEBUG
> >>
> 
> >> cephadm ['--image', 's-8-2-1:/dev/bcache0', 'pull']
> >> 2022-01-27 16:03:58,443 7f897a7a6740 DEBUG /usr/bin/podman: 3.3.1
> >> 2022-01-27 16:03:58,547 7f897a7a6740 INFO Pulling container image
> >> s-8-2-1:/dev/bcache0...
> >> 2022-01-27 16:03:58,575 7f897a7a6740 DEBUG /usr/bin/podman: Error:
> invalid
> >> reference format
> >> 2022-01-27 16:03:58,577 7f897a7a6740 INFO Non-zero exit code 125 from
> >> /usr/bin/podman pull s-8-2-1:/dev/bcache0
> >> 2022-01-27 16:03:58,577 7f897a7a6740 INFO /usr/bin/podman: stderr Error:
> >> invalid reference format
> >> 2022-01-27 16:03:58,577 7f897a7a6740 ERROR ERROR: Failed command:
> >> /usr/bin/podman pull s-8-2-1:/dev/bcache0
> >>
> >> WBR,
> >> Fyodor.
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Reinstalling OSD node managed by cephadm

2022-01-28 Thread Manuel Holtgrewe
Dear all,

Recently, there were some very specific questions regarding reinstalling
an OSD node while keeping the disks intact. That discussion revolved around
corner cases. I think that I have a very easy case:

- vanilla cluster setup with ansible playbooks
- adopted by cephadm
- latest pacific 16.2.7

What is the overall process of reinstalling (e.g., for going from
Enterprise Linux 7 to 8) and getting my OSDs back afterwards?

- reinstall operating system on system disk
- install cephadm binary
- ... now what? ;-)
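
My naive guess, assuming the OSDs are plain ceph-volume/LVM OSDs and that I
can push the cluster's SSH key back onto the reinstalled host (hostname below
is a placeholder, and I have not tested this):

    # on an existing admin node: re-distribute the cluster SSH key
    ceph cephadm get-pub-key > ceph.pub
    ssh-copy-id -f -i ceph.pub root@<osd-node>

    # make cephadm manage the host again
    ceph orch host add <osd-node>

    # let cephadm detect and activate the existing LVM OSDs on that host
    ceph cephadm osd activate <osd-node>

Is that roughly it, or am I missing steps?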

Best wishes,
Manuel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm trouble

2022-01-28 Thread Fyodor Ustinov
Hi!

> Hmm, I'm not seeing anything that could be a cause in any of that output. I
> did notice, however, from your "ceph orch ls" output that none of your
> services have been refreshed since the 24th. Cephadm typically tries to
> refresh these things every 10 minutes so that signals something is quite
> wrong. 
From what I see in /var/log/ceph/cephadm.log it tries to run the same command
once a minute and does nothing else. That's why the status has not been
updated for 5 days.

> Could you try running "ceph mgr fail" and if nothing seems to be
> resolved could you post "ceph log last 200 debug cephadm". Maybe we can see
> if something gets stuck again after the mgr restarts.
"ceph mgr fail" did not help.
"ceph log last 200 debug cephadm" show again and again and again:

2022-01-28T20:57:12.792090+ mgr.s-26-9-24-mon-m2.nhltmq (mgr.129738166) 349 
: cephadm [ERR] cephadm exited with an error code: 1, stderr:Pulling container 
image s-8-2-1:/dev/bcache0...
Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
/usr/bin/podman: stderr Error: invalid reference format
ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1363, in _remote_connection
yield (conn, connr)
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1256, in _run_cephadm
code, '\n'.join(err)))
orchestrator._interface.OrchestratorError: cephadm exited with an error code: 
1, stderr:Pulling container image s-8-2-1:/dev/bcache0...
Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
/usr/bin/podman: stderr Error: invalid reference format
ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
2022-01-28T20:58:13.092996+ mgr.s-26-9-24-mon-m2.nhltmq (mgr.129738166) 392 
: cephadm [ERR] cephadm exited with an error code: 1, stderr:Pulling container 
image s-8-2-1:/dev/bcache0...
Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
/usr/bin/podman: stderr Error: invalid reference format
ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1363, in _remote_connection
yield (conn, connr)
  File "/usr/share/ceph/mgr/cephadm/serve.py", line 1256, in _run_cephadm
code, '\n'.join(err)))
orchestrator._interface.OrchestratorError: cephadm exited with an error code: 
1, stderr:Pulling container image s-8-2-1:/dev/bcache0...
Non-zero exit code 125 from /usr/bin/podman pull s-8-2-1:/dev/bcache0
/usr/bin/podman: stderr Error: invalid reference format
ERROR: Failed command: /usr/bin/podman pull s-8-2-1:/dev/bcache0
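
I also want to find out where that bogus "image" string got persisted, but
I am not sure where to look. My plan (just a guess) is something like:

    # see whether the string shows up anywhere in the stored mgr/cephadm state
    ceph config-key dump | grep -i bcache0

    # and double-check the image-related settings again
    ceph config get mgr container_image
    ceph orch upgrade status

Is that the right place to look?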


[ceph-users] Re: Removed daemons listed as stray

2022-01-28 Thread Ricardo Alonso
I had a situation like this, and the only operation that solved it was a
full reboot of the cluster (it was due to a watchdog alarm), but when the
cluster returned, the stray osds were gone.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io