[ceph-users] Stray count increasing due to snapshots (?)
I have a production CephFS (13.2.6 Mimic) with >400K strays. I believe this is caused by snapshots. The backup process for this filesystem consists of creating a snapshot and rsyncing it over daily, and snapshots are kept locally in the FS for 2 months for backup and disaster recovery reasons.

As I understand it, any files deleted which still remain referenced from a snapshot end up being moved to the stray directories, right?

I've seen stories of problems once the stray count hits 1M (100k per stray subdirectory), so I'm worried about this possibly happening in the future as the data volume grows. AIUI dirfrags are enabled by default now, so I expect the stray directories to be fragmented too, but from what little documentation I can find, this does not seem to be the case:

rados -p cephfs_metadata listomapkeys 600.00000000 | wc -l
43014

The fragment is the '00000000' in the object name, right? If so, each stray subdir seems to be holding about 10% of the total strays in its first fragment, with no additional fragments. As I understand it, fragments should start to be created when a directory grows to over 10000 entries.

(Aside: is there any good documentation about the on-RADOS data structures used by CephFS? I would like to get more familiar with everything to have a better chance of fixing problems should I run into some data corruption in the future.)

--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
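A quick way to check all ten stray subdirectories at once is to loop over their first dirfrag objects. This is a sketch that assumes the default layout, where the stray directory inodes 0x600 through 0x609 keep their entries in objects named 600.00000000 ... 609.00000000 in the metadata pool (here called cephfs_metadata, as in the message above):

    # Count the dentries in each stray subdirectory's first fragment:
    for i in $(seq 0 9); do
        echo -n "60${i}.00000000: "
        rados -p cephfs_metadata listomapkeys "60${i}.00000000" | wc -l
    done

Summing the ten counts should roughly match the MDS's reported stray total; a heavily skewed distribution would suggest one stray subdir filling up faster than the others.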
Re: [ceph-users] Stray count increasing due to snapshots (?)
On Thu, Sep 5, 2019 at 4:31 PM Hector Martin wrote:
>
> I have a production CephFS (13.2.6 Mimic) with >400K strays. I believe
> this is caused by snapshots. The backup process for this filesystem
> consists of creating a snapshot and rsyncing it over daily, and
> snapshots are kept locally in the FS for 2 months for backup and
> disaster recovery reasons.
>
> As I understand it, any files deleted which still remain referenced from
> a snapshot end up being moved to the stray directories, right?

Yes.

> I've seen stories of problems once the stray count hits 1M (100k per
> stray subdirectory), so I'm worried about this possibly happening in the
> future as the data volume grows. AIUI dirfrags are enabled by default
> now, so I expect the stray directories to be fragmented too, but from
> what little documentation I can find, this does not seem to be the case:
>
> rados -p cephfs_metadata listomapkeys 600.00000000 | wc -l
> 43014
>
> The fragment is the '00000000' in the object name, right? If so, each
> stray subdir seems to be holding about 10% of the total strays in its
> first fragment, with no additional fragments. As I understand it,
> fragments should start to be created when a directory grows to over
> 10000 entries.

Stray subdirs never get fragmented in the current implementation.

> (Aside: is there any good documentation about the on-RADOS data
> structures used by CephFS? I would like to get more familiar with
> everything to have a better chance of fixing problems should I run into
> some data corruption in the future.)
>
> --
> Hector Martin (hec...@marcansoft.com)
> Public Key: https://mrcn.st/pub
Re: [ceph-users] Stray count increasing due to snapshots (?)
On 05/09/2019 18.39, Yan, Zheng wrote:
> Stray subdirs never get fragmented in the current implementation.

Then this is a problem, right? Once the stray subdirs hit 100K files, things will start failing. Is there a solution for this, or do we need to figure out some other backup mechanism that doesn't involve keeping two months' worth of snapshots?

That CephFS can't support this kind of use case (and, more generally, that CephFS keeps files referenced by snapshots in the stray subdirs indefinitely while the stray dirs themselves don't scale) sounds like a bug.

--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
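One way to keep an eye on how close a filesystem is getting to that limit is the MDS's own stray counters, exposed through its admin socket. A minimal check could look like the following; "mds.a" is a placeholder for the local MDS name, and the exact counter names can vary somewhat between releases:

    # Run on the MDS host; look for num_strays / num_strays_delayed:
    ceph daemon mds.a perf dump | grep num_strays

Graphing num_strays over time (e.g. per backup cycle) shows whether expired snapshots are actually letting strays get purged or whether the count only ever grows.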
[ceph-users] RBD as ZFS backup destination
Hello Ceph-users,

I am currently testing / experimenting with Ceph with some extra hardware that is lying around. I am running Nautilus on Ubuntu 18.04 (all nodes). The problem statement is that I'd like to back up a FreeNAS server using ZFS snapshots and replication to a Ceph cluster.

I created a Linux VM inside FreeNAS, and inside the VM I mounted the RBD device of the Ceph cluster. I created a test block device, mounted it on the VM and created a ZFS pool on it. I did a test replication to the device inside the VM and it worked great.

My next what-if scenario is “what if I need to expand the zpool?”. I unmounted the zpool, unmounted the RBD device, and then resized it using “rbd resize --size rbd1”. Then I remounted the RBD device. No problems here. However, when I mount the zpool and expand it (zpool import /dev/rbd0; zpool online -e vol) I get a panic.

Any thoughts as to what could be wrong?

Thanks!
George
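For comparison, a resize sequence that keeps the image mapped and lets ZFS grow the vdev in place might look like the sketch below. The pool name "vol", image name "rbd1" and device path /dev/rbd0 are taken from the message above; the target size is only an example, and the details can differ depending on how the image is mapped inside the VM:

    # Grow the RBD image while it stays mapped (size value is just an example):
    rbd resize --size 200G rbd1

    # Check that the block device inside the VM has picked up the new size:
    blockdev --getsize64 /dev/rbd0

    # Let ZFS expand the existing vdev onto the new space:
    zpool set autoexpand=on vol
    zpool online -e vol /dev/rbd0

    # Verify the pool capacity:
    zpool list vol

If the panic only happens after an export/unmap/resize/remap cycle, comparing against this online path may help narrow down whether the problem is on the RBD side or in the ZFS expansion step.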
Re: [ceph-users] ceph-fuse segfaults in 14.2.2
On Wed, Sep 4, 2019 at 9:42 PM Andras Pataki wrote:
>
> Dear ceph users,
>
> After upgrading our ceph-fuse clients to 14.2.2, we've been seeing sporadic
> segfaults with not super revealing stack traces:
>
> in thread 7fff5a7fc700 thread_name:ceph-fuse
>
> ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)
> 1: (()+0xf5d0) [0x760b85d0]
> 2: (()+0x255a0c) [0x557a9a0c]
> 3: (()+0x16b6b) [0x77bb3b6b]
> 4: (()+0x13401) [0x77bb0401]
> 5: (()+0x7dd5) [0x760b0dd5]
> 6: (clone()+0x6d) [0x74b5cead]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> interpret this.

If you install the appropriate debuginfo package (this would depend on your OS) you may get a more enlightening stack.

> Prior to 14.2.2, we've run 12.2.11 and 13.2.5 and have not seen this issue.
> Has anyone encountered this? If it isn't known - I can file a bug tracker
> for it.

Please do, and maybe try to capture a core dump if you can't get a better backtrace?

> Andras

--
Cheers,
Brad
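As a sketch of what capturing a better backtrace could look like on an RPM-based system (the package name, binary path and core file location are assumptions; adjust for your distribution):

    # Install debug symbols for ceph-fuse (CentOS/RHEL, via yum-utils):
    debuginfo-install -y ceph-fuse

    # Allow core dumps in the shell that launches ceph-fuse, then reproduce the crash:
    ulimit -c unlimited

    # Resolve the stack from the resulting core file with gdb:
    gdb /usr/bin/ceph-fuse /path/to/core -ex 'thread apply all bt' -ex 'quit'

With the debuginfo packages in place, even the crash handler's own backtrace in the client log should show resolved symbols instead of bare offsets, which makes the tracker report much more useful.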