[ceph-users] Stray count increasing due to snapshots (?)

2019-09-05 Thread Hector Martin
I have a production CephFS (13.2.6 Mimic) with >400K strays. I believe 
this is caused by snapshots. The backup process for this filesystem 
consists of creating a snapshot and rsyncing it over daily, and 
snapshots are kept locally in the FS for 2 months for backup and 
disaster recovery reasons.
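
For reference, the stray count itself can be read from the MDS perf counters. 
Assuming they still live under the mds_cache section in Mimic, something like 
this on the active MDS host shows them (the mds.$(hostname -s) daemon name is 
just a guess at your naming scheme):

ceph daemon mds.$(hostname -s) perf dump | grep -E 'num_strays|strays_'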


As I understand it, any files deleted which still remain referenced from 
a snapshot end up being moved to the stray directories, right?


I've seen stories of problems once the stray count hits 1M (100k per 
stray subdirectory), so I'm worried about this possibly happening in the 
future as the data volume grows. AIUI dirfrags are enabled by default 
now, so I expect the stray directories to be fragmented too, but from 
what little documentation I can find, this does not seem to be the case.


rados -p cephfs_metadata listomapkeys 600.00000000 | wc -l
43014
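
The same check for the other stray dirs (inodes 0x600 through 0x609, each with 
a first fragment object named <ino>.00000000, if I have the layout right) can 
be done with a loop along these lines:

for i in 0 1 2 3 4 5 6 7 8 9; do
    echo -n "60$i: "
    # dentry count in the first fragment of stray dir 0x60$i
    rados -p cephfs_metadata listomapkeys 60$i.00000000 | wc -l
done
# any extra fragments would show up as additional 60?.* objects:
rados -p cephfs_metadata ls | grep -E '^60[0-9]\.' | sort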

The fragment is the '00000000' in the object name, right? If so, each 
stray subdir seems to be holding about 10% of the total strays in its 
first fragment, with no additional fragments. As I understand it, 
fragments should start to be created when the directory grows to over 
10000 entries.
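
The relevant thresholds can at least be queried from the MDS admin socket 
(option names as of Mimic, I believe; the daemon name is a placeholder again):

ceph daemon mds.$(hostname -s) config get mds_bal_split_size
ceph daemon mds.$(hostname -s) config get mds_bal_fragment_size_max

The former should be the ~10000-entry split point, the latter presumably the 
~100k per-fragment cap mentioned above.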


(aside: is there any good documentation about the on-RADOS data 
structures used by CephFS? I would like to get more familiar with 
everything to have a better chance of fixing problems should I run into 
some data corruption in the future)


--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Stray count increasing due to snapshots (?)

2019-09-05 Thread Yan, Zheng
On Thu, Sep 5, 2019 at 4:31 PM Hector Martin  wrote:
>
> I have a production CephFS (13.2.6 Mimic) with >400K strays. I believe
> this is caused by snapshots. The backup process for this filesystem
> consists of creating a snapshot and rsyncing it over daily, and
> snapshots are kept locally in the FS for 2 months for backup and
> disaster recovery reasons.
>
> As I understand it, any files deleted which still remain referenced from
> a snapshot end up being moved to the stray directories, right?
>
yes


> I've seen stories of problems once the stray count hits 1M (100k per
> stray subdirectory), so I'm worried about this possibly happening in the
> future as the data volume grows. AIUI dirfrags are enabled by default
> now, so I expect the stray directories to be fragmented too, but from
> what little documentation I can find, this does not seem to be the case.
>
> rados -p cephfs_metadata listomapkeys 600.00000000 | wc -l
> 43014
>
> The fragment is the '00000000' in the object name, right? If so, each
> stray subdir seems to be holding about 10% of the total strays in its
> first fragment, with no additional fragments. As I understand it,
> fragments should start to be created when the directory grows to over
> 10000 entries.
>

Stray subdirs never get fragmented in the current implementation.

> (aside: is there any good documentation about the on-RADOS data
> structures used by CephFS? I would like to get more familiar with
> everything to have a better chance of fixing problems should I run into
> some data corruption in the future)
>
> --
> Hector Martin (hec...@marcansoft.com)
> Public Key: https://mrcn.st/pub
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Stray count increasing due to snapshots (?)

2019-09-05 Thread Hector Martin

On 05/09/2019 18.39, Yan, Zheng wrote:

Stray subdirs never get fragmented in the current implementation.


Then this is a problem, right? Once the stray subdirs hit 100K files, things 
will start failing. Is there a solution for this, or do we need to 
figure out some other backup mechanism that doesn't involve keeping two 
months' worth of snapshots? That CephFS can't support this kind of use 
case (and, more generally, that it keeps files referenced only by snapshots 
in the stray dirs indefinitely, while the stray dirs themselves don't 
scale) sounds like a bug.


--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RBD as zfs backup destination

2019-09-05 Thread Kyriazis, George
Hello Ceph-users,

I am currently testing / experimenting with Ceph with some extra hardware that 
is lying around.  I am running Nautilus on Ubuntu 18.04 (all nodes).

The problem statement is that I’d like to back up a FreeNAS server using ZFS 
snapshots and replication to a Ceph cluster.

I created a Linux VM inside FreeNAS and inside the VM I mounted the RBD device 
of the Ceph cluster.  I created a test block device, mounted it on the VM and 
created a ZFS pool on it.  I did a test replication to the device inside the 
VM and it worked great.
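
For context, the setup was roughly along these lines (the Ceph pool name and 
the sizes are illustrative, not the exact ones I used):

rbd create --size 1T rbd/rbd1     # test image on the Ceph cluster
rbd map rbd/rbd1                  # inside the Linux VM; shows up as /dev/rbd0
zpool create vol /dev/rbd0        # ZFS pool on top of the mapped device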

My next what-if scenario is “what if I need to expand the zpool?”.  I unmounted 
the zpool, unmounted the RBD device, and then resized it using “rbd resize 
--size  rbd1”.  Then I remounted the RBD device.  No problems here.  
However, when I mount the zpool and expand it (zpool import /dev/rbd0; zpool 
online -e vol) I get a panic.
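
Spelled out, the resize sequence was roughly the following (sizes are 
placeholders, and the exact commands may differ slightly from what I typed; 
as I understand it zpool online -e wants both the pool and the device):

zpool export vol               # stop using the pool
rbd unmap /dev/rbd0
rbd resize --size 2T rbd1      # grow the image
rbd map rbd/rbd1               # comes back as /dev/rbd0
zpool import vol
zpool online -e vol /dev/rbd0  # expand onto the grown device

The panic hits somewhere during that last import/expand step.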

Any thoughts as to what could be wrong?

Thanks!

George

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-fuse segfaults in 14.2.2

2019-09-05 Thread Brad Hubbard
On Wed, Sep 4, 2019 at 9:42 PM Andras Pataki
 wrote:
>
> Dear ceph users,
>
> After upgrading our ceph-fuse clients to 14.2.2, we've been seeing sporadic 
> segfaults with not super revealing stack traces:
>
> in thread 7fff5a7fc700 thread_name:ceph-fuse
>
>  ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus 
> (stable)
>  1: (()+0xf5d0) [0x760b85d0]
>  2: (()+0x255a0c) [0x557a9a0c]
>  3: (()+0x16b6b) [0x77bb3b6b]
>  4: (()+0x13401) [0x77bb0401]
>  5: (()+0x7dd5) [0x760b0dd5]
>  6: (clone()+0x6d) [0x74b5cead]
>  NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
> interpret this.

If you install the appropriate debuginfo package (this would depend on
your OS) you may get a more enlightening stack trace.
>
>
> Prior to 14.2.2, we've run 12.2.11 and 13.2.5 and have not seen this issue.  
> Has anyone encountered this?  If it isn't known - I can file a bug tracker 
> for it.

Please do and maybe try to capture a core dump if you can't get a
better backtrace?
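
For example, on a CentOS/RHEL style client something like the following should 
do it (off the top of my head, untested; on Debian/Ubuntu you'd want the 
corresponding -dbgsym/-dbg packages instead of debuginfo-install):

debuginfo-install ceph-fuse    # needs yum-utils and the ceph debuginfo repo
ulimit -c unlimited            # make sure cores aren't truncated, then reproduce
# once you have a core file:
gdb /usr/bin/ceph-fuse /path/to/core
(gdb) thread apply all bt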

>
> Andras
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com