[ceph-users] Re: Worst thing that can happen if I have size= 2

2021-02-03 Thread Magnus HAGDORN
if an OSD becomes unavailable (broken disk, rebooting server) then all I/O to the PGs stored on that OSD will block until replication level of 2 is reached again. So, for a highly available cluster you need a replication level of 3. On Wed, 2021-02-03 at 10:24 +0100, Mario Giammarco wrote: > Hello,
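
For reference, the pool settings that control this behaviour can be inspected and changed roughly as follows; `mypool` is a placeholder, not a pool from the thread:

    # show the current replication settings of a pool
    ceph osd pool get mypool size
    ceph osd pool get mypool min_size

    # move to 3 copies, keep serving I/O while only 2 are available
    ceph osd pool set mypool size 3
    ceph osd pool set mypool min_size 2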

[ceph-users] Re: Worst thing that can happen if I have size= 2

2021-02-03 Thread Magnus HAGDORN
On Wed, 2021-02-03 at 09:39 +, Max Krasilnikov wrote: > > if an OSD becomes unavailable (broken disk, rebooting server) then all I/O to the PGs stored on that OSD will block until replication level of 2 is reached again. So, for a highly available cluster you need a replicatio

[ceph-users] best use of NVMe drives

2021-02-16 Thread Magnus HAGDORN
Hi there, we are in the process of growing our Nautilus ceph cluster. Currently, we have 6 nodes, 3 nodes with 2×5.5TB, 6×11TB disks and 8×186GB SSDs and 3 nodes with 6×5.5TB and 6×7.5TB disks. All with dual link 10GE NICs. The SSDs are used for the CephFS metadata pool, the hard drives are used for
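
For context, the usual way to steer a CephFS metadata pool onto SSDs is a device-class CRUSH rule; a rough sketch, with rule and pool names assumed rather than taken from the thread:

    # replicated rule restricted to OSDs with device class "ssd"
    ceph osd crush rule create-replicated ssd-only default host ssd

    # pin the CephFS metadata pool to that rule
    ceph osd pool set cephfs_metadata crush_rule ssd-only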

[ceph-users] Re: Quick quota question

2021-03-17 Thread Magnus HAGDORN
On Wed, 2021-03-17 at 08:26 +, Andrew Walker-Brown wrote: > When setting a quota on a pool (or directory in CephFS), is it the amount of client data written or the client data x number of replicas that counts toward the quota? It's the amount of data stored, so independent of replication le
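
For reference, both kinds of quota are expressed in terms of client data, before replication; a minimal sketch with placeholder paths and pool names:

    # CephFS directory quota: 1 TB of client data under this path
    setfattr -n ceph.quota.max_bytes -v 1000000000000 /cephfs/some/dir

    # pool quota, likewise counted before replication
    ceph osd pool set-quota mypool max_bytes 1000000000000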

[ceph-users] Re: add and start OSD without rebalancing

2021-03-24 Thread Magnus HAGDORN
we recently added 3 new nodes with 12x12TB OSDs. It took 3 days or so to reshuffle the data and another 3 days to split the PGs. I did increase the number of max backfills to speed up the process. We didn't notice the reshuffling in normal operation. On Wed, 2021-03-24 at 19:32 +0100, Dan van der
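
The throttle mentioned here is the per-OSD backfill limit; raising it for the duration of the reshuffle might look like this (the value 4 is only an example):

    # allow more concurrent backfills per OSD while the new nodes fill up
    ceph config set osd osd_max_backfills 4

    # return to the default once the cluster is HEALTH_OK again
    ceph config rm osd osd_max_backfills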

[ceph-users] Re: Exporting CephFS using Samba preferred method

2021-04-14 Thread Magnus HAGDORN
On Wed, 2021-04-14 at 08:55 +0200, Martin Palma wrote: > Hello, what is the currently preferred method, in terms of stability and performance, for exporting a CephFS directory with Samba? - locally mount the CephFS directory and export it via Samba? - using the "vfs_ceph" module of Samb
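
For comparison, the two options differ mainly in the smb.conf share stanza; a rough sketch of each, with the share name, paths and cephx user made up for illustration:

    # (a) re-export a locally mounted CephFS
    [projects]
        path = /cephfs/projects
        read only = no

    # (b) let Samba talk to the cluster directly via vfs_ceph
    [projects]
        path = /projects
        vfs objects = ceph
        ceph:config_file = /etc/ceph/ceph.conf
        ceph:user_id = samba
        read only = no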

[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-06-02 Thread Magnus HAGDORN
I totally agree - we use a management system to manage all our Linux machines. Adding containers just makes that a lot more complex, especially since our management system does not support containers. Regards magnus On Wed, 2021-06-02 at 10:36 +0100, Matthew Vernon wrote: > This email was sent to

[ceph-users] libceph: monX session lost, hunting for new mon

2021-06-16 Thread Magnus HAGDORN
Hi all, I know this came up before but I couldn't find a resolution. We get the error "libceph: monX session lost, hunting for new mon" a lot on our samba servers that reexport cephfs. A lot means more than once a minute. On other machines that are less busy we get it about every 10-30 minutes. We on

[ceph-users] Re: samba cephfs

2021-07-12 Thread Magnus HAGDORN
We are using SL7 to export our cephfs via samba to windows. The RHEL7/Centos7/SL7 distros do not come with packages for the samba cephfs module. This is one of the reasons why we are mounting the file system locally using the kernel cephfs module with the automounter and reexporting it using vanill
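
A hedged sketch of the mount that the automounter issues in such a setup; monitor names, the cephx user and the secret file location are assumptions:

    # kernel CephFS mount that is then re-exported by vanilla Samba
    mount -t ceph mon1,mon2,mon3:/ /cephfs \
        -o name=samba,secretfile=/etc/ceph/samba.secret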

[ceph-users] snapshotted cephfs deleting files 'no space left on device'

2021-10-14 Thread Magnus HAGDORN
Hi all, we have hit the problem where a directory tree containing over a million entries was deleted on a snapshotted cephfs. The cluster reports mostly healthy except for some slow MDS responses. However, the filesystem became unusable. The MDS reports ceph daemon mds.`hostname -s` perf dump | gr
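
The counter usually watched in this situation is the stray count in the mds_cache section of the perf dump; a hedged example of the full command (the grep pattern is an assumption about what was being checked):

    # stray dentries still pinned by snapshots after the delete
    ceph daemon mds.$(hostname -s) perf dump mds_cache | grep -i stray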

[ceph-users] MDS in state stopping

2021-10-14 Thread Magnus HAGDORN
Hi there, further to my earlier email https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/46BETLK5CIHBLRLCP5ZW4IAWTY4POADL/ so we tried to reduce the number of meta data servers to 1 (from 2). rank.1 is now sitting in the stopping state but nothing is happening. We no longer have any c
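
For reference, shrinking to a single active MDS and watching rank 1 drain is normally done along these lines (the filesystem name `cephfs` is a placeholder):

    # run the filesystem with one active MDS; rank 1 should stop and disappear
    ceph fs set cephfs max_mds 1

    # watch the rank states while it drains
    ceph fs status cephfs
    ceph health detail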

[ceph-users] Re: MDS in state stopping

2021-10-14 Thread Magnus HAGDORN
On Thu, 2021-10-14 at 14:25 +0200, Dan van der Ster wrote: > Is that confirmed with a higher debug_mds setting on that "stuck" mds? > You should try to understand what mds.1 is doing, via debug_mds=10 or so. > If it really looks idle, then it might be worth restarting mds.1's da
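
Raising the MDS debug level as suggested can be done on the running daemon or via the config database; a sketch, with the daemon name left as a placeholder:

    # on the running daemon (substitute the real daemon name)
    ceph tell mds.$MDS_NAME injectargs '--debug_mds 10'

    # or persistently for all MDS daemons
    ceph config set mds debug_mds 10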

[ceph-users] cephfs, snapshots, deletion and stray files

2021-10-14 Thread Magnus HAGDORN
Hi all, we seem to have recovered from our cephfs misadventure. Having said that, I would like to better understand what went wrong and if/how we can avoid that in future. We have a nautilus ceph cluster that provides cephfs to our school. We keep nightly snapshots for one week. One user has a parti
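
As background, CephFS snapshots of the kind described are plain directories under .snap; a minimal sketch of a nightly one-week rotation, with paths and naming invented for illustration:

    # take tonight's snapshot
    mkdir /cephfs/home/.snap/nightly-$(date +%F)

    # drop the snapshot from eight days ago
    rmdir /cephfs/home/.snap/nightly-$(date -d '8 days ago' +%F)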

[ceph-users] ceph fs status output

2021-10-14 Thread Magnus HAGDORN
Hi all, during our recent cephfs misadventure we have been staring a lot at the output from ceph fs status and we were wondering what the numbers under the dns and inos headings mean. Cheers magnus

[ceph-users] failed OSD daemon

2022-07-25 Thread Magnus Hagdorn
Hi there, on our pacific (16.2.9) cluster one of the OSD daemons has died and fails to restart. The OSD exposes an NVMe drive and is one of 4 identical machines. We are using podman to orchestrate the ceph daemons. The underlying OS is managed. The system worked fine without any issues until recentl
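
With a cephadm/podman deployment the first diagnostic steps for a dead OSD are usually along these lines; the OSD id 12 is a placeholder:

    # which OSD daemons does the orchestrator consider down?
    ceph orch ps --daemon-type osd

    # container logs of the failed daemon
    cephadm logs --name osd.12

    # same thing via journald on the host
    journalctl -u ceph-$(ceph fsid)@osd.12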

[ceph-users] Re: clients failing to respond to cache pressure (nfs-ganesha)

2021-10-20 Thread Magnus HAGDORN
We have increased the cache on our MDS, which makes this issue mostly go away. It is due to an interaction between the MDS and the ganesha NFS server, which keeps its own cache. I believe newer versions of ganesha can deal with it. Sent from Android device On 20 Oct 2021 09:37, Marc wrote: This
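
The cache increase referred to is the MDS memory target; for example (the 16 GiB figure is only illustrative):

    # give the MDS more cache headroom for ganesha's long-lived caps
    ceph config set mds mds_cache_memory_limit 17179869184   # 16 GiB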

[ceph-users] Re: How can user home directory quotas be automatically set on CephFS?

2021-11-02 Thread Magnus HAGDORN
Hi Artur, we did write a script (in fact a series of scripts) that we use to manage our users and their quotas. Our script adds a new user to our LDAP and sets the default quotas for various storage areas. Quota information is kept in the LDAP. Another script periodically scans the LDAP for changes
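
A toy sketch of the quota-applying part of such a script, assuming the byte limit has already been read from LDAP; every path and name here is hypothetical:

    #!/bin/sh
    # usage: apply_quota.sh <username> <bytes>   (hypothetical helper)
    user="$1"
    bytes="$2"
    setfattr -n ceph.quota.max_bytes -v "$bytes" "/cephfs/home/$user"
    setfattr -n ceph.quota.max_files -v 1000000 "/cephfs/home/$user"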

[ceph-users] Re: Moving data between two mounts of the same CephFS

2022-05-18 Thread Magnus HAGDORN
Hi Mathias, I have noticed in the past that moving directories within the same mount point can take a very long time using the system mv command. I use a python script to archive old user directories by moving them to a different part of the filesystem which is not exposed to the users. I use the re

[ceph-users] MDS stuck in replay

2022-05-31 Thread Magnus HAGDORN
Hi all, it seems to be the time of stuck MDSs. We also have our ceph filesystem degraded. The MDS is stuck in replay for about 20 hours now. We run a nautilus ceph cluster with about 300TB of data and many millions of files. We run two MDSs with a particularly large directory pinned to one of them
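
While an MDS sits in up:replay there is little to do beyond watching its progress; some hedged commands for that (run the last one on the MDS host itself):

    # filesystem and rank state
    ceph fs status
    ceph health detail

    # state of the local MDS daemon (recent releases also report replay progress)
    ceph daemon mds.$(hostname -s) status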

[ceph-users] Re: MDS stuck in replay

2022-06-02 Thread Magnus HAGDORN
at this stage we are not so worried about recovery since we moved to our new pacific cluster. The problem arose during one of the nightly syncs of the old cluster to the new cluster. However, we are quite keen to use this as a learning opportunity to see what we can do to bring this filesystem back

[ceph-users] Re: MDS stuck in replay

2022-06-06 Thread Magnus HAGDORN
On Sat, 2022-06-04 at 14:36 -0400, Ramana Venkatesh Raja wrote: > If that's not helpful, then try setting `ceph config set mds debug_objecter 10`, restart the MDS, and check the objecter related logs in the MDS? This didn't reveal anything useful - I just got the occasional tick. I restar

[ceph-users] removing the private cluster network

2020-06-30 Thread Magnus HAGDORN
Hi there, we currently have a ceph cluster with 6 nodes and a public and cluster network. Each node has two bonded 2x1GE network interfaces, one for the public and one for the cluster network. We are planning to upgrade the networking to 10GE. Given the modest size of our cluster we would like to s
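
For orientation, the change being discussed boils down to dropping cluster_network so OSDs carry replication traffic over the public network as well; a hedged sketch with a made-up 10GE subnet (if the options live in ceph.conf they are edited there instead):

    # point everything at the new 10GE network
    ceph config set global public_network 10.10.0.0/24

    # drop the separate cluster network, then restart OSDs one node at a time
    ceph config rm global cluster_network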

[ceph-users] damaged cephfs

2020-09-03 Thread Magnus HAGDORN
Hi there, we reconfigured our ceph cluster yesterday to remove the cluster network and things didn't quite go to plan. I am trying to figure out what went wrong and also what to do next. We are running nautilus 14.2.10 on Scientific Linux 7.8. So, we are using a mixture of RBDs and cephfs. For th

[ceph-users] Re: damaged cephfs

2020-09-05 Thread Magnus HAGDORN
Hi Patrick, thanks for the reply. On Fri, 2020-09-04 at 10:25 -0700, Patrick Donnelly wrote: > > We then started using the cephfs (we keep VM images on the cephfs). The MDS were showing an error. I restarted the MDS but they didn't come back. We then followed the instructions here: h

[ceph-users] Re: damaged cephfs

2020-09-06 Thread Magnus HAGDORN
On Sat, 2020-09-05 at 08:10 +, Magnus HAGDORN wrote: > > I don't have any recent data on how long it could take but you might try using at least 8 workers. > We are using 4 workers and the first stage hasn't completed yet. Is it
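
For reference, the workers being discussed parallelise cephfs-data-scan from the disaster-recovery procedure; with 8 workers it looks roughly like this (the data pool name is a placeholder):

    # first stage: 8 scan_extents workers running in parallel
    for i in $(seq 0 7); do
        cephfs-data-scan scan_extents --worker_n $i --worker_m 8 cephfs_data &
    done
    wait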