[ceph-users] MDS allocates all memory (>500G) replaying, OOM-killed, repeat

2019-04-01 Thread Pickett, Neale T
Hello. We are experiencing an issue where our ceph MDS gobbles up 500G of RAM, is killed by the kernel, dies, then repeats. We have 3 MDS daemons on different machines, and all are exhibiting this behavior. We are running the following versions (from Docker): * ceph/daemon:v3.2.1-stable-3…
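A common first mitigation for runaway MDS memory is to lower the MDS cache target, though it is worth hedging: `mds_cache_memory_limit` is a soft target, not a hard cap, and replay of a very large journal can still overshoot it. A minimal sketch (the 16 GiB value is illustrative):

```shell
# Lower the MDS cache target on all running MDS daemons (value in bytes;
# 17179869184 = 16 GiB). This is a soft limit -- journal replay of an
# oversized journal can still exceed it.
ceph tell mds.* injectargs '--mds_cache_memory_limit=17179869184'
```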

Re: [ceph-users] MDS allocates all memory (>500G) replaying, OOM-killed, repeat

2019-04-01 Thread Pickett, Neale T
We decided to go ahead and try truncating the journal, but before we did, we wanted to back it up. However, there are ridiculous values in the header. It can't write a journal this large because (I presume) my ext4 filesystem can't seek to this position in the (sparse) file. I would not be…
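For reference, the usual inspect-then-backup sequence before truncating looks like this (a sketch: `<fs_name>` is a placeholder, and the `--rank` form assumes a luminous-or-later `cephfs-journal-tool`):

```shell
# Inspect the journal header -- this is where implausible offsets show up:
cephfs-journal-tool --rank=<fs_name>:0 header get

# Export a backup before anything destructive. The export seeks to the
# journal's start offset, producing a sparse file; if that offset exceeds
# the target filesystem's maximum file size (as on ext4 above), export to
# a filesystem with a larger limit, e.g. XFS.
cephfs-journal-tool --rank=<fs_name>:0 journal export backup.bin

# Only after a successful backup:
cephfs-journal-tool --rank=<fs_name>:0 journal reset
```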

Re: [ceph-users] MDS allocates all memory (>500G) replaying, OOM-killed, repeat

2019-04-01 Thread Pickett, Neale T
…k 0 MDS as failed * Reset the FS (yes, I really mean it) * Restart MDSes * Finally get some sleep. If anybody has any idea what may have caused this situation, I am keenly interested. If not, hopefully I at least helped someone else. ____ From: Pickett, Ne…
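The steps summarized above map onto roughly these commands (a sketch: `<fs_name>` and `<id>` are placeholders; note `ceph fs reset` discards MDS map state, not the data pools):

```shell
# Mark rank 0 as failed:
ceph mds fail 0

# Reset the filesystem map -- the safety flag matches the
# "yes, I really mean it" wording in the message above:
ceph fs reset <fs_name> --yes-i-really-mean-it

# Restart the MDS daemons (systemd unit name is an assumption):
systemctl restart ceph-mds@<id>
```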

Re: [ceph-users] MDS allocates all memory (>500G) replaying, OOM-killed, repeat

2019-04-03 Thread Pickett, Neale T
…ith versions, we'll update all those to 12.2.10 today :) From: Yan, Zheng Sent: Tuesday, April 2, 2019 20:26 To: Sergey Malinin Cc: Pickett, Neale T; ceph-users Subject: Re: [ceph-users] MDS allocates all memory (>500G) replaying, OOM-killed, repeat

[ceph-users] mds servers in endless segfault loop

2019-10-10 Thread Pickett, Neale T
Hello, ceph-users. Our mds servers keep segfaulting from a failed assertion, and for the first time I can't find anyone else who's posted about this problem. None of them are able to stay up, so our cephfs is down. We recently had to truncate the journal log after an upgrade to nautilus, and…
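When a journal is truncated, the upstream disaster-recovery procedure also resets the session, snap, and inode tables, since stale table state can trip MDS assertions afterwards; skipping that companion step is one plausible source of segfaults like these. A sketch of the documented sequence:

```shell
# Companion steps to a journal reset, from the CephFS disaster-recovery
# procedure; "all" applies the reset to every MDS rank:
cephfs-table-tool all reset session
cephfs-table-tool all reset snap
cephfs-table-tool all reset inode
```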

Re: [ceph-users] mds servers in endless segfault loop

2019-10-11 Thread Pickett, Neale T
I have created an anonymized crash log at https://pastebin.ubuntu.com/p/YsVXQQTBCM/ in the hopes that it can help someone understand what's leading to our MDS outage. Thanks in advance for any assistance. From: Pickett, Neale T Sent: Thursday, Octob…

[ceph-users] Problematic inode preventing ceph-mds from starting

2019-10-18 Thread Pickett, Neale T
Last week I asked about a rogue inode that was causing ceph-mds to segfault during replay. We didn't get any suggestions from this list, so we have been familiarizing ourselves with the ceph source code, and have added the following patch: --- a/src/mds/CInode.cc +++ b/src/mds/CInode.cc @@ -7…
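The patch body is truncated above, so its actual content is unknown; the general shape of such a workaround (entirely hypothetical: the check, the `BAD_INO` constant, and the surrounding function are illustrative, not the thread's real patch) is to downgrade the failing assertion for the one damaged inode into a logged skip so replay can continue:

```cpp
// Hypothetical sketch only -- the thread's real patch is not shown.
// BAD_INO is a placeholder for the damaged inode's number.
if (in->ino() == BAD_INO) {
  dout(0) << "skipping known-damaged inode " << *in << dendl;
  return;  // skip instead of asserting, so replay survives
}
ceph_assert(in->state_test(CInode::STATE_AUTH));  // illustrative check
```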

Re: [ceph-users] Problematic inode preventing ceph-mds from starting

2019-10-25 Thread Pickett, Neale T
…like an inode problem to me, but I have completely run out of ideas, so I will do nothing more to ceph as I anxiously hope I am not fired for this 14-days-and-counting outage while awaiting a reply from the list. Thank you very much! Neale From: Patrick Donnelly…

Re: [ceph-users] Problematic inode preventing ceph-mds from starting

2019-10-28 Thread Pickett, Neale T
…) one. And somehow it can handle hard links, possibly (we don't have many, or any, of these). Thanks very much for your help. This has been fascinating. Neale From: Patrick Donnelly Sent: Monday, October 28, 2019 12:52 To: Pickett, Neale T Cc: ceph-…