[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported"

2023-11-22 Thread Xiubo Li
On 11/23/23 11:25, zxcs wrote: Thanks a ton, Xiubo! It does not disappear, even after we umount the ceph directory on these two old OS nodes. After dumping ops in flight, we can see some requests, and the earliest complains “failed to authpin, subtree is being exported”. How can we avoid this? Would you please help to shed some light here? …

[ceph-users] Re: How to use hardware

2023-11-22 Thread Albert Shih
On 20/11/2023 at 09:24:41+, Frank Schilder wrote: Hi, Thanks everyone for your answers. > > we are using something similar for ceph-fs. For a backup system your setup > can work, depending on how you back up. While HDD pools have poor IOP/s > performance, they are very good for streaming …

[ceph-users] Re: ceph-exporter binds to IPv4 only

2023-11-22 Thread Stefan Kooman
On 22-11-2023 15:54, Stefan Kooman wrote: Hi, In an IPv6-only deployment the ceph-exporter daemons are not listening on IPv6 address(es). This can be fixed by editing the unit.run file of the ceph-exporter by changing "--addrs=0.0.0.0" to "--addrs=::". Is this configurable? So that cephadm deploys ceph-exporter with proper unit.run a…

[ceph-users] Re: ceph fs (meta) data inconsistent

2023-11-22 Thread Xiubo Li
Hi Frank, Locally I ran some tests using copy2 and copy, but they all worked well for me. Could you write a reproducing script? Thanks - Xiubo On 11/10/23 22:53, Frank Schilder wrote: It looks like the cap update request was dropped to the ground in MDS. [...] If you can reproduce it, then please provide the mds logs by setting: [...] …

[ceph-users] Re: ceph fs (meta) data inconsistent

2023-11-22 Thread Xiubo Li
I have raised a tracker issue to follow this: https://tracker.ceph.com/issues/63510 Thanks - Xiubo On 11/10/23 22:53, Frank Schilder wrote: It looks like the cap update request was dropped to the ground in MDS. [...] If you can reproduce it, then please provide the mds logs by setting: [...] …
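The exact settings Xiubo suggests are truncated in the preview; for reference, a commonly used combination for capturing verbose MDS logs looks like the following (the debug levels are an assumption, and should be reverted once logs are collected):

    # raise MDS verbosity while reproducing the copy2/copy issue
    ceph config set mds debug_mds 20
    ceph config set mds debug_ms 1
    # ...reproduce, collect the mds logs...
    # revert to the defaults afterwards
    ceph config rm mds debug_mds
    ceph config rm mds debug_ms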

[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported"

2023-11-22 Thread zxcs
Thanks a ton, Xiubo! It does not disappear, even after we umount the ceph directory on these two old OS nodes. After dumping ops in flight, we can see some requests, and the earliest complains “failed to authpin, subtree is being exported”. How can we avoid this? Would you please help to shed some light here? …
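For reference, the "dump ops flight" step mentioned above is usually the MDS admin-socket command, run on the node hosting the MDS (the daemon name below is a placeholder):

    # list requests currently in flight / stuck on a given MDS
    ceph daemon mds.<name> dump_ops_in_flight
    # slow requests also show up in the health output
    ceph health detail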

[ceph-users] ceph-exporter binds to IPv4 only

2023-11-22 Thread Stefan Kooman
Hi, In an IPv6-only deployment the ceph-exporter daemons are not listening on IPv6 address(es). This can be fixed by editing the unit.run file of the ceph-exporter by changing "--addrs=0.0.0.0" to "--addrs=::". Is this configurable? So that cephadm deploys ceph-exporter with proper unit.run a…
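Until cephadm exposes this as a setting, the workaround described above can be applied per host roughly as follows (paths assume the usual cephadm layout; <fsid> and <host> are placeholders):

    # switch the exporter bind address from IPv4-any to IPv6-any
    sed -i 's/--addrs=0.0.0.0/--addrs=::/' \
        /var/lib/ceph/<fsid>/ceph-exporter.<host>/unit.run
    # restart the daemon so the new bind address takes effect
    systemctl restart ceph-<fsid>@ceph-exporter.<host>.service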

[ceph-users] CephFS - MDS removed from map - filesystem keeps to be stopped

2023-11-22 Thread Denis Polom
Hi, we are running Ceph Pacific 16.2.13. We had a full CephFS filesystem, and after adding new HW we tried to start it, but our MDS daemons are pushed to standby and removed from the MDS map. The filesystem was broken, so we repaired it with: # ceph fs fail cephfs # cephfs-journal-tool --rank=cephfs:0 …
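The repair commands are cut off above; for context, a typical journal-recovery sequence (per the CephFS disaster-recovery documentation, not necessarily exactly what was run here) looks like this:

    # back up the rank 0 journal before touching anything
    cephfs-journal-tool --rank=cephfs:0 journal export backup.bin
    # salvage dentries, then reset the journal and session table
    cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
    cephfs-journal-tool --rank=cephfs:0 journal reset
    cephfs-table-tool all reset session
    # let MDS daemons join the filesystem again
    ceph fs set cephfs joinable true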

[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed

2023-11-22 Thread Zakhar Kirpichenko
Thanks for this. This looks similar to what we're observing, although we don't use the API apart from the usage by the Ceph deployment itself - which I guess still counts. /Z On Wed, 22 Nov 2023, 15:22 Adrien Georget wrote: > Hi, > > This memory leak with ceph-mgr seems to be due to a change in Ceph 16.2.12. …

[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed

2023-11-22 Thread Zakhar Kirpichenko
Yes, we use docker, though we haven't had any issues because of it. I don't think that docker itself can cause mgr memory leaks. /Z On Wed, 22 Nov 2023, 15:14 Eugen Block wrote: > One other difference is you use docker, right? We use podman, could it > be some docker restriction? > > Quoting …

[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed

2023-11-22 Thread Adrien Georget
Hi, This memory leak with ceph-mgr seems to be due to a change in Ceph 16.2.12. Check this issue: https://tracker.ceph.com/issues/59580 We are also affected by this, with or without containerized services. Cheers, Adrien On 22/11/2023 at 14:14, Eugen Block wrote: One other difference is you …

[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed

2023-11-22 Thread Eugen Block
One other difference is you use docker, right? We use podman, could it be some docker restriction? Quoting Zakhar Kirpichenko: It's a 6-node cluster with 96 OSDs, not much I/O, mgr . Each node has 384 GB of RAM, each OSD has a memory target of 16 GB, about 100 GB of memory, give or take, is available (mostly used by page cache) on each node during normal operation. …

[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed

2023-11-22 Thread Zakhar Kirpichenko
It's a 6-node cluster with 96 OSDs, not much I/O, mgr . Each node has 384 GB of RAM, each OSD has a memory target of 16 GB, about 100 GB of memory, give or take, is available (mostly used by page cache) on each node during normal operation. Nothing unusual there, tbh. No unusual mgr modules or set
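For reference, the per-OSD memory target mentioned here is the osd_memory_target option; setting it to 16 GB cluster-wide would look roughly like this (value shown in bytes):

    # 16 GiB expressed in bytes
    ceph config set osd osd_memory_target 17179869184
    # check the effective value on one OSD
    ceph config get osd.0 osd_memory_target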

[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed

2023-11-22 Thread Eugen Block
What does your hardware look like memory-wise? Just for comparison, one customer cluster has 4,5 GB in use (middle-sized cluster for openstack, 280 OSDs):
  PID USER  PR NI    VIRT    RES   SHR S  %CPU %MEM  TIME+ COMMAND
 6077 ceph  20  0 6357560 4,522g 22316 S 12,00 1,79 …

[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed

2023-11-22 Thread Zakhar Kirpichenko
I've disabled the progress module entirely and will see how it goes. Otherwise, mgr memory usage keeps increasing slowly; from past experience it will stabilize at around 1.5-1.6 GB. Other than this event warning, it's unclear what could have caused random memory ballooning. /Z On Wed, 22 Nov 2023 …
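For reference, turning the progress module off usually amounts to the following; `ceph progress clear` is included on the assumption that already-stuck events should also be flushed:

    # stop the mgr progress module from tracking/reporting events
    ceph progress off
    # optionally clear events that are already present
    ceph progress clear
    # turn it back on later if desired
    ceph progress on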

[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported"

2023-11-22 Thread Frank Schilder
There are some unhandled race conditions in the MDS cluster in rare circumstances. We had this issue with mimic and octopus and it went away after manually pinning sub-dirs to MDS ranks; see https://docs.ceph.com/en/nautilus/cephfs/multimds/?highlight=dir%20pin#manually-pinning-directory-trees-
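For reference, the manual pinning mentioned here is done with an extended attribute on a mounted directory (the ranks and paths below are placeholders):

    # pin a subtree to MDS rank 0
    setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/projects
    # pin another subtree to rank 1
    setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/scratch
    # a value of -1 removes the pin and returns the subtree to the balancer
    setfattr -n ceph.dir.pin -v -1 /mnt/cephfs/scratch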

[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported"

2023-11-22 Thread Xiubo Li
On 11/22/23 16:02, zxcs wrote: Hi Experts, we are using CephFS 16.2.* with multiple active MDS, and recently we have two nodes mounted with ceph-fuse due to the old OS system. One node runs a python script with `glob.glob(path)`, and another client is doing a `cp` operation on the same path. Then we see some log about …

[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported"

2023-11-22 Thread Eugen Block
Hi, we saw this a year ago in a Nautilus cluster with multi-active MDS as well. It turned up only once within several years and we decided not to look too closely at that time. How often do you see it? Is it reproducible? In that case I'd recommend creating a tracker issue. Regards,

[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed

2023-11-22 Thread Eugen Block
I see these progress messages all the time; I don't think they cause it, but I might be wrong. You can disable it just to rule that out. Quoting Zakhar Kirpichenko: Unfortunately, I don't have a full stack trace because there's no crash when the mgr gets oom-killed. There's just the mgr log, which looks completely normal …

[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed

2023-11-22 Thread Zakhar Kirpichenko
Unfortunately, I don't have a full stack trace because there's no crash when the mgr gets oom-killed. There's just the mgr log, which looks completely normal until about 2-3 minutes before the oom-kill, when tcmalloc warnings show up. I'm not sure that it's the same issue that is described in the tracker …

[ceph-users] Re: Ceph 16.2.14: ceph-mgr getting oom-killed

2023-11-22 Thread Eugen Block
Do you have the full stack trace? The pastebin only contains the "tcmalloc: large alloc" messages (same as in the tracker issue). Maybe comment in the tracker issue directly, since Radek asked for someone with a similar problem in a newer release. Quoting Zakhar Kirpichenko: Thanks, Eugen …

[ceph-users] Re: No SSL Dashboard working after installing mgr crt|key with RSA/4096 secp384r1

2023-11-22 Thread Ackermann, Christoph
Hello Eugen, thanks for the validation. Actually I use plain HTTP because I do not have much time to look for a solution, but I will check a new cert ASAP. Christoph On Fri, 17 Nov 2023 at 12:57, Eugen Block wrote: > I was able to reproduce the error with a self-signed elliptic curves …
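For anyone hitting the same issue, swapping in a plain RSA certificate usually looks like this (self-signed example; the CN and file names are placeholders):

    # generate a self-signed RSA/4096 key and certificate
    openssl req -new -x509 -nodes -newkey rsa:4096 -days 365 \
        -subj "/CN=ceph-dashboard" -keyout dashboard.key -out dashboard.crt
    # install both for the dashboard module
    ceph dashboard set-ssl-certificate -i dashboard.crt
    ceph dashboard set-ssl-certificate-key -i dashboard.key
    # restart the dashboard so it picks up the new certificate
    ceph mgr module disable dashboard
    ceph mgr module enable dashboard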

[ceph-users] mds slow request with “failed to authpin, subtree is being exported"

2023-11-22 Thread zxcs
Hi Experts, we are using CephFS 16.2.* with multiple active MDS, and recently we have two nodes mounted with ceph-fuse due to the old OS system. One node runs a python script with `glob.glob(path)`, and another client is doing a `cp` operation on the same path. Then we see some log about …
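As a rough sketch of the access pattern described above (paths are placeholders), the two clients were effectively doing something like this at the same time on the same directory:

    # client A (ceph-fuse mount): scan the tree with glob
    python3 -c 'import glob; print(len(glob.glob("/mnt/cephfs/data/*")))'
    # client B (a second mount of the same path): copy within the same tree
    cp -a /mnt/cephfs/data /mnt/cephfs/data_copy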