[ceph-users] Re: MDS Behind on Trimming...

2024-04-19 Thread Erich Weiler
Hi Xiubo, Never mind, I was wrong; most of the blocked ops were 12 hours old. Ugh. I restarted the MDS daemon to clear them. I've just gone back to having one active MDS instead of two; let's see if that makes a difference. I am beginning to think it may be impossible to catch the logs that matter here
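
For readers following the thread, a couple of generic commands for this kind of situation; a sketch only, and the filesystem and daemon names below are placeholders, not taken from the thread:
~~~
# Drop back to a single active MDS ("cephfs" is a placeholder filesystem name)
ceph fs set cephfs max_mds 1

# Dump the ops a given MDS daemon is currently blocked on (daemon name is hypothetical)
ceph tell mds.cephfs.host1.abcdef dump_blocked_ops

# Check whether the cluster still reports the MDS as behind on trimming
ceph health detail | grep -i trim
~~~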

[ceph-users] Re: Mysterious Space-Eating Monster

2024-04-19 Thread Anthony D'Atri
Look for unlinked but open files; it may not be Ceph at fault. Suboptimal logrotate rules can cause this. lsof, fsck -n, etc. > On Apr 19, 2024, at 05:54, Sake Ceph wrote: > > Hi Matthew, > > Cephadm doesn't clean up old container images, at least with Quincy. After an > upgrade we run the f
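
A quick way to act on the unlinked-but-open-files suggestion; a sketch, assuming /var is the affected mount:
~~~
# List open files whose link count is below 1 (deleted but still held open)
sudo lsof +L1 /var

# Alternative: scan /proc for file descriptors pointing at deleted files
sudo find /proc/*/fd -ls 2>/dev/null | grep '(deleted)'
~~~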

[ceph-users] Re: Best practice and expected benefits of using separate WAL and DB devices with Bluestore

2024-04-19 Thread Anthony D'Atri
This is a YMMV thing; it depends on one's workload. > > However, we have some questions about this and are looking for some guidance > and advice. > > The first one is about the expected benefits. Before we undergo the efforts > involved in the transition, we are wondering if it is even worth

[ceph-users] Re: Ceph image delete error - NetHandler create_socket couldnt create socket

2024-04-19 Thread P Wagner-Beccard
With cephadm you're able to set these values cluster wide. See the host-management section of the docs. https://docs.ceph.com/en/reef/cephadm/host-management/#os-tuning-profiles On Fri, 19 Apr 2024 at 12:40, Konstantin Shalygin wrote: > Hi, > > > On 19 Apr 2024, at 10:39, Pardhiv Karri wrote: >
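
A sketch of what such an os-tuning profile can look like; the profile name, hosts and sysctl values below are illustrative only, see the linked host-management docs for the exact spec:
~~~
# Write a tuned-profile spec (example values, not recommendations)
cat > max-open-files.yaml <<'EOF'
profile_name: max-open-files
placement:
  hosts:
    - host1
    - host2
settings:
  fs.file-max: 1000000
EOF

# Apply it via the orchestrator and confirm it is listed
ceph orch tuned-profile apply -i max-open-files.yaml
ceph orch tuned-profile ls
~~~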

[ceph-users] Re: Best practice and expected benefits of using separate WAL and DB devices with Bluestore

2024-04-19 Thread Simon Kepp
Hi Ondrej, When running multiple OSDs on a shared DB/WAL NVMe, it is important to take into account, when designing your redundancy/failure domains, that the loss of a single NVMe drive will take out a number of OSDs. You must design your redundancy so that it is acceptable to lose that many OSDs simult
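
As a minimal illustration of that point (not from the thread itself): keeping the CRUSH failure domain at host level means the OSDs behind any one NVMe can only ever hold one copy of an object. Rule and pool names here are hypothetical:
~~~
# Replicated rule with the host as failure domain, so replicas never share a node
ceph osd crush rule create-replicated rep-by-host default host

# Point an existing pool at that rule ("mypool" is a placeholder)
ceph osd pool set mypool crush_rule rep-by-host
~~~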

[ceph-users] Re: Best practice and expected benefits of using separate WAL and DB devices with Bluestore

2024-04-19 Thread Ondřej Kukla
Hello, I’m going to mainly answer the practical questions Niklaus had. Our standard setup is 12 HDDs and 2 enterprise NVMes per node, which means we have 6 OSDs per NVMe. For the partitions we use LVM. The fact that one failed NVMe takes down 6 OSDs isn’t great, but our OSD-node count is more
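
For reference, a layout like this (HDD data devices with DB/WAL carved out of shared NVMes via LVM) can be expressed roughly as follows; a hedged sketch with example device paths and spec names:
~~~
# Plain ceph-volume: HDD OSDs with DB volumes split across two NVMes
ceph-volume lvm batch /dev/sd{b..m} --db-devices /dev/nvme0n1 /dev/nvme1n1

# Or declaratively with a cephadm OSD service spec
cat > osd-hdd-nvme-db.yaml <<'EOF'
service_type: osd
service_id: hdd-nvme-db
placement:
  host_pattern: '*'
spec:
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0
EOF
ceph orch apply -i osd-hdd-nvme-db.yaml
~~~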

[ceph-users] Re: Ceph image delete error - NetHandler create_socket couldnt create socket

2024-04-19 Thread Konstantin Shalygin
Hi, > On 19 Apr 2024, at 10:39, Pardhiv Karri wrote: > > Thank you for the reply. I tried setting ulimit to 32768 when I saw 25726 in > the lsof output, but after deleting 2 more disks it errored again; lsof then > showed more than 35000 open files. I'm not sure how to handle it. > I reboot

[ceph-users] Re: Best practice and expected benefits of using separate WAL and DB devices with Bluestore

2024-04-19 Thread Torkil Svensgaard
Hi, Red Hat Ceph support told us back in the day that 16 DB/WAL partitions per NVMe were the maximum supported by RHCS, because their testing showed performance suffered beyond that. We are running with 11 per NVMe. We are prepared to lose a bunch of OSDs if we have an NVMe die. We expect Ceph will
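
If you want to cap how many DB partitions get carved out of each NVMe, ceph-volume can be told how many slots to split the device into; a sketch with example device paths and an 11-slot split:
~~~
# Split the NVMe's DB space into at most 11 slots, one per HDD OSD
ceph-volume lvm batch /dev/sd{b..l} --db-devices /dev/nvme0n1 --block-db-slots 11
~~~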

[ceph-users] Re: Mysterious Space-Eating Monster

2024-04-19 Thread Sake Ceph
Hi Matthew, Cephadm doesn't clean up old container images, at least with Quincy. After an upgrade we run the following commands: sudo podman system prune -a -f sudo podman volume prune -f But if someone has better advice, please tell us. Kind regards, Sake > On 19-04-2024 10:24 CEST,
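
Before and after pruning it can help to see what podman itself thinks is using the space; a small addition, not part of Sake's procedure:
~~~
# Disk usage of images, containers and volumes as seen by podman
sudo podman system df

# List all images, including dangling ones left behind by upgrades
sudo podman images -a
~~~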

[ceph-users] Best practice and expected benefits of using separate WAL and DB devices with Bluestore

2024-04-19 Thread Niklaus Hofer
Dear all, We have an HDD Ceph cluster that could do with some more IOPS. One solution we are considering is installing NVMe SSDs into the storage nodes and using them as WAL and/or DB devices for the Bluestore OSDs. However, we have some questions about this and are looking for some guidance

[ceph-users] Mysterious Space-Eating Monster

2024-04-19 Thread duluxoz
Hi All, *Something* is chewing up a lot of space on our `/var` partition, to the point where we're getting warnings about the Ceph monitor running out of space (i.e. > 70% full). I've been looking, but I can't find anything significant (i.e. log files aren't too big, etc.) BUT there seem to be a h
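
A generic way to narrow down which directory under /var is actually growing; a sketch, to be combined with the deleted-but-open-files check suggested elsewhere in the thread:
~~~
# Per-directory usage one level below /var, largest last; -x stays on the /var filesystem
sudo du -xh --max-depth=1 /var | sort -h

# Compare with what the filesystem itself reports; a large gap can point to deleted-but-open files
df -h /var
~~~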

[ceph-users] Re: Latest Doco Out Of Date?

2024-04-19 Thread duluxoz
Cool! Thanks for that  :-) On 19/04/2024 18:01, Zac Dover wrote: I think I understand, after more thought. The second command is expected to work after the first. I will ask the cephfs team when they wake up. Zac Dover Upstream Docs Ceph Foundation On Fri, Apr 19, 2024 at 17:51, duluxoz ma

[ceph-users] Re: Latest Doco Out Of Date?

2024-04-19 Thread Zac Dover
I think I understand, after more thought. The second command is expected to work after the first. I will ask the cephfs team when they wake up. Zac Dover Upstream Docs Ceph Foundation On Fri, Apr 19, 2024 at 17:51, duluxoz <dulu...@gmail.com> wr

[ceph-users] Re: Latest Doco Out Of Date?

2024-04-19 Thread duluxoz
Hi Zac, Yep, followed the instructions (i.e. removed the client) and then re-ran the commands - same thing. What in particular do you need to know? :-) Cheers, Dulux-Oz On 19/04/2024 17:58, Zac Dover wrote: Did you remove client.x from the config? I need more information about your cluster

[ceph-users] Re: Latest Doco Out Of Date?

2024-04-19 Thread Zac Dover
Did you remove client.x from the config? I need more information about your cluster before I can determine whether the documentation is wrong. Zac Dover Upstream Docs Ceph Foundation On Fri, Apr 19, 2024 at 17:51, duluxoz <dulu...@gmail.com> wro

[ceph-users] Latest Doco Out Of Date?

2024-04-19 Thread duluxoz
Hi All, In reference to this page from the Ceph documentation: https://docs.ceph.com/en/latest/cephfs/client-auth/, down at the bottom of that page it says that you can run the following commands:
~~~
ceph fs authorize a client.x /dir1 rw
ceph fs authorize a client.x /dir2 rw
~~~
This will allo
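
For what it's worth, `ceph fs authorize` also accepts several path/permission pairs in one invocation, which may sidestep the issue of the second command failing; a sketch using the same names as the example above:
~~~
# Grant rw on both directories in a single command
ceph fs authorize a client.x /dir1 rw /dir2 rw
~~~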

[ceph-users] Re: Ceph image delete error - NetHandler create_socket couldnt create socket

2024-04-19 Thread Pardhiv Karri
Hi Konstantin, Thank you for the reply. I tried setting ulimit to 32768 when I saw 25726 in the lsof output, but after deleting 2 more disks it errored again; lsof then showed more than 35000 open files. I'm not sure how to handle it. I rebooted the monitor node, but the open files kept growin
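
If the shell's ulimit keeps being exhausted, making the limit persistent (and verifying it) usually looks something like the following; the values are illustrative, not recommendations for this cluster:
~~~
# Current soft and hard open-file limits for this shell
ulimit -Sn
ulimit -Hn

# Persistently raise the per-user limit (example values)
echo '* soft nofile 65536' | sudo tee -a /etc/security/limits.conf
echo '* hard nofile 65536' | sudo tee -a /etc/security/limits.conf

# System-wide ceiling on open file handles (example value)
sudo sysctl -w fs.file-max=1000000
~~~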