[ceph-users] Re: About ceph disk slowops effect to cluster

2024-01-12 Thread Phong Tran Thanh
Only change it with a custom profile, not with the built-in profiles; I am configuring it from the Ceph dashboard. osd_mclock_scheduler_client_wgt=6 -> this is my setting. On Sat, Jan 13, 2024 at 02:19 Anthony D'Atri wrote: > > > > On Jan 12, 2024, at 03:31, Phong Tran Thanh > wrote: > > >

[ceph-users] 1 clients failing to respond to cache pressure (quincy:17.2.6)

2024-01-12 Thread Özkan Göksu
Hello. I have a 5-node Ceph cluster and I'm constantly getting the "clients failing to respond to cache pressure" warning. I have 84 CephFS kernel clients (servers), and my users access their personal subvolumes located on one pool. My users are software developers and the data is home and user
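
A minimal sketch of how one might start triaging this warning, assuming shell access to the active MDS host; the daemon name mds.ceph01 and the cache-limit value are placeholders, not taken from the message above:

    # List client sessions and their cap counts on the active MDS (run on the MDS host)
    $ sudo ceph daemon mds.ceph01 session ls | grep -E '"id"|num_caps'

    # Check the current MDS cache limit
    $ ceph config get mds mds_cache_memory_limit

    # If clients legitimately need to hold more caps, the cache can be grown, e.g. to 8 GiB
    $ ceph config set mds mds_cache_memory_limit 8589934592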

[ceph-users] Re: About ceph disk slowops effect to cluster

2024-01-12 Thread Anthony D'Atri
> On Jan 12, 2024, at 03:31, Phong Tran Thanh wrote: > > Hi Yang and Anthony, > > I found a solution to this problem on 7200 rpm HDDs > > When the cluster recovers from one or multiple disk failures, slow ops > appear and then affect the cluster; we can change these configurations

[ceph-users] Unable to locate "bluestore_compressed_allocated" & "bluestore_compressed_original" parameters while executing "ceph daemon osd.X perf dump" command.

2024-01-12 Thread Alam Mohammad
Hi, We are planning a BlueStore compression test in our cluster. For this we have created an RBD image on our EC pool. When we execute "ceph daemon osd.X perf dump | grep -E '(compress_.*_count|bluestore_compressed_)'", we cannot locate the below parameters, even though we also tried the ceph tell comm
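
A minimal sketch of such a test, assuming a hypothetical pool name mypool and osd.0; the bluestore_compressed_* counters only show up after some data has actually been written and compressed, and counter names can differ between releases:

    # Enable compression on the pool (mode may also be "passive" or "force")
    $ ceph osd pool set mypool compression_mode aggressive
    $ ceph osd pool set mypool compression_algorithm snappy

    # Write compressible data into an RBD image on the pool, then query an OSD
    $ ceph tell osd.0 perf dump | grep -E 'bluestore_compressed'
    # or, on the OSD host:
    $ sudo ceph daemon osd.0 perf dump | grep -E 'bluestore_compressed'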

[ceph-users] recommendation for barebones server with 8-12 direct attach NVMe?

2024-01-12 Thread Drew Weaver
Hello, So we were going to replace a Ceph cluster with some hardware we had lying around using SATA HBAs, but I was told that the only right way to build Ceph in 2023 is with direct-attach NVMe. Does anyone have any recommendation for a 1U barebones server (we just drop in RAM, disks, and CPUs)

[ceph-users] Re: Debian 12 (bookworm) / Reef 18.2.1 problems

2024-01-12 Thread Chris Palmer
More info on problem 2: When starting the dashboard, the mgr seems to try to initialise cephadm, which in turn uses python crypto libraries that lead to the python error: $ ceph crash info 2024-01-12T11:10:03.938478Z_2263d2c8-8120-417e-84bc-bb01f5d81e52 {     "backtrace": [     "  File \

[ceph-users] Debian 12 (bookworm) / Reef 18.2.1 problems

2024-01-12 Thread Chris Palmer
I was delighted to see the native Debian 12 (bookworm) packages turn up in Reef 18.2.1. We currently run a number of Ceph clusters on Debian 11 (bullseye) / Quincy 17.2.7. These are not cephadm-managed. I have attempted to upgrade a test cluster, and it is not going well. Quincy only supports

[ceph-users] Re: 3 DC with 4+5 EC not quite working

2024-01-12 Thread Torkil Svensgaard
On 12-01-2024 10:30, Frank Schilder wrote: Is it maybe this here: https://docs.ceph.com/en/reef/rados/troubleshooting/troubleshooting-pg/#crush-gives-up-too-soon I always have to tweak the num-tries parameters. Oh, that seems plausible. Kinda scary, and odd, that hitting this doesn't gener

[ceph-users] Re: RGW - user created bucket with name of already created bucket

2024-01-12 Thread Ondřej Kukla
Thanks Jayanth, I’ve tried this but unfortunately the unlink fails because it checks against the bucket owner id, which is not the user I’m trying to unlink. So I’m still stuck here with two users with the same bucket name :( Ondrej > On 24. 12. 2023, at 17:14, Jayanth Reddy wrote: > > Hi Ondřej,
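
For context, a hedged sketch of the unlink/relink sequence being discussed, with hypothetical bucket and user names; radosgw-admin refuses the unlink when the --uid given does not match the recorded bucket owner, which appears to be the failure described above:

    # Show the bucket's recorded owner and bucket id
    $ radosgw-admin bucket stats --bucket=mybucket | grep -E '"owner"|"id"'

    # Unlink from the recorded owner, then link the bucket to the intended user
    $ radosgw-admin bucket unlink --bucket=mybucket --uid=recorded-owner
    $ radosgw-admin bucket link --bucket=mybucket --bucket-id=<bucket id> --uid=new-owner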

[ceph-users] Re: RGW rate-limiting or anti-hammering for (external) auth requests // Anti-DoS measures

2024-01-12 Thread Christian Rohmann
Hey Istvan, On 10.01.24 03:27, Szabo, Istvan (Agoda) wrote: I'm using this in the HTTPS frontend config on haproxy, and it has worked well so far: stick-table type ip size 1m expire 10s store http_req_rate(10s) tcp-request inspect-delay 10s tcp-request content track-sc0 src http-request deny deny
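
Re-indented as a hedged sketch of what such a frontend section might look like; the frontend/backend names, certificate path, deny condition, and 429 threshold are illustrative assumptions, not taken from the original message:

    frontend rgw_https
        mode http
        bind :443 ssl crt /etc/haproxy/certs/rgw.pem
        # Track per-source-IP HTTP request rate over a 10s sliding window
        stick-table type ip size 1m expire 10s store http_req_rate(10s)
        tcp-request inspect-delay 10s
        tcp-request content track-sc0 src
        # Reject clients exceeding ~100 requests per 10s
        http-request deny deny_status 429 if { sc_http_req_rate(0) gt 100 }
        default_backend rgw_backend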

[ceph-users] Re: 3 DC with 4+5 EC not quite working

2024-01-12 Thread Frank Schilder
Is it maybe this here: https://docs.ceph.com/en/reef/rados/troubleshooting/troubleshooting-pg/#crush-gives-up-too-soon I always have to tweak the num-tries parameters. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 Fr
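
A minimal sketch of the usual way to tweak those parameters, assuming the rule is edited offline in a decompiled CRUSH map; the rule id 3 and --num-rep 9 below match the 4+5 example discussed in this thread:

    # Export and decompile the current CRUSH map
    $ ceph osd getcrushmap -o crushmap.bin
    $ crushtool -d crushmap.bin -o crushmap.txt

    # Edit the rule, e.g. raise "step set_choose_tries 100" (and set_chooseleaf_tries if needed)

    # Recompile, check for bad mappings, and inject the new map
    $ crushtool -c crushmap.txt -o crushmap.new
    $ crushtool -i crushmap.new --test --show-bad-mappings --rule 3 --num-rep 9 --min-x 0 --max-x 1023
    $ ceph osd setcrushmap -i crushmap.new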

[ceph-users] Re: 3 DC with 4+5 EC not quite working

2024-01-12 Thread Torkil Svensgaard
On 12-01-2024 09:35, Frédéric Nass wrote: Hello Torkil, Hi Frédéric We're using the same EC scheme as yours, with k=5 and m=4 over 3 DCs, with the below rule: rule ec54 { id 3 type erasure min_size 3 max_size 9 step set_chooseleaf_tries

[ceph-users] Re: About ceph disk slowops effect to cluster

2024-01-12 Thread Phong Tran Thanh
Yes, it works well for me; it reduced the recovery rate from 4 GB/s to 200 MB/s. On Fri, Jan 12, 2024 at 15:52 Szabo, Istvan (Agoda) < istvan.sz...@agoda.com> wrote: > Is it better? > > Istvan Szabo > Staff Infrastructure Engineer > --- > Agoda Ser

[ceph-users] Re: Ceph Nautilous 14.2.22 slow OSD memory leak?

2024-01-12 Thread Frédéric Nass
Samuel, Hard to tell for sure since this bug hit different major versions of the kernel, at least RHEL's from what I know. The only way to tell is to check for num_cgroups in /proc/cgroups: $ cat /proc/cgroups | grep -e subsys -e blkio | column -t #subsys_name hierarchy num_cgroup

[ceph-users] Re: About ceph disk slowops effect to cluster

2024-01-12 Thread Szabo, Istvan (Agoda)
Is it better? Istvan Szabo Staff Infrastructure Engineer --- Agoda Services Co., Ltd. e: istvan.sz...@agoda.com --- From: Phong Tran T

[ceph-users] Re: 3 DC with 4+5 EC not quite working

2024-01-12 Thread Frédéric Nass
Hello Torkil, We're using the same EC scheme as yours, with k=5 and m=4 over 3 DCs, with the below rule: rule ec54 { id 3 type erasure min_size 3 max_size 9 step set_chooseleaf_tries 5 step set_choose_tries 100 step take def
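
For readability, an illustrative reconstruction of what a 3-DC rule of that shape typically looks like; the choose/chooseleaf steps are an assumption, since the quoted rule is cut off above:

    rule ec54 {
        id 3
        type erasure
        min_size 3
        max_size 9
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default
        step choose indep 3 type datacenter
        step chooseleaf indep 3 type host
        step emit
    }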

[ceph-users] Re: About ceph disk slowops effect to cluster

2024-01-12 Thread Phong Tran Thanh
I updated the config: osd_mclock_profile=custom osd_mclock_scheduler_background_recovery_lim=0.2 osd_mclock_scheduler_background_recovery_res=0.2 osd_mclock_scheduler_client_wgt=6 On Fri, Jan 12, 2024 at 15:31 Phong Tran Thanh < tranphong...@gmail.com> wrote: > Hi Yang and Anthony, > >
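
A minimal sketch of applying the same values cluster-wide from the CLI rather than the dashboard; the numbers are the ones quoted above, and they only take effect while osd_mclock_profile is set to custom:

    # Switch OSDs to the custom mClock profile, then cap background recovery
    $ ceph config set osd osd_mclock_profile custom
    $ ceph config set osd osd_mclock_scheduler_background_recovery_lim 0.2
    $ ceph config set osd osd_mclock_scheduler_background_recovery_res 0.2
    $ ceph config set osd osd_mclock_scheduler_client_wgt 6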

[ceph-users] Re: About ceph disk slowops effect to cluster

2024-01-12 Thread Phong Tran Thanh
Hi Yang and Anthony, I found a solution to this problem on 7200 rpm HDDs. When the cluster recovers from one or multiple disk failures, slow ops appear and then affect the cluster; we can change these configurations to reduce IOPS during recovery. osd_mclock_profile=custom osd_mclock

[ceph-users] Re: Ceph Nautilous 14.2.22 slow OSD memory leak?

2024-01-12 Thread huxia...@horebdata.cn
Dear Frederic, Thanks a lot for the suggestions. We are using the vanilla Linux 4.19 LTS kernel. Do you think we may be suffering from the same bug? Best regards, Samuel huxia...@horebdata.cn From: Frédéric Nass Date: 2024-01-12 09:19 To: huxiaoyu CC: ceph-users Subject: Re: [ceph-users]

[ceph-users] Re: Ceph Nautilous 14.2.22 slow OSD memory leak?

2024-01-12 Thread Frédéric Nass
Hello, We've had a similar situation recently where OSDs would use way more memory than osd_memory_target and get OOM-killed by the kernel. It was due to a kernel bug related to cgroups [1]. If num_cgroups below keeps increasing, then you may be hitting this bug. $ cat /proc/cgroups | grep
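
A minimal sketch of tracking that counter over time on an OSD host, assuming a plain bash shell and a hypothetical log path; a steadily growing blkio num_cgroups would point at the leak described above:

    # Append a timestamped blkio cgroup count to a log every 10 minutes
    while true; do
        printf '%s %s\n' "$(date -Is)" "$(awk '$1 == "blkio" {print $3}' /proc/cgroups)" >> /var/log/blkio_cgroups.log
        sleep 600
    done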