[ceph-users] unmatched rstat rbytes on single dirfrag

2025-01-24 Thread Frank Schilder
Hi all, I see error messages like these in the logs every now and then: 10:14:44 [ERR] unmatched rstat rbytes on single dirfrag 0x615, inode has n(v2211 rc2038-01-18T21:22:13.00+0100 b2506575730 9676264=3693+9672571), dirfrag has n(v2211 rc2025-01-24T10:14:44.628760+0100 b30517 102=3+99) 10

[ceph-users] Re: unmatched rstat rbytes on single dirfrag

2025-01-24 Thread Eugen Block
Hi, a quick search [0] shows the same messages. A scrub with repair seems to fix that. But wasn’t scrubbing causing the recent issue in the first place? [0] https://silvenga.com/posts/notes-on-cephfs-metadata-recovery/ Quoting Frank Schilder: Hi all, I see error messages like these in
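
As a rough sketch (not from the thread itself; the file system name and rank are placeholders), the kind of forward scrub with repair the linked post describes would be started like this:

# ask rank 0 of the file system to scrub the whole tree and repair inconsistent rstats
ceph tell mds.<fsname>:0 scrub start / recursive,repair
# check progress and results afterwards
ceph tell mds.<fsname>:0 scrub status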

[ceph-users] Re: Modify or override ceph_default_alerts.yml

2025-01-24 Thread Eugen Block
Hi Redo, thanks for the suggestion. I haven't tried it yet either, but I think the easier option would be to simply keep a custom file on the prometheus host and map it into the container with extra_container_args. It's similar to your approach, I guess, but I wouldn't want to modify too mu
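
A hedged sketch of what that could look like with a cephadm-deployed Prometheus; the host path and the mount target inside the container are assumptions, not taken from the thread:

# service spec that bind-mounts a custom alerts file into the prometheus container
cat > prometheus-spec.yaml <<'EOF'
service_type: prometheus
placement:
  count: 1
extra_container_args:
  - "-v"
  - "/etc/ceph/custom_alerts.yml:/etc/prometheus/alerting/custom_alerts.yml:ro"
EOF
ceph orch apply -i prometheus-spec.yaml

The upside of this approach is that nothing shipped inside the container image is touched; the downside is that the extra rule file has to be maintained on the Prometheus host itself.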

[ceph-users] Re: Mix NVME's in a single cluster

2025-01-24 Thread Bruno Gomes Pessanha
RAM: 768 GB, CPU: AMD EPYC 9634 (84-core). On Fri, 24 Jan 2025 at 15:48, Anthony D'Atri wrote: > It’s difficult to fully answer your question with the information > provided. Notably, your networking setup and the RAM / CPU SKUs are > important inputs. > > Assuming that the hosts have or would have

[ceph-users] Re: Mix NVME's in a single cluster

2025-01-24 Thread Robert Sander
Hi, On 24.01.25 15:35, Bruno Gomes Pessanha wrote: I have a Ceph Reef cluster with 10 hosts with 16 nvme slots but only half occupied with 15TB (2400 KIOPS) drives. 80 drives in total. I want to add another 80 to fully populate the slots. The question: What would be the downside if I expand the

[ceph-users] Re: Mix NVME's in a single cluster

2025-01-24 Thread Anthony D'Atri
Well heck, you’re good to go unless these are converged with compute. 168 threads / 16 OSDs = ~10 threads per OSD, with some left over for OS, observability, etc. You’re more than good. Suggest using BIOS settings and TuneD to disable deep C-states, verify with `powertop`. Increased cooling,
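
A minimal sketch of the TuneD part (the profile choice is a suggestion, not from the thread):

# latency-performance pins force_latency low, which keeps cores out of deep C-states
tuned-adm profile latency-performance
tuned-adm active
# interactively verify per-core C-state residency
powertop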

[ceph-users] Mix NVME's in a single cluster

2025-01-24 Thread Bruno Gomes Pessanha
I have a Ceph Reef cluster with 10 hosts, each with 16 NVMe slots but only half occupied with 15TB (2400 KIOPS) drives, 80 drives in total. I want to add another 80 to fully populate the slots. The question: what would be the downside if I expand the cluster with 80 x 30TB (3300 KIOPS) drives? Thank you

[ceph-users] Re: Mix NVME's in a single cluster

2025-01-24 Thread Anthony D'Atri
It’s difficult to fully answer your question with the information provided. Notably, your networking setup and the RAM / CPU SKUs are important inputs. Assuming that the hosts have or would have sufficient CPU and RAM for the additional OSDs there wouldn’t necessarily be a downside, though you
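
A hedged aside on the capacity-mix aspect (generic commands, not from the truncated reply): the main thing to keep an eye on is per-OSD fullness and balance once the 30 TB OSDs carry twice the CRUSH weight of the 15 TB ones.

# per-OSD utilisation and CRUSH weights, to spot imbalance between small and large OSDs
ceph osd df tree
# confirm the balancer is active (upmap mode is typical on recent releases)
ceph balancer status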

[ceph-users] Re: RGW multisite metadata sync issue

2025-01-24 Thread Vahideh Alinouri
The metadata sync issue has been resolved by changing the master zone and re-running the metadata sync. On Mon, Dec 23, 2024 at 2:15 PM Vahideh Alinouri wrote: > When I increased the debug level of the RGW sync client to 20, I get it: > > 2024-12-23T09:42:17.248+ 7f124866b700 20 register_req
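
For context, a hedged sketch of the kind of steps involved (zone and zonegroup names are placeholders; the exact procedure was not spelled out in the thread):

# on the zone that is to become the metadata master
radosgw-admin zone modify --rgw-zonegroup=<zonegroup> --rgw-zone=<zone> --master --default
radosgw-admin period update --commit
# then re-run and watch the metadata sync from the other zone
radosgw-admin metadata sync run
radosgw-admin metadata sync status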

[ceph-users] Re: Error ENOENT: Module not found

2025-01-24 Thread Fnu Virender Kumar
Did you try: ceph mgr module enable orchestrator, ceph orch set backend, ceph orch ls? Check the mgr service daemon as well with ceph -s. Regards, Virender From: Devender Singh Sent: Friday, January 24, 2025 6:34:43 PM To: ceph-users Subject: [ceph-users] Error ENOENT: M
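
The same checks written out as a sketch (the cephadm backend is an assumption; adjust if you use a different orchestrator):

ceph mgr module enable orchestrator   # always-on in recent releases
ceph orch set backend cephadm
ceph orch ls
ceph -s                               # confirm an active mgr daemon is running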

[ceph-users] No recovery after removing node - active+undersized+degraded-- removed osd using purge...

2025-01-24 Thread Devender Singh
Hello all. Urgent help needed. No recovery happening. Tried repairing pg and redeploy or create. Rebooted cluster but no luck..
  data:
    volumes: 2/2 healthy
    pools:   18 pools, 817 pgs
    objects: 6.06M objects, 20 TiB
    usage:   30 TiB used, 302 TiB / 332 TiB avail
    pgs:     284
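
A hedged sketch of the usual triage for stuck undersized/degraded PGs (generic commands, not from the thread; <pgid> is a placeholder):

# which PGs are degraded and which OSDs they are waiting for
ceph health detail
ceph osd tree
# make sure recovery is not blocked by cluster flags such as norecover or norebalance
ceph osd dump | grep flags
# inspect one affected PG in detail
ceph pg <pgid> query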

[ceph-users] Error ENOENT: Module not found

2025-01-24 Thread Devender Singh
Hello all, any quick fix for … root@sea-devnode1:~# ceph orch ls Error ENOENT: Module not found Regards Dev

[ceph-users] Re: unmatched rstat rbytes on single dirfrag

2025-01-24 Thread Frank Schilder
Hi Eugen, thanks for the fast response. My search did not find that blog, thanks for sending the link. Yes, our recent troubles have to do with the forward scrub. Since nothing crashes, I'm not sure if these errors are serious and/or fixed on the fly. I think we will hold off on another forward scrub

[ceph-users] CephFS: EC pool with "leftover" objects

2025-01-24 Thread Robert Sander
Hi, there is an old cluster (9 years) that gets constantly upgraded and is currently running version 17.2.7. 3 years ago (when running version 16) a new EC pool was added to the existing CephFS to be used with the directory layout feature. Now it was decided to remove that pool again. On th
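
A hedged sketch of how such a pool is usually checked and detached before removal (paths and names are placeholders, not from the thread):

# directories still pinned to the EC pool via their layout
getfattr -n ceph.dir.layout /mnt/cephfs/some/dir
# objects remaining in the pool
rados -p <ec-pool> ls | head
ceph df detail
# once no file data references it any more, detach it from the file system
ceph fs rm_data_pool <fsname> <ec-pool>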

[ceph-users] FS design question around subvolumes vs dirs

2025-01-24 Thread Jesse Galley
Hello, We are building a ~2PiB, 200-OSD cluster that will be used entirely by CephFS for data storage. I have a question regarding whether there is any specific reason/advantage to use subvolumes vs plain directories for segregating different departments, envs, etc. Currently, the plan is
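
To illustrate the difference the question is about, a hedged sketch (names and sizes are placeholders): a subvolume is created and managed through the mgr volumes module, while a plain directory gets the same controls set by hand.

# managed subvolume
ceph fs subvolumegroup create <fsname> departments
ceph fs subvolume create <fsname> dept-a --group_name departments --size 10737418240
# plain directory with quota and layout set manually on a mounted client
mkdir /mnt/cephfs/dept-b
setfattr -n ceph.quota.max_bytes -v 10737418240 /mnt/cephfs/dept-b
# (a layout pool can only be set while the directory is still empty)
setfattr -n ceph.dir.layout.pool -v <data-pool> /mnt/cephfs/dept-b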

[ceph-users] Re: Error ENOENT: Module not found

2025-01-24 Thread Devender Singh
Thanks for your reply… but those commands are not working, as it's an always-on module. Strangely, it still shows the error: # ceph mgr module enable orchestrator module 'orchestrator' is already enabled (always-on) # ceph orch set backend — returns successfully… # # ceph orch ls Error ENOENT: No orchestra
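
For what it's worth, the truncated "No orchestra…" error most likely reads "No orchestrator configured", which usually means the backend is unset; a hedged sketch, assuming a cephadm-deployed cluster:

ceph mgr module enable cephadm
ceph orch set backend cephadm
ceph orch status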

[ceph-users] Re: Watcher Issue

2025-01-24 Thread Devender Singh
Hello all. Sorry for the late reply. I tried two things: 1. My cluster was using swap, so I turned swap off. 2. I started the repair command on the pool/image and it seems it worked. But after that there was no command found to pause the repair, as it started deep scrubs too. How to unpause the repairs… Regard
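
A hedged note on the last question (generic commands, not from the thread): a PG repair that is already running cannot really be paused, but further scrubbing can be held back with the scrub flags, and rbd status shows who still holds a watch on the image.

# stop the cluster from scheduling new scrubs / deep-scrubs
ceph osd set noscrub
ceph osd set nodeep-scrub
# remove the flags again once things have settled
ceph osd unset noscrub
ceph osd unset nodeep-scrub
# list the watchers on the image (pool/image are placeholders)
rbd status <pool>/<image>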