[ceph-users] Re: CPU requirements

2024-09-19 Thread Anthony D'Atri
> Thank you for your explanations and references. I will check them all. In the
> meantime it turned out that the disks for Ceph will come from SAN

Be prepared for the additional latency and amplified network traffic.

> Probably in this case the per OSD CPU cores can be lowered to 2 CPU/OSD. B

[ceph-users] Strange CephFS Permission error

2024-09-19 Thread Carsten Feuls
Hello, a friend's cluster has a strange issue with CephFS permissions. The Ceph version is 17.2.7, deployed with cephadm. The cluster is much older and was upgraded over the years from 12 and moved to new hosts. When I create a cephfs user for my private cluster with "ceph fs authorize" I get permission like
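
For comparison, a minimal sketch of the command and the caps it typically produces on recent releases (the filesystem name "cephfs" and client name "client.foo" are placeholders, not from the original report):

  ceph fs authorize cephfs client.foo / rw
  ceph auth get client.foo
  # typically yields caps along the lines of:
  #   caps mds = "allow rw fsname=cephfs"
  #   caps mon = "allow r fsname=cephfs"
  #   caps osd = "allow rw tag cephfs data=cephfs"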

[ceph-users] Re: [EXT] Re: mclock scheduler kills clients IOs

2024-09-19 Thread Justin Mammarella
We’re running Quincy 17.2.7 here, and we see the IOPS benchmark performed on OSD start:

2024-09-20T09:57:26.265+1000 7facbdc64540 1 osd.196 2879010 maybe_override_max_osd_capacity_for_qos osd bench result - bandwidth (MiB/sec): 4.369 iops: 1118.445 elapsed_sec: 2.682
2024-09-20T09:57:26.265+100
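
A quick way to check whether such a benchmarked value has been persisted for a given OSD (osd.196 taken from the log line above):

  ceph config show osd.196 osd_mclock_max_capacity_iops_hdd
  ceph config dump | grep osd_mclock_max_capacity_iops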

[ceph-users] Re: High usage (DATA column) on dedicated for OMAP only OSDs

2024-09-19 Thread Joshua Baergen
Ah, yes, that's a good point - if there's backfill going on then buildup like this can happen.

On Thu, Sep 19, 2024 at 10:08 AM Konstantin Shalygin wrote:
>
> Hi,
>
> On 19 Sep 2024, at 18:26, Joshua Baergen wrote:
>
> Whenever we've seen osdmaps not being trimmed, we've made sure that
> any dow

[ceph-users] Re: High usage (DATA column) on dedicated for OMAP only OSDs

2024-09-19 Thread Konstantin Shalygin
Hi,

> On 19 Sep 2024, at 18:26, Joshua Baergen wrote:
>
> Whenever we've seen osdmaps not being trimmed, we've made sure that
> any down OSDs are out+destroyed, and then have rolled a restart
> through the mons. As of recent Pacific at least this seems to have
> reliably gotten us out of this si

[ceph-users] Re: High usage (DATA column) on dedicated for OMAP only OSDs

2024-09-19 Thread Joshua Baergen
Whenever we've seen osdmaps not being trimmed, we've made sure that any down OSDs are out+destroyed, and then have rolled a restart through the mons. As of recent Pacific at least this seems to have reliably gotten us out of this situation. Josh On Thu, Sep 19, 2024 at 9:14 AM Igor Fedotov wrote
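
As a rough sketch of that procedure (the OSD id and mon daemon name are placeholders; adapt and verify against your own cluster before running anything):

  # ensure any down OSDs are marked out and destroyed
  ceph osd out 12
  ceph osd destroy 12 --yes-i-really-mean-it
  # then roll a restart through the monitors, one at a time
  ceph orch daemon restart mon.host1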

[ceph-users] Re: High usage (DATA column) on dedicated for OMAP only OSDs

2024-09-19 Thread Igor Fedotov
Here it goes beyond my expertise. I saw unbounded osdmap epoch growth in two completely different cases, and I'm unable to say what's causing it this time. But IMO you shouldn't do any osdmap trimming yourself - that could likely result in unpredictable behavior. So I'd encourage you to fi

[ceph-users] Re: Radosgw bucket check fix doesn't do anything

2024-09-19 Thread Reid Guyett
Hi, I didn't notice any changes in the counts after running the check --fix | check --check-objects --fix. Also the bucket isn't versioned. I will take a look at the index vs the radoslist. Which side would cause the "invalid_multipart_entries"? Thanks On Thu, Sep 19, 2024 at 5:50 AM Frédéric N
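
A rough starting point for that comparison, assuming a bucket named "mybucket" (the jq path for the index listing is a guess and may need adjusting to the actual bi list output):

  radosgw-admin bi list --bucket=mybucket > index.json
  radosgw-admin bucket radoslist --bucket=mybucket > radoslist.txt
  # count entries on each side, then grep both files for the upload ids
  # reported under invalid_multipart_entries
  jq length index.json
  wc -l radoslist.txt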

[ceph-users] Re: High usage (DATA column) on dedicated for OMAP only OSDs

2024-09-19 Thread Александр Руденко
Igor, thanks, very helpful. Our current osdmap weighs 1.4MB, and that changes all the calculations. Looks like it could be our case. I think we have this situation due to a long backfill which is still in progress and has been going for the last 3 weeks. Can we drop some amount of osdmaps before rebalance completes
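
As a back-of-the-envelope estimate, combining the ~1.4 MB map size above with the epoch span quoted elsewhere in this thread (oldest_map 2408326, newest_map 2637838):

  2637838 - 2408326 ≈ 229,500 retained epochs
  229,500 x ~1.4 MB per full map ≈ 320 GB per OSD, before incremental maps and compression

which would be roughly consistent with the high DATA usage being discussed.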

[ceph-users] Re: High usage (DATA column) on dedicated for OMAP only OSDs

2024-09-19 Thread Igor Fedotov
please see my comments inline.

On 9/19/2024 1:53 PM, Александр Руденко wrote:
Igor, thanks!
> What are the numbers today?
Today we have the same "oldest_map": 2408326 and "newest_map": 2637838, *+2191*.
ceph-objectstore-tool --op meta-list --data-path /var/lib/ceph/osd/ceph-70 | grep osdm

[ceph-users] VFS: Busy inodes after unmount of ceph lead to kernel panic (maybe?)

2024-09-19 Thread Christian Kugler
Hi, we had a problem with a Ceph OSD stuck in snaptrim. We set the Ceph OSD down temporarily to potentially solve the issue. This ended up kernel panicking two of our Kubernetes workers that use this Ceph cluster for RBD and CephFS directly and via CSI driver. The more notable lines were these af

[ceph-users] Re: High usage (DATA column) on dedicated for OMAP only OSDs

2024-09-19 Thread Konstantin Shalygin
Hi,

> On 19 Sep 2024, at 12:33, Igor Fedotov wrote:
>
> osd_target_transaction_size should control that.
>
> I've heard of it being raised to 150 with no obvious issues. Going beyond is
> at your own risk. So I'd suggest to apply incremental increase if needed.

Thanks! Now much better k

[ceph-users] Re: [EXTERNAL] Deploy rgw different version using cephadm

2024-09-19 Thread Alex Hussein-Kershaw (HE/HIM)
I think the advice is not to use floating tags (e.g. "latest") and to use specific tags where possible. I believe you can achieve what you want with either: "ceph orch upgrade --image " - I'm not sure this allows you to downgrade, but it certainly lets you upgrade and change the image; see Upgrading Ceph — Cep
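
For instance, a hedged sketch of starting an orchestrated upgrade with a specific, non-floating image tag (the tag shown is just an example, not a recommendation):

  ceph orch upgrade start --image quay.io/ceph/ceph:v18.2.4
  ceph orch upgrade status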

[ceph-users] Re: mclock scheduler kills clients IOs

2024-09-19 Thread Andrej Filipcic
Hi, the problem comes from older ceph releases. In our case, hdd iops were benchmarked in the range of 250 to 4000, which clearly makes no sense. At osd startup, the benchmark is skipped if that value is already in ceph config, so these initial benchmark values were never changed. To reset th
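
One plausible way to clear such a stored value so the next OSD start re-runs the benchmark (not necessarily the exact procedure being described here; osd.196 is a placeholder id):

  ceph config rm osd.196 osd_mclock_max_capacity_iops_hdd
  ceph orch daemon restart osd.196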

[ceph-users] Re: Radosgw bucket check fix doesn't do anything

2024-09-19 Thread Frédéric Nass
Hi Reid, I see. It seems weird that the --fix command output shows no differences between existing_header and calculated_header after it cleaned up some index entries (removing manifest part from index). Have you tried running the stats command again to see if any figures were updated? Based
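
The stats check referred to is presumably along the lines of (bucket name is a placeholder):

  radosgw-admin bucket stats --bucket=mybucket
  # compare num_objects / size_kb_actual in the usage section before and after the fix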

[ceph-users] Deploy rgw different version using cephadm

2024-09-19 Thread Mahdi Noorbala
Hello,

Recently I deployed a Ceph cluster (version: reef) in my lab and after that I deployed RGW using this manifest:

service_type: rgw
service_id: lab-object-storage
placement:
  label: rgw
  count_per_host: 1
spec:
  rgw_frontend_port: 8080

Now I have an rgw container. The docker image is: quay

[ceph-users] Re: High usage (DATA column) on dedicated for OMAP only OSDs

2024-09-19 Thread Александр Руденко
Igor, thanks!

> What are the numbers today?

Today we have the same "oldest_map": 2408326 and "newest_map": 2637838, *+2191*.

ceph-objectstore-tool --op meta-list --data-path /var/lib/ceph/osd/ceph-70 | grep osdmap | wc -l
458994

Can you clarify this, please:
> and then multiply by amount of OS

[ceph-users] Re: High usage (DATA column) on dedicated for OMAP only OSDs

2024-09-19 Thread Igor Fedotov
Hi Konstantin,

osd_target_transaction_size should control that.

I've heard of it being raised to 150 with no obvious issues. Going beyond is at your own risk. So I'd suggest to apply incremental increase if needed.

Thanks,
Igor

On 9/19/2024 10:44 AM, Konstantin Shalygin wrote:
Hi Igor,
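
A minimal sketch of applying and verifying such an incremental change at runtime (the value 150 comes from the message above):

  ceph config set osd osd_target_transaction_size 150
  ceph config get osd osd_target_transaction_size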

[ceph-users] Re: High usage (DATA column) on dedicated for OMAP only OSDs

2024-09-19 Thread Igor Fedotov
Hi Alexander, so newest_map looks to be slowly growing. And (which is worse) oldest_map is constant, which means no old map pruning is happening and more and more maps are accumulating. What are the numbers today? You can assess the number of objects in "meta" pool (that's where osdmaps are kept) for

[ceph-users] Re: Radosgw bucket check fix doesn't do anything

2024-09-19 Thread Frédéric Nass
Oh, by the way, since 35470 is nearly two times 18k, couldn't it be that the source bucket is versioned and the destination bucket only got the most recent copy of each object?

Regards,
Frédéric.

- On 18 Sep 24, at 20:39, Reid Guyett wrote:
> Hi Frederic,
> Thanks for those notes.
>
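
If in doubt, the versioning state of the source bucket can be checked from the S3 side, for example with the AWS CLI pointed at the RGW endpoint (endpoint and bucket name are placeholders):

  aws --endpoint-url http://rgw.example.com:8080 s3api get-bucket-versioning --bucket mybucket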

[ceph-users] Re: mclock scheduler kills clients IOs

2024-09-19 Thread Daniel Schreiber
Hi Denis, we observed the same behaviour here. The cause was that the number of iops discovered at OSD startup was way too high. In our setup the rocksdb is on flash. When I set osd_mclock_max_capacity_iops_hdd to a value that the HDDs could handle, the situation was resolved, clients got th
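
A sketch of that kind of override, set at the osd level (the value 350 is purely illustrative; per-OSD values can also be set with ceph config set osd.N ...):

  ceph config set osd osd_mclock_max_capacity_iops_hdd 350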

[ceph-users] Re: [External Email] Overlapping Roots - How to Fix?

2024-09-19 Thread Stefan Kooman
On 19-09-2024 05:10, Anthony D'Atri wrote:
Anthony,
So it sounds like I need to make a new crush rule for replicated pools that specifies default-hdd and the device class? (Or should I go the other way around? I think I'd rather change the replicated pools even though there's more of th
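
A sketch of the device-class approach under discussion (rule, root, and pool names are placeholders; it can help to check the shadow roots with "ceph osd crush tree --show-shadow" before switching pools over):

  ceph osd crush rule create-replicated replicated_hdd default host hdd
  ceph osd pool set mypool crush_rule replicated_hdd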

[ceph-users] Re: High usage (DATA column) on dedicated for OMAP only OSDs

2024-09-19 Thread Konstantin Shalygin
Hi Igor,

> On 18 Sep 2024, at 18:22, Igor Fedotov wrote:
>
> I recall a couple of cases when permanent osdmap epoch growth has been
> filling OSD with relevant osd map info. Which could be tricky to catch.
>
> Please run 'ceph tell osd.N status' for a couple of affected OSDs twice
> within e.
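
That check might look like the following, run twice some minutes apart (osd.70 borrowed from the commands quoted elsewhere in this thread):

  ceph tell osd.70 status
  # compare the "oldest_map" and "newest_map" fields between the two runs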