[ceph-users] Re: Update host operating system - Ceph version 18.2.4 reef

2024-12-17 Thread Janne Johansson
> > > We have a cluster running with 6 Ubuntu 20.04 servers and we would like > > > to add another host but with Ubuntu 22.04, will we have any problems? > > > We would like to add new HOST with Ubuntu 22.04 and deactivate the Ubuntu > > > 20.04 ones, our idea would be to update the hosts from Ub

[ceph-users] ceph network acl: multiple network prefixes possible?

2024-12-17 Thread Stefan Kooman
Hi List, Is it possible to specify multiple network prefixes for a given ceph user? I.e. osd 'allow {access-spec} [{match-spec}] [network {network1/prefix} {network2/prefix} {network3/prefix}] ' osd 'profile {name} [pool={pool-name} [namespace={namespace-name}]] [network {network1/prefix}
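
For reference, the single-prefix form of such a cap looks like the sketch below (hypothetical user/pool names; whether several prefixes can be listed is exactly the open question here):

  ceph auth caps client.backup \
      mon 'allow r network 192.168.10.0/24' \
      osd 'allow rw pool=backup network 192.168.10.0/24'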

[ceph-users] Re: Random ephemeral pinning, what happens to sub-tree under pin root dir

2024-12-17 Thread Patrick Donnelly
On Fri, Dec 13, 2024 at 7:09 AM Frank Schilder wrote: > > Dear all, > > I have a question about random ephemeral pinning that I can't find an answer > in the docs to. Question first and some background later. Docs checked for > any version from octopus up to latest. Our version for applying rand
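
For context, random ephemeral pinning is applied via an extended attribute on the directory; a minimal sketch, assuming a hypothetical CephFS mount at /mnt/cephfs:

  setfattr -n ceph.dir.pin.random -v 0.01 /mnt/cephfs/projects   # pin descendant dirs with probability 0.01
  getfattr -n ceph.dir.pin.random /mnt/cephfs/projects           # read the value back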

[ceph-users] Re: MONs not trimming

2024-12-17 Thread Gregory Orange
On 18/12/24 02:30, Janek Bevendorff wrote: > I did increase the pgp_num of a pool a while back, totally > forgot about that. Due to the ongoing rebalancing it was stuck half way, > but now suddenly started up again. The current PG number of that pool is > not quite final yet, but definitely higher

[ceph-users] Re: squid 19.2.1 RC QE validation status

2024-12-17 Thread Laura Flores
Smoke approved: https://tracker.ceph.com/projects/rados/wiki/SQUID#httpstrackercephcomissues69234note-1 On Tue, Dec 17, 2024 at 4:28 PM Laura Flores wrote: > I reviewed rados here: > https://tracker.ceph.com/projects/rados/wiki/SQUID#httpstrackercephcomissues69234note-1 > > Conferring with Radek

[ceph-users] Re: squid 19.2.1 RC QE validation status

2024-12-17 Thread Laura Flores
I reviewed rados here: https://tracker.ceph.com/projects/rados/wiki/SQUID#httpstrackercephcomissues69234note-1 Conferring with Radek first, then will update the thread to approve. On Mon, Dec 16, 2024 at 11:27 AM Yuri Weinstein wrote: > Details of this release are summarized here: > > https://t

[ceph-users] Re: Erasure coding best practice

2024-12-17 Thread Eugen Block
I would like to know more about those corner cases and why it’s not recommended to use this approach. Because our customers and we ourselves have been using such profiles for years, including multiple occasions when one of two DCs failed with k7m11. They were quite happy with the resiliency

[ceph-users] Re: stray host with daemons

2024-12-17 Thread Eugen Block
Could you please provide a step-by-step instruction? Otherwise it’s difficult to reproduce. I tried the same with set-hostname but that just leads to an „offline“ host. How exactly did you end up in the current situation? Zitat von zzx...@gmail.com: Hi, I used "hostnamectl set-hostname c

[ceph-users] Re: Erasure coding issue

2024-12-17 Thread Eugen Block
First, min_size=3 with k=3 is not a good idea. You don’t provide any details, so it’s difficult to give any reasonable explanation. I’ll go with a wrong crush rule that doesn’t have host as failure domain but OSD. Zitat von Deba Dey : See I have total host 5, each host holding 24 HDD and e
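
A quick way to verify the suspected misconfiguration, with hypothetical pool/rule/profile names:

  ceph osd pool get ecpool31 crush_rule           # "ecpool31" is a hypothetical pool name
  ceph osd crush rule dump ecpool31_rule          # look for "type": "host" in the choose step
  ceph osd erasure-code-profile get myprofile     # crush-failure-domain should be host, not osd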

[ceph-users] cephadm problem with create hosts fqdn via spec

2024-12-17 Thread Piotr Pisz
Hi, We add hosts to the cluster using fqdn, manually (ceph orch host add) everything works fine. However, if we use the spec file as below, the whole thing falls apart. Ceph 18.2.4 --- service_type: host addr: xx.xx.xx.xx hostname: ceph001.xx002.xx.xx.xx.com location: root: xx002 rack: rack
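
For comparison, a minimal sketch of both paths with hypothetical names and addresses (the manual command that works, and applying an equivalent host spec from a file):

  ceph orch host add ceph001.example.com 10.0.0.11    # manual add, works as reported
  # host.yaml (hypothetical values):
  #   service_type: host
  #   hostname: ceph001.example.com
  #   addr: 10.0.0.11
  #   location:
  #     root: dc1
  #     rack: rack1
  ceph orch apply -i host.yaml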

[ceph-users] Re: Update host operating system - Ceph version 18.2.4 reef

2024-12-17 Thread Linas Vepstas
Hi, On Tue, Dec 17, 2024 at 12:48 PM Janne Johansson wrote: > > > Ceph version 18.2.4 reef (cephadm) > > Hello, > > We have a cluster running with 6 Ubuntu 20.04 servers and we would like to > > add another host but with Ubuntu 22.04, will we have any problems? > > We would like to add new HOST

[ceph-users] Experimental upgrade of a Cephadm-managed Squid cluster to Ubuntu Noble (walk-through and RFC)

2024-12-17 Thread Florian Haas
Hi everyone, as part of keeping our Ceph course updated, we recently went through the *experimental* process of upgrading a Cephadm-managed cluster from Ubuntu Jammy to Noble. Note that at this point there are no community-built Ceph packages that are available for Noble, though there *are* p

[ceph-users] Re: Update host operating system - Ceph version 18.2.4 reef

2024-12-17 Thread Janne Johansson
> Ceph version 18.2.4 reef (cephadm) > Hello, > We have a cluster running with 6 Ubuntu 20.04 servers and we would like to > add another host but with Ubuntu 22.04, will we have any problems? > We would like to add new HOST with Ubuntu 22.04 and deactivate the Ubuntu > 20.04 ones, our idea would

[ceph-users] Re: MONs not trimming

2024-12-17 Thread Janek Bevendorff
I think it was mentioned elsewhere in this thread that there are limitations to what upmap can do, especially in significant crush map change situations. It can't violate crush rules (mon-enforced), and if the same OSD shows up multiple times in a backfill then upmap can't deal with it. The numb

[ceph-users] Re: MONs not trimming

2024-12-17 Thread Joshua Baergen
I think it was mentioned elsewhere in this thread that there are limitations to what upmap can do, especially in significant crush map change situations. It can't violate crush rules (mon-enforced), and if the same OSD shows up multiple times in a backfill then upmap can't deal with it. Creeping b

[ceph-users] Re: MONs not trimming

2024-12-17 Thread Janek Bevendorff
Something's not quite right yet. I got the remapped PGs down from > 4000 to around 1300, but there it stops. When I restart the process, I can get it down to around 280, but there it stops and creeps back up afterwards. I have a bunch of these messages in the output: WARNING: pg 100.3d53: conf

[ceph-users] Tracing Ceph with LTTng-UST issue

2024-12-17 Thread IslamChakib Kedadsa
Hello, We are writing to you regarding an issue we encountered while attempting to trace Ceph with LTTng-UST. Below are the steps we have followed so far: 1. *Compiling Ceph with LTTng Support*: - We modified the debian/rules file to enable LTTng support using the following flags:
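
For a plain source build, the corresponding switch is the WITH_LTTNG cmake option; a hedged sketch of the build flag plus a generic trace session (tracepoint/provider names vary by daemon, so --all is used here):

  ./do_cmake.sh -DWITH_LTTNG=ON          # cmake option that enables LTTng tracepoints in a source build
  lttng create ceph-session
  lttng enable-event --userspace --all
  lttng start
  # ... exercise the daemons under test ...
  lttng stop && lttng view | head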

[ceph-users] Re: MONs not trimming

2024-12-17 Thread Joshua Baergen
Hey Janek, Ah, yes, we ran into that invalid json output in https://github.com/digitalocean/ceph_exporter as well. I have a patch I wrote for ceph_exporter that I can port over to pgremapper (that does similar to what your patch does). Josh On Tue, Dec 17, 2024 at 9:38 AM Janek Bevendorff wrote

[ceph-users] Re: MONs not trimming

2024-12-17 Thread Janek Bevendorff
Looks like there is something wrong with the .mgr pool. All others have proper values. For now I've patched the pgremapper source code to replace the inf values with 0 before unmarshaling the JSON. That at least made the tool work. I guess it's safe to just delete that pool and let the MGRs recr

[ceph-users] [Cephadm] Bootstrap Ceph with alternative data directory

2024-12-17 Thread Jinfeng Biao
We’d like to bootstrap and deploy Ceph into an alternative directory. By keeping Ceph-relevant data in a separate directory mounted from a separate disk, Ceph services will start right after an operating system reprovision. While deploying with --data-dir and other directory parameters, cephadm --data
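
A minimal sketch of the flag placement (--data-dir and --log-dir are cephadm-level arguments, given before the bootstrap subcommand; the IP and paths are hypothetical):

  cephadm --data-dir /srv/ceph/data --log-dir /srv/ceph/log \
      bootstrap --mon-ip 10.0.0.10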

[ceph-users] Re: Squid Manager Daemon: balancer crashing orchestrator and dashboard

2024-12-17 Thread Laimis Juzeliūnas
Hello Ceph community, First of all - big thanks to Laura for bringing a fix to this. Can't wait to try it out! While we're waiting for the 19.2.1 to arrive I've just wanted to ask if anyone knows any neat alternatives for the built-in Ceph balancer? We're keeping it off for now and manually set

[ceph-users] Erasure coding issue

2024-12-17 Thread Deba Dey
See I have 5 hosts in total, each host holding 24 HDDs and each HDD is of size 9.1TiB. So, a total of 1.2PiB out of which I am getting 700TiB. I did erasure coding 3+2 and placement group 128. But, the issue I am facing is when I turn off one node, writes are completely disabled. Erasure coding 3+2 can han
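
For context, both the usable capacity and the write lockout follow from the profile's arithmetic; a quick sanity check (pool name hypothetical):

  # Usable capacity with EC k=3,m=2 is raw * k/(k+m): 1.2 PiB * 3/5 ~= 0.72 PiB, roughly the 700 TiB seen.
  # Writes stop when a PG's active shards drop below min_size (default k+1 = 4 for 3+2):
  ceph osd pool get ecpool min_size      # "ecpool" is a hypothetical pool name
  ceph osd pool get ecpool crush_rule    # the rule must use host, not osd, as the failure domain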

[ceph-users] mount path missing for subvolume

2024-12-17 Thread bruno . pessanha
I'm using: ``` # ceph version ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef (stable) ``` I get the following error when trying to run: # ceph fs subvolume info volume01 vol01 group01 Error ENOENT: mount path missing for subvolume 'vol01' The volume exists and it's mounted on
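
A few commands that usually help narrow this down, using the names from the report (a sketch; getpath and ls take the group as a trailing argument like info does):

  ceph fs subvolume ls volume01 group01              # is vol01 actually listed in that group?
  ceph fs subvolume getpath volume01 vol01 group01   # does the path resolve at all?
  ceph fs subvolume info volume01 vol01              # without the group, in case it lives in _nogroup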

[ceph-users] Update host operating system - Ceph version 18.2.4 reef

2024-12-17 Thread alessandro
Ceph version 18.2.4 reef (cephadm) Hello, We have a cluster running with 6 Ubuntu 20.04 servers and we would like to add another host but with Ubuntu 22.04, will we have any problems? We would like to add new HOST with Ubuntu 22.04 and deactivate the Ubuntu 20.04 ones, our idea would be to upda
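
The add-new/drain-old workflow looks roughly like the sketch below (hostnames hypothetical; verify there is enough spare capacity before draining, and that an OSD service spec covers the new host):

  ceph orch host add ceph-new1 10.0.1.21      # 22.04 host joins the cluster
  # ... wait for rebalancing / HEALTH_OK ...
  ceph orch host drain ceph-old1              # evacuates daemons and OSDs from the 20.04 host
  ceph orch host rm ceph-old1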

[ceph-users] Re: stray host with daemons

2024-12-17 Thread zzxtty
Hi, I used "hostnamectl set-hostname cephp15". There has been a reboot since. Cheers ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] OSD_FULL after OSD Node Failures

2024-12-17 Thread Gerard Hand
Hi, We recently had problems that meant 3 out of 32 OSD hosts went offline for about 10 minutes. The hosts are now back in the cluster as expected and backfilling is going on. However we are seeing a couple of problems. We are seeing: 1. Ceph is flagging a handful of PGs as backfill_toofull wh
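
A few knobs commonly used to relieve backfill_toofull while the cluster recovers (the ratios below are examples; raise them cautiously and revert once backfill completes):

  ceph osd df                               # check %USE on the fullest OSDs
  ceph osd set-backfillfull-ratio 0.92      # default is 0.90; a small, temporary bump
  ceph osd set-nearfull-ratio 0.88          # optional, quiets nearfull warnings in the meantime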

[ceph-users] Re: Squid: deep scrub issues

2024-12-17 Thread Laimis Juzeliūnas
Hi all, Just came back from this year's Cephalocon and managed to get a quick chat with Ronen regarding this issue. He had a great presentation[1, 2] on the upcoming changes to scrubbing in Tentacle as well as some changes already made in the Squid release. The primary suspect here is the mclock sch
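
If the mclock scheduler is indeed the suspect, one hedged experiment (not a recommendation) is to switch a single test OSD back to wpq and compare scrub progress; the change only takes effect after an OSD restart:

  ceph config show osd.0 osd_op_queue       # osd.0 is a hypothetical test OSD
  ceph config set osd osd_op_queue wpq
  ceph orch daemon restart osd.0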

[ceph-users] Re: MONs not trimming

2024-12-17 Thread Janek Bevendorff
I checked the ceph osd dump json-pretty output and validated it with a little Python script. Turns out, there's this somewhere around line 1200: "read_balance": { "score_acting": inf, "score_stable": inf, "optimal_score": 0,
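
A quick check that reproduces the problem without any extra tooling (inf/nan are not valid JSON tokens, which is what trips strict parsers):

  ceph osd dump -f json-pretty | grep -nE '": (inf|-inf|nan),?$'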

[ceph-users] Re: MONs not trimming

2024-12-17 Thread Janek Bevendorff
Thanks. I tried running the command (dry run for now), but something's not working as expected. Have you ever seen this? $ /root/go/bin/pgremapper cancel-backfill --verbose ** executing: ceph osd dump -f json panic: invalid character 'i' looking for beginning of value goroutine 1 [running]: mai

[ceph-users] Re: MONs not trimming

2024-12-17 Thread Janne Johansson
> You can use pg-remapper (https://github.com/digitalocean/pgremapper) or > similar tools to cancel the remapping; up-map entries will be created > that reflect the current state of the cluster. After all currently > running backfills are finished your mons should not be blocked anymore. > I would
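
For reference, the sequence usually looks like the sketch below (flag names per the pgremapper README; double-check them before running, and note the tool dry-runs unless told otherwise):

  ceph osd set norebalance
  ceph osd set nobackfill
  pgremapper cancel-backfill --yes     # writes upmap entries pinning PGs to their current OSDs
  ceph osd unset nobackfill
  ceph osd unset norebalance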

[ceph-users] Re: Erasure coding best practice

2024-12-17 Thread Anthony D'Atri
Just repeating what I read. I suspect that the effect is minimal. Back when I did ZFS a lot there was conventional wisdom of a given parity group not having more than 9 drives, to keep rebuild and writes semi-manageable. >> A few years back someone asserted that EC values with small prime factor

[ceph-users] Re: Erasure coding best practice

2024-12-17 Thread Janne Johansson
> > To be honest with 3:8 we could protect the cluster more from osd flapping. > > Let's say you have less chance to have 8 down pgs on 8 separate nodes then > > with 8:3 only 3pgs on 3 nodes. > > Of course this comes with the cost on storage used. > > Is there any disadvantage performance wise on

[ceph-users] Re: Erasure coding best practice

2024-12-17 Thread Anthony D'Atri
> To be honest with 3:8 we could protect the cluster more from osd flapping. > Let's say you have less chance to have 8 down pgs on 8 separate nodes then > with 8:3 only 3pgs on 3 nodes. > Of course this comes with the cost on storage used. > Is there any disadvantage performance wise on this? A

[ceph-users] Re: MONs not trimming

2024-12-17 Thread Janek Bevendorff
Thanks for your replies! You can use pg-remapper (https://github.com/digitalocean/pgremapper) or similar tools to cancel the remapping; up-map entries will be created that reflect the current state of the cluster. After all currently running backfills are finished your mons should not be bloc

[ceph-users] Re: MONs not trimming

2024-12-17 Thread Wesley Dillingham
Agree with pg-remapper or upmap-remapped approach. One thing to be aware of though is that the Mons will invalidate any upmap which breaks the data placement rules. So for instance if you are moving from host based failure domain to rack based failure domain attempting to upmap the data back to its

[ceph-users] Re: MONs not trimming

2024-12-17 Thread Burkhard Linke
Hi, On 17.12.24 14:40, Janek Bevendorff wrote: Hi all, We moved our Ceph cluster to a new data centre about three months ago, which completely changed its physical topology. I changed the CRUSH map accordingly so that the CRUSH location matches the physical location again and the cluster has

[ceph-users] MONs not trimming

2024-12-17 Thread Janek Bevendorff
Hi all, We moved our Ceph cluster to a new data centre about three months ago, which completely changed its physical topology. I changed the CRUSH map accordingly so that the CRUSH location matches the physical location again and the cluster has been rebalancing ever since. Due to capacity li
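
Some generic checks for this situation (mons refuse to trim old osdmaps while PGs are not clean); field names and paths below assume a cephadm-style deployment and may differ:

  ceph pg stat                                          # how many PGs are still not active+clean
  ceph report 2>/dev/null | grep 'osdmap.*committed'    # span of retained osdmap epochs
  du -sh /var/lib/ceph/*/mon.*/store.db                 # run on a mon host to see the store size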

[ceph-users] Re: Erasure coding best practice

2024-12-17 Thread Szabo, Istvan (Agoda)
To be honest, with 3:8 we could protect the cluster more from OSD flapping. Let's say you have less chance of having 8 PGs down on 8 separate nodes than, with 8:3, only 3 PGs on 3 nodes. Of course this comes with the cost of storage used. Is there any disadvantage performance-wise with this? Istvan
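
The storage cost is straightforward to quantify; a short worked comparison (both profiles write k+m = 11 shards per object):

  k=8, m=3  ->  (8+3)/8  =  1.375x raw space per usable byte
  k=3, m=8  ->  (3+8)/3  ~= 3.67x  raw space per usable byte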

[ceph-users] Re: Upgrade stalled after upgrading managers

2024-12-17 Thread Torkil Svensgaard
Thanks guys, turning off the balancer seems to have fixed it. Mvh. Torkil On 17/12/2024 12:40, Eugen Block wrote: I know 19.2.1 is already in the validation phase, but it would make sense (to me) to add this to the upgrade notes for Squid (https:// docs.ceph.com/en/latest/releases/squid/#v19-

[ceph-users] Re: Upgrade stalled after upgrading managers

2024-12-17 Thread Eugen Block
I know 19.2.1 is already in the validation phase, but it would make sense (to me) to add this to the upgrade notes for Squid (https://docs.ceph.com/en/latest/releases/squid/#v19-2-0-squid) until the fix has been released. Similar to the note about ISCSI users. Adding Zac here directly. Zi

[ceph-users] Re: Upgrade stalled after upgrading managers

2024-12-17 Thread Laimis Juzeliūnas
Hi Torkil, Possible that you are hitting balancer issues on 19.2.0 for clusters with larger pg numbers: https://tracker.ceph.com/issues/68657 Try turning it off with ceph balancer off Best, Laimis J. > On 17 Dec 2024, at 13:15, Torkil Svensgaard wrote: > > > > On 17/12/2024 12:05, Torkil Sv
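
A hedged sketch of the workaround sequence mentioned here, until 19.2.1 is out:

  ceph balancer off
  ceph mgr fail                  # restart the active mgr so the orchestrator module recovers
  ceph orch upgrade resume       # if the upgrade was paused; otherwise just check status
  ceph orch upgrade status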

[ceph-users] Re: Upgrade stalled after upgrading managers

2024-12-17 Thread Torkil Svensgaard
On 17/12/2024 12:05, Torkil Svensgaard wrote: Hi Running upgrade from 18.2.4 to 19.2.0 and it managed to upgrade the managers but no further progress. Now it actually seems to have upgraded 1 MON, but then the orchestrator crashed again: " { "mon": { "ceph version 18.2.4 (e7ad5345525

[ceph-users] Upgrade stalled after upgrading managers

2024-12-17 Thread Torkil Svensgaard
Hi Running upgrade from 18.2.4 to 19.2.0 and it managed to upgrade the managers but no further progress. If I fail over the mgr it goes: " [root@ceph-flash1 ~]# ceph orch upgrade status Error ENOTSUP: Module 'orchestrator' is not enabled/loaded (required by command 'orch upgrade status'): us

[ceph-users] Re: Correct way to replace working OSD disk keeping the same OSD ID

2024-12-17 Thread Nicola Mori
I replaced another disk, this time everything worked as expected following this procedure: 1) Drain and destroy the OSD: ceph orch osd rm --replace 2) Replace the disk. 3) Zap the new disk: ceph orch device zap /dev/sd --force 4) Manually create the new OSD: ceph orch daemon add
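
The same procedure with hypothetical IDs and device names filled in (osd.12 on host ceph01, new disk /dev/sdk):

  ceph orch osd rm 12 --replace               # drain, then mark destroyed so the ID is kept
  # physically swap the disk
  ceph orch device zap ceph01 /dev/sdk --force
  ceph orch daemon add osd ceph01:/dev/sdk    # should come back as osd.12 on the new disk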

[ceph-users] Re: Erasure coding best practice

2024-12-17 Thread Alexander Patrakov
Hello Szabo, Some of these "weird" erasure coding setups come from old-style stretch clusters where the cluster was designed to withstand the loss of one datacenter out of two. For example, a 2+4 EC setup could be used together with a rule that selects three hosts from one datacenter and three fr
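
A sketch of the rule shape such a 2+4 layout typically uses (names hypothetical; the rule body has to be edited into a decompiled crushmap):

  ceph osd getcrushmap -o crushmap.bin
  crushtool -d crushmap.bin -o crushmap.txt   # edit, then recompile with crushtool -c and setcrushmap
  #   step take default
  #   step choose indep 2 type datacenter
  #   step chooseleaf indep 3 type host
  #   step emit
  # 3 of the 6 shards land in each DC, so losing a whole DC still leaves 3 shards, enough to read (k=2)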

[ceph-users] Re: RGW multisite metadata sync issue

2024-12-17 Thread Vahideh Alinouri
I also see this in the output of radosgw-admin metadata sync status. I think it's strange because there should be a marker to follow the sync. { "key": 0, "val": { "state": 0, "marker": "", "next
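
Some commands commonly used when the metadata sync markers look empty (a sketch; run against the secondary zone):

  radosgw-admin metadata sync status
  radosgw-admin mdlog status                 # marker positions on the metadata log
  radosgw-admin metadata sync init           # re-initialise sync, then restart the secondary RGWs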