[ceph-users] Re: CephFS 16.2.10 problem

2024-11-25 Thread Dhairya Parmar
Hi, The log you shared indicates that the MDS is waiting for the latest OSDMap epoch. The epoch number in log line 123138 is the epoch of last failure. Any MDS entering the replay state needs at least this osdmap epoch to ensure the blocklist propagates. If the epoch is less than this then it just goes ba
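A minimal sketch of how to compare the relevant epochs from the CLI (standard commands; the field assumed in the grep is the usual `last_failure_osd_epoch` from the FSMap):

    ceph osd dump | head -n 1                      # current osdmap epoch ("epoch NNN")
    ceph fs dump | grep last_failure_osd_epoch     # epoch the replaying MDS needs to see
    ceph osd blocklist ls                          # blocklist entries that need to propagate

Once the monitors publish an osdmap at or above that epoch, the MDS in replay should move on by itself.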

[ceph-users] Re: CephFS 16.2.10 problem

2024-11-25 Thread Alexey.Tsivinsky
Thanks for your answer! Current status of our cluster cluster: id: c3d33e01-dfcd-4b39-8614-993370672504 health: HEALTH_WARN 1 failed cephadm daemon(s) 1 filesystem is degraded services: mon: 3 daemons, quorum cmon1,cmon2,cmon3 (age 15h) mgr: cmon3.

[ceph-users] UPGRADE_REDEPLOY_DAEMON: Upgrading daemon failed

2024-11-25 Thread Stolte, Felix
Hi folks, we did upgrade one of our clusters from pacific to Quincy. Everything worked fine, but cephadm complains about one osd not being upgraded: [WRN] UPGRADE_REDEPLOY_DAEMON: Upgrading daemon osd.15 on host osd-dmz-k5-1 failed. Upgrade daemon: osd.15: cephadm exited with an error code:
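A hedged sketch of how to retry just that daemon once the error is understood (daemon and host names are taken from the message; the commands are stock cephadm/orchestrator ones):

    ceph log last cephadm                 # recent cephadm errors, including the osd.15 failure
    ceph orch daemon redeploy osd.15      # redeploy the single daemon on osd-dmz-k5-1
    ceph orch upgrade status              # confirm the upgrade is no longer stuck on it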

[ceph-users] down OSDs, Bluestore out of space, unable to restart

2024-11-25 Thread John Jasen
Ceph version 17.2.6 Since a power loss event affecting my Ceph cluster, I've been putting Humpty Dumpty back together. One problem I face is that with objects degraded, rebalancing doesn't run -- and this resulted in several of my fast OSDs filling up. I have 8 OSDs currently down, 100% fu
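One approach often suggested for BlueStore OSDs that are too full to start is to grow the underlying device and let BlueFS expand into the new space. A minimal sketch, assuming non-containerized OSDs backed by LVM volumes that still have free extents (OSD id and VG/LV names are placeholders; with cephadm the unit name and data path differ):

    systemctl stop ceph-osd@<id>
    lvextend -L +10G /dev/<vg>/<lv>                                             # grow the backing LV first
    ceph-bluestore-tool bluefs-bdev-sizes  --path /var/lib/ceph/osd/ceph-<id>   # inspect current allocation
    ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-<id>   # let BlueFS claim the new space
    systemctl start ceph-osd@<id>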

[ceph-users] Re: Encrypt OSDs on running System. A good Idea?

2024-11-25 Thread Giovanna Ratini
Hello Janne, thank you for your answer. I will do. Best Gio Am 20.11.2024 um 11:24 schrieb Janne Johansson: What issues should I expect if I take an OSD (15TB) out one at a time, encrypt it, and put it back into the cluster? I would have a long period where some OSDs are encrypted and others
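For a cephadm-managed cluster, the usual pattern is to drain and zap one OSD at a time and let an OSD service spec with `encrypted: true` re-create it. A hedged sketch (OSD id, hosts and device filter are placeholders):

    ceph orch osd rm 15 --replace --zap     # drain one OSD, keep its id, zap the device

    # osd-encrypted.yaml
    service_type: osd
    service_id: encrypted
    placement:
      host_pattern: '*'
    spec:
      data_devices:
        all: true
      encrypted: true

    ceph orch apply -i osd-encrypted.yaml   # newly created OSDs matching the spec come up dmcrypt-encrypted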

[ceph-users] Re: Ceph OSD perf metrics missing

2024-11-25 Thread Kilian Ries
Prometheus metrics seem to be broken, too: ceph_osd_op_r_latency_sum ceph_osd_op_w_latency_sum Neither of them, for example, is reported by the ceph mgr metrics exporter: curl http://192.168.XXX.XXX:9283/metrics | grep ceph_osd_ I get some metrics like "ceph_osd_commit_latency_ms" or "ceph

[ceph-users] Re: Cephalocon Update - New Users Workshop and Power Users Session

2024-11-25 Thread Stefan Kooman
On 18-10-2024 01:32, Dan van der Ster wrote: - Experienced Ceph User? Participate in the Power Users afternoon session at the Developers Summit - https://indico.cern.ch/e/ceph-developer-summit https://indico.cern.ch/event/1417034/ gives me: Update: The Ceph Developer Summit is nearing capacity

[ceph-users] Re: Ceph OSD perf metrics missing

2024-11-25 Thread Pierre Riteau
Hello Kilian, I am not entirely sure this is the same issue, but I know Prometheus doesn’t export some performance metrics by default anymore in Reef. This can be worked around by setting: ceph config set mgr mgr/prometheus/exclude_perf_counters false A proper fix would be to switch to using a p
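A quick way to verify the setting took effect (mgr address is a placeholder; 9283 is the default mgr/prometheus port):

    ceph config set mgr mgr/prometheus/exclude_perf_counters false
    ceph config get mgr mgr/prometheus/exclude_perf_counters
    curl -s http://<mgr-host>:9283/metrics | grep -E 'ceph_osd_op_(r|w)_latency_sum' | head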

[ceph-users] Re: UPGRADE_REDEPLOY_DAEMON: Upgrading daemon failed

2024-11-25 Thread Frédéric Nass
Hi Felix. This could be a systemd bug in versions prior to v247. The Ceph code [1] points to this tracker [2]. Can you check whether the information in PR [3] helps you get out of this trouble? Regards, Frédéric. [1] https://raw.githubusercontent.com/ceph/ceph/pacific/src/cephadm/cephadm [2] https://tracker.ceph.com/issu
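A quick check of whether a host is affected (the 247 threshold comes from the message above):

    systemctl --version | head -n 1     # e.g. "systemd 239 (...)"; versions below 247 may hit the bug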

[ceph-users] Re: CephFS 16.2.10 problem

2024-11-25 Thread Dhairya Parmar
On Mon, Nov 25, 2024 at 3:33 PM wrote: > Thanks for your answer! > > > Current status of our cluster > > cluster: > id: c3d33e01-dfcd-4b39-8614-993370672504 > health: HEALTH_WARN > 1 failed cephadm daemon(s) > 1 filesystem is degraded > > services: > mon:

[ceph-users] Re: Ceph OSD perf metrics missing

2024-11-25 Thread Sake Ceph
I stumbled on this problem earlier: port 9926 isn't being opened. See also the thread "Grafana dashboards is missing data". A tracker has already been opened to fix the issue: https://tracker.ceph.com/issues/67975 > On 25-11-2024 13:44 CET, Kilian Ries wrote: > > > Prometheus metrics seem to be broke
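A hedged way to confirm this on an affected host (host name is a placeholder; the firewall-cmd line assumes firewalld and is only a manual workaround until the fix lands):

    ss -tlnp | grep 9926                          # is ceph-exporter listening at all?
    curl -s http://<host>:9926/metrics | head     # can a Prometheus-style scrape reach it?
    firewall-cmd --permanent --add-port=9926/tcp && firewall-cmd --reload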

[ceph-users] Re: Ceph OSD perf metrics missing

2024-11-25 Thread Kilian Ries
Any ideas? Still facing the problem ... From: Kilian Ries Sent: Wednesday, 23 October 2024 13:59:06 To: ceph-users@ceph.io Subject: Ceph OSD perf metrics missing Hi, I'm running a Ceph v18.2.4 cluster. I'm trying to build some latency monitoring with the

[ceph-users] Re: CephFS 16.2.10 problem

2024-11-25 Thread Alexey.Tsivinsky
We have a version in containers. Here is a fresh log from the first mds. Actually, this is the whole log, after it is restarted. debug 2024-11-25T11:15:13.405+0000 7f2b0c4eb900 0 set uid:gid to 167:167 (ceph:ceph) debug 2024-11-25T11:15:13.405+0000 7f2b0c4eb900 0 ceph version 16.2.10 (45fa1a

[ceph-users] Ceph Steering Committee 2024-11-25

2024-11-25 Thread Gregory Farnum
Another light meeting (we're appreciating it after our heavy governance discussions!): * Cancel next week due to Cephalocon travel * We were blocked on the quincy release due to some build issues with Ganesha (apparently we were pointed at the wrong kind of CentOS repo, and then the team was asking

[ceph-users] 4k IOPS: miserable performance in All-SSD cluster

2024-11-25 Thread Martin Gerhard Loschwitz
Folks, I am getting somewhat desperate debugging multiple setups here within the same environment. Three clusters, two SSD-only, one HDD-only, and what they all have in common is abysmal 4k IOPS performance when measuring with „rados bench“. Abysmal means: In an All-SSD cluster I will get rough
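For comparison, a small-block rados bench run that isolates 4k behaviour (pool name is a placeholder; --no-cleanup keeps the objects so a read pass can follow):

    rados bench -p <pool> 30 write -b 4096 -t 16 --no-cleanup   # 30 s of 4k writes, 16 in flight
    rados bench -p <pool> 30 rand -t 16                         # random reads against the same objects
    rados -p <pool> cleanup                                     # remove the benchmark objects afterwards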

[ceph-users] Re: 4k IOPS: miserable performance in All-SSD cluster

2024-11-25 Thread Alex Gorbachev
Hi Martin, This is a bit of a generic recommendation, but I would go down the path of reducing complexity, i.e. first test the drive locally on the OSD node and see if there's anything going on with e.g. drive firmware, cables, HBA, or power. Then do fio from another host, and this would incorporate n
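A hedged fio sketch for the local per-drive test described above. It writes directly to the raw device and destroys its contents, so only run it on a drive that has been taken out of the cluster; /dev/sdX is a placeholder:

    fio --name=4k-sync-write --filename=/dev/sdX --direct=1 --sync=1 \
        --rw=randwrite --bs=4k --iodepth=1 --numjobs=1 \
        --runtime=60 --time_based --group_reporting

Sync 4k writes at queue depth 1 roughly approximate what BlueStore demands of a device for its WAL, which is where consumer SSDs without power-loss protection tend to collapse.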

[ceph-users] v17.2.8 Quincy released

2024-11-25 Thread Yuri Weinstein
We're happy to announce the 8th backport release in the Quincy series. https://ceph.io/en/news/blog/2024/v17-2-8-quincy-released/ v17.2.8 will have RPM/CentOS 9 packages built instead of RPM/CentOS 8. v17.2.8 container images, now based on CentOS 9, may be incompatible on older kernels (e.g., U

[ceph-users] Re: 4k IOPS: miserable performance in All-SSD cluster

2024-11-25 Thread Anthony D'Atri
Good insights from Alex. Are these clusters all new? Or have they been around a while, previously happier? One idea that comes to mind is an MTU mismatch between hosts and switches, or some manner of bonding misalignment. What does `netstat -I` show? `ethtool -S`? I’m thinking that maybe
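If an MTU mismatch is suspected, a quick end-to-end check (interface name and peer address are placeholders; 8972 bytes = 9000 minus IP/ICMP headers):

    ip link show <iface> | grep mtu                 # MTU actually configured on the host
    ping -M do -s 8972 -c 3 <peer-ip>               # jumbo frame with DF set must survive every hop
    ethtool -S <iface> | grep -iE 'drop|err|crc'    # per-NIC error and drop counters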