Hi,
The log you shared indicates that the MDS is waiting for the latest OSDMap
epoch. The epoch number in the log line (123138) is the epoch of the last failure.
Any MDS entering the replay state needs at least this osdmap epoch to ensure
the blocklist propagates. If its epoch is less than this, then it just goes
ba
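If it helps to cross-check, the two epochs can be compared roughly like this (a sketch; <name> is a placeholder for your MDS daemon, and the second command has to be run on the MDS host or inside its container):

ceph osd dump | grep '^epoch'     # current osdmap epoch as the mons see it
ceph daemon mds.<name> status     # the MDS's own view, including the osdmap epoch it has / is waiting for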
Thanks for your answer!
Current status of our cluster
  cluster:
    id:     c3d33e01-dfcd-4b39-8614-993370672504
    health: HEALTH_WARN
            1 failed cephadm daemon(s)
            1 filesystem is degraded

  services:
    mon: 3 daemons, quorum cmon1,cmon2,cmon3 (age 15h)
    mgr: cmon3.
Hi folks,
we upgraded one of our clusters from Pacific to Quincy. Everything worked
fine, but cephadm complains about one OSD not being upgraded:
[WRN] UPGRADE_REDEPLOY_DAEMON: Upgrading daemon osd.15 on host osd-dmz-k5-1
failed.
Upgrade daemon: osd.15: cephadm exited with an error code:
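Without seeing the actual cephadm error it is hard to say more, but once the underlying problem on the host is fixed, a common way forward looks roughly like this (a sketch, not a guaranteed recipe):

ceph orch upgrade status            # confirm where the upgrade is stuck
ceph orch daemon redeploy osd.15    # retry just the failed daemon
ceph orch upgrade resume            # let the orchestrator carry on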
Ceph version 17.2.6
After a power loss event affecting my Ceph cluster, I've been putting
Humpty Dumpty back together ever since.
One problem I face is that with objects degraded, rebalancing doesn't run,
and this has resulted in several of my fast OSDs filling up.
I have 8 OSDs currently down, 100% fu
Hello Janne,
thank you for your answer. I will do.
Best
Gio
On 20.11.2024 at 11:24, Janne Johansson wrote:
What issues should I expect if I take OSDs (15 TB each) out one at a time,
encrypt them, and put them back into the cluster? I would have a long period
where some OSDs are encrypted and others
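For reference, the usual cephadm route is to drain and redeploy one OSD at a time; a rough sketch, assuming cephadm-managed OSDs and an OSD service spec with encryption enabled (the spec file name and its contents are illustrative):

ceph orch osd rm <id> --replace --zap    # drain the OSD, mark it destroyed, wipe the device
# ...wait for recovery to finish, then redeploy from a spec with encryption turned on:
ceph orch apply -i osd-spec-encrypted.yaml

where osd-spec-encrypted.yaml might look like:

service_type: osd
service_id: encrypted_osds
placement:
  hosts:
    - <hostname>
spec:
  data_devices:
    all: true
  encrypted: true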
Prometheus metrics seem to be broken, too:
ceph_osd_op_r_latency_sum
ceph_osd_op_w_latency_sum
Both of them, for example, are not reported by the ceph-mgr metrics exporter:
curl http://192.168.XXX.XXX:9283/metrics | grep ceph_osd_
I get some metrics like "ceph_osd_commit_latency_ms" or
"ceph
On 18-10-2024 01:32, Dan van der Ster wrote:
- Experienced Ceph User? Participate in the Power Users afternoon
session at the Developers Summit -
https://indico.cern.ch/e/ceph-developer-summit
https://indico.cern.ch/event/1417034/ gives me:
Update: The Ceph Developer Summit is nearing capacity
Hello Kilian,
I am not entirely sure this is the same issue, but I know the mgr Prometheus
module no longer exports some performance counters by default in Reef. This
can be worked around by setting:
ceph config set mgr mgr/prometheus/exclude_perf_counters false
A proper fix would be to switch to using a p
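As a quick way to check whether that setting is the culprit (assuming the default mgr Prometheus port 9283; the host is a placeholder):

ceph config set mgr mgr/prometheus/exclude_perf_counters false
curl -s http://<mgr-host>:9283/metrics | grep ceph_osd_op_r_latency_sum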
Hi Felix.
Could be a systemd bug prior to v247. Ceph code [1] points to this tracker [2].
Can you check if information in PR [3] helps you to get out of this trouble?
Regards,
Frédéric.
[1] https://raw.githubusercontent.com/ceph/ceph/pacific/src/cephadm/cephadm
[2] https://tracker.ceph.com/issu
On Mon, Nov 25, 2024 at 3:33 PM
wrote:
> Thanks for your answer!
>
>
> Current status of our cluster
>
> cluster:
>   id:     c3d33e01-dfcd-4b39-8614-993370672504
>   health: HEALTH_WARN
>           1 failed cephadm daemon(s)
>           1 filesystem is degraded
>
> services:
>   mon:
I stumbled on this problem earlier: port 9926 isn't being opened. See also
the thread "Grafana dashboards is missing data".
A tracker has already been opened for the issue:
https://tracker.ceph.com/issues/67975
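To see whether you are hitting the same thing, you can check whether anything is listening on that port on the OSD hosts (9926 is the default ceph-exporter port; the host is a placeholder):

ss -tlnp | grep 9926
curl -s http://<osd-host>:9926/metrics | head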
> On 25-11-2024 13:44 CET, Kilian Ries wrote:
>
>
> Prometheus metrics seem to be broke
Any ideas? Still facing the problem ...
From: Kilian Ries
Sent: Wednesday, 23 October 2024 13:59:06
To: ceph-users@ceph.io
Subject: Ceph OSD perf metrics missing
Hi,
I'm running a Ceph v18.2.4 cluster. I'm trying to build some latency monitoring
with the
We run a containerized deployment.
Here is a fresh log from the first MDS. This is actually the whole log after
it was restarted.
debug 2024-11-25T11:15:13.405+0000 7f2b0c4eb900  0 set uid:gid to 167:167
(ceph:ceph)
debug 2024-11-25T11:15:13.405+0000 7f2b0c4eb900  0 ceph version 16.2.10
(45fa1a
Another light meeting (we're appreciating it after our heavy governance
discussions!):
* Cancel next week due to Cephalocon travel
* We were blocked on the Quincy release due to some build issues with
Ganesha (apparently we were pointed at the wrong kind of CentOS repo, and
then the team was asking
Folks,
I am getting somewhat desperate debugging multiple setups here within the same
environment. Three clusters, two SSD-only, one HDD-only, and what they all have
in common is abysmal 4k IOPS performance when measuring with "rados bench".
Abysmal means: in an all-SSD cluster I will get rough
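For reference, the kind of small-block test being described might look like this (a sketch; the pool name and queue depth are placeholders):

rados bench -p <testpool> 30 write -b 4096 -t 16 --no-cleanup
rados bench -p <testpool> 30 rand -t 16
rados -p <testpool> cleanup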
Hi Martin,
This is a bit of a generic recommendation, but I would go down the path of
reducing complexity, i.e. first test the drive locally on the OSD node and
see if there's anything going on with e.g. drive firmware, cables, HBA, or
power.
Then run fio from another host, which would incorporate n
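A typical local 4k test along those lines might be (a sketch; the target file is a placeholder, and pointing fio at a raw OSD device would of course destroy data):

fio --name=4k-randwrite --filename=/mnt/test/fio.bin --size=4G \
    --rw=randwrite --bs=4k --direct=1 --iodepth=32 --numjobs=1 \
    --runtime=60 --time_based --group_reporting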
We're happy to announce the 8th backport release in the Quincy series.
https://ceph.io/en/news/blog/2024/v17-2-8-quincy-released/
v17.2.8 ships RPM/CentOS 9 packages instead of RPM/CentOS 8.
v17.2.8 container images, now based on CentOS 9, may be incompatible
with older kernels (e.g., U
Good insights from Alex.
Are these clusters all new? Or have they been around a while, previously
happier?
One idea that comes to mind is an MTU mismatch between hosts and switches, or
some manner of bonding misalignment. What does `netstat -I` show? `ethtool
-S`? I’m thinking that maybe
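If you want to rule out an MTU mismatch quickly, a couple of simple checks (assuming 9000-byte jumbo frames; adjust the payload size to your MTU):

ip link show | grep mtu
ping -M do -s 8972 <peer-host>    # 9000 minus 28 bytes of IP/ICMP header; must get through without fragmenting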