Hi,
in general you need an SSD with power loss protection (big capacitors).
Ceph uses sync writes to flush data to the drive before acknowledging the
write. SSDs with PLP can "lie" a bit and acknowledge the write faster,
because PLP guarantees that data in the cache will still be written to
flash after a power loss.
This post is about journal
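A common way to measure this sync-write behaviour on a candidate device is a
single-threaded 4k sync write test with fio (a sketch; /dev/sdX is a
placeholder, and the test overwrites data, so point it at a scratch device
or file you can afford to lose):

fio --name=sync-write-test --filename=/dev/sdX --direct=1 --sync=1 \
    --rw=write --bs=4k --numjobs=1 --iodepth=1 \
    --runtime=60 --time_based --group_reporting

Drives with PLP typically report thousands of IOPS here, while consumer
drives without PLP often manage only a few hundred.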
Note that 'norebalance' disables the balancer but doesn't prevent
backfill; you'll want to set 'nobackfill' as well.
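For example (a sketch of pausing data movement around maintenance):

ceph osd set norebalance   # prevent rebalancing of misplaced PGs
ceph osd set nobackfill    # prevent backfill as well
# ... do the maintenance ...
ceph osd unset nobackfill
ceph osd unset norebalance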
Josh
On Sun, Jan 12, 2025 at 1:49 PM Anthony D'Atri wrote:
>
> [ ed: snag during moderation (somehow a newline was interpolated in the
> Subject), so I’m sending this on behalf o
Hi Eugen,
No, as far as I can tell I only have one prometheus service running.
———
[root@cephman2 ~]# ceph orch ls prometheus --export
service_type: prometheus
service_name: prometheus
placement:
  count: 1
  label: _admin
[root@cephman2 ~]# ceph orch ps --daemon-type prometheus
NAME
Hi,
A quick update on this topic: the fix for us seems to be to offline-compact
all OSDs.
After that, all snaptrimming can finish in an hour rather than a day.
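For reference, the offline compaction itself is per OSD and looks roughly
like this (a sketch for a non-containerized deployment; OSD id 0 is a
placeholder, and with cephadm you'd stop the daemon via ceph orch and run
the tool inside "cephadm shell --name osd.0" instead):

ceph osd set noout            # avoid data movement while the OSD is down
systemctl stop ceph-osd@0
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-0 compact
systemctl start ceph-osd@0
ceph osd unset noout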
From: Szabo, Istvan (Agoda)
Sent: Friday, November 29, 2024 2:31:33 PM
To: Bandelow, Gunnar ; Ceph U
Dear all,
a quick update and some answers. We set up a dedicated host for running an MDS
and debugging the problem. On this host we have 750G RAM, 4T of swap and 4T of
log space, both on fast SSDs. The plan is to use "perf top" to monitor the MDS
that becomes the designated MDS for the problematic rank, and also pull
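The "perf top" part can be as simple as this sketch (assuming the ceph-mds
process is already running on the host):

perf top -g -p "$(pgrep -x ceph-mds)"

This attaches to the running MDS and shows, with call graphs, where it
spends its CPU time.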
Thanks, Eugen! Just in case you have any more suggestions: this still isn't
quite working for us.
Perhaps one clue is that in the Alerts view of the cephadm dashboard, every
alert is listed twice. We see two CephPGImbalance alerts, both set to 30%
after redeploying the service. If I then foll
Do you have two Prometheus instances? Maybe you could share the output of:
ceph orch ls prometheus --export
Or alternatively:
ceph orch ps --daemon-type prometheus
You can use two instances for HA, but then you need to change the
threshold for both, of course.
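If you do want the HA pair, a spec like this should work (a sketch mirroring
the export format shown earlier in the thread, with count raised to 2; it
assumes two hosts carry the _admin label):

service_type: prometheus
service_name: prometheus
placement:
  count: 2
  label: _admin

applied with "ceph orch apply -i prometheus.yml" (the filename is just a
placeholder).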
Quoting "Devin A. Bougie":
Thanks, Eugen!
Thanks for the input, Josh. The reason I started looking into this was that
we're adding some SSD OSDs to this cluster, and they were basically as slow on
their initial boot as HDD OSDs when the cluster hasn't trimmed OSDmaps in a
while.
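A quick way to check how far behind trimming is (a sketch, assuming jq is
installed) is to compare the first and last committed osdmap epochs:

ceph report 2>/dev/null | jq '.osdmap_first_committed, .osdmap_last_committed'

A large gap between the two numbers means a lot of untrimmed maps that a
booting OSD has to catch up on.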
I'd be interested to know if other people are seeing this slo
Hello,
we are having issues with our CephFS cluster.
Any help would be appreciated.
We are still running 18.2.0.
During the holidays we had an outage caused by the root filesystem filling up.
OSDs started randomly dying, and there was a period when not all PGs were
active. This issue is already solved and all OSDs work fine
Ah, I checked on a newer test cluster (Squid) and now I see what you
mean. The alert is shown per OSD in the dashboard; if you open the
dropdown, you see which daemons are affected. I think it works a bit
differently in Pacific (that's what the customer is still running) when
I last had to mod
Hi Frank,
More than ever. You should open a tracker issue and post debug logs there so
anyone can have a look.
Regards,
Frédéric.
From: Frank Schilder
Sent: Monday, January 13, 2025 5:39 PM
To: ceph-users@ceph.io
Cc: Dan van der Ster; Patrick Donnelly; Bailey Allison; Sp
I'm not sure, but I think it has to do with network communication
between the OSDs? In any case, you can probably make it work with
SELinux with the appropriate settings, but I wasn't using SELinux
before, so disabling it was the easiest solution.
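For reference, disabling it amounts to (a sketch; the second line makes the
change persistent across reboots):

setenforce 0    # switch SELinux to permissive mode at runtime
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config

Permissive mode (SELINUX=permissive) would also work if you'd rather keep
the denials logged.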
On Sat, Jan 11, 2025 at 1:24 PM Alvaro Soto wrote: