Hi Nicola,
Maybe 'config diff' can be of use to you
ceph tell osd.2 config diff
It should tell you every value that is not 'default' and where the value(s)
came from (File, mon, override).
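To zoom in on a single option (osd_max_backfills below is only an example), something along these lines should work:

  ceph tell osd.2 config diff | grep osd_max_backfills
  ceph config show osd.2 | grep osd_max_backfills

The 'config show' output should also list a source for each value, which helps spot overrides.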
Wout
-Oorspronkelijk bericht-
Van: Nicola Mori
Verzonden: Wednesday, 5 October 2022 12:33
Aan:
Hi Everyone,
We have a cluster whose manager is not working nicely. The mgrs are
all very slow to respond. This initially caused them to continuously fail over.
We've disabled most of the modules.
We’ve set the following which seemed to improve the situation a little bit but
the p
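For reference, a rough sketch of how mgr modules can be listed and disabled (the dashboard module below is only an example, not a recommendation):

  ceph mgr module ls
  ceph mgr module disable dashboard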
The issue is found and fixed in 15.2.3.
Thanks for your response Igor!
Kind regards,
Wout
42on
From: Wout van Heeswijk
Sent: Friday, 26 February 2021 16:10
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Nautilus Cluster Struggling to Come Back Online
For those interested in this issue: we've been seeing OSDs with corrupted WALs
after they had a suicide timeout. I've updated the ticket created by William
with some of our logs.
https://tracker.ceph.com/issues/48827#note-16
We're using ceph 15.2.2 in this cluster. Currently we are contemplatin
We are experiencing some issues with the bucket resharding queue in Ceph Mimic
at one of our customers. I suspect that some of the issues are related to the
upgrades from earlier versions of the cluster/radosgw.
1) When we cancel the resharding of a bucket, the bucket resharding entry is
removed
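For anyone following along, a rough sketch of the commands involved (the bucket name is only an example, and the stale-instances subcommands may require a recent point release):

  radosgw-admin reshard list
  radosgw-admin reshard status --bucket example-bucket
  radosgw-admin reshard cancel --bucket example-bucket
  radosgw-admin reshard stale-instances list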
Hi Igor,
Are you referring to the bug reports:
- https://tracker.ceph.com/issues/48276 | OSD Crash with
ceph_assert(is_valid_io(off, len))
- https://tracker.ceph.com/issues/46800 | Octopus OSD died and fails to start
with FAILED ceph_assert(is_valid_io(off, len))
If that is the case, do you th
Hi Marcel,
The peering process is the process used by Ceph OSDs, on a per-placement-group
basis, to agree on the state of that placement group on each of the involved OSDs.
In your case, 2/3 of the placement group metadata that needs to be agreed
upon/checked is on the nodes that did not undergo main
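If you want to see what the OSDs have agreed on so far, the per-PG state can be inspected with something like this (the PG id is only an example):

  ceph pg dump_stuck inactive
  ceph pg 2.1f query

The query output should show the up/acting sets and the recovery/peering state for that placement group.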
Hi Matt,
Looks like you are on the right track.
Kind regards,
Wout
42on
From: Matt Larson
Sent: Tuesday, September 22, 2020 5:44 AM
To: Wout van Heeswijk
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Troubleshooting stuck unclean PGs?
I tried this
Hi Matt,
The mon data can grow while PGs are stuck unclean. Don't restart the mons.
You need to find out why your placement groups are "backfill_wait". Likely some
of your OSDs are (near)full.
If you have space elsewhere you can use the ceph balancer module or reweighting
of OSDs to reba
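A rough sketch of the commands I would start with (upmap mode assumes all clients are luminous or newer):

  ceph health detail
  ceph osd df tree
  ceph balancer mode upmap
  ceph balancer on
  ceph balancer status

or, alternatively, reweighting by utilization:

  ceph osd test-reweight-by-utilization
  ceph osd reweight-by-utilization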
Hi Rene,
Yes, CephFS is a good filesystem for concurrent writing. When using CephFS with
Ganesha you can even scale out.
It will perform better, but why don't you mount CephFS inside the VM?
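As a sketch, a kernel mount inside the VM would look roughly like this (the monitor address, client name and paths are only examples):

  mount -t ceph 192.168.1.10:6789:/ /mnt/cephfs -o name=vmclient,secretfile=/etc/ceph/vmclient.secret

or with the FUSE client:

  ceph-fuse -n client.vmclient /mnt/cephfs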
Kind regards,
Wout
42on
From: René Bartsch
Sent: Monday, Sept
Just to expand on the answer of Robert.
If all devices are of the same class (hdd/ssd/nvme) then a one-to-one
relationship is most likely the best choice.
If you have very fast devices it might be good to have multiple OSDs on one
device, at the cost of some complexity.
If you have devices o
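As a sketch, ceph-volume can split a device into multiple OSDs (the device path and count are only examples; --report first shows what it would do without changing anything):

  ceph-volume lvm batch --report --osds-per-device 2 /dev/nvme0n1
  ceph-volume lvm batch --osds-per-device 2 /dev/nvme0n1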
kind regards,
Wout
42on
On 2020-07-06 14:20, Wout van Heeswijk wrote:
An update about the progression of this issue:
After a few hours of normal operation the problem is now back in full
swing.
About ten OSDs, different ones this time, have started crashing with
segfaults again.
kind regar
An update about the progression of this issue:
After a few hours of normal operation the problem is now back in full swing.
About ten OSDs, different ones this time, have started crashing with segfaults
again.
kind regards,
Wout
42on
On 2020-07-06 09:23, Wout van Heeswijk wrote:
Hi Dan,
oads of memory.
(see ceph config set osd bluestore_fsck_quick_fix_on_mount false in
the release notes).
-- Dan
On Sun, Jul 5, 2020 at 2:43 PM Wout van Heeswijk wrote:
Hi All,
A customer of ours has upgraded the cluster from nautilus to octopus
after experiencing issues with osds not being able to connect to each
ot
> /var/log/kern.log
root@st0:~#
kind regards,
Wout
42on
On 2020-07-05 14:45, Lindsay Mathieson wrote:
On 5/07/2020 10:43 pm, Wout van Heeswijk wrote:
After unsetting the norecover and nobackfill flag some OSDs started
crashing every few minutes. The OSD log, even with high debug
settings,
Hi All,
A customer of ours has upgraded the cluster from nautilus to octopus
after experiencing issues with osds not being able to connect to each
other, clients/mons/mgrs. The connectivity issues were related to msgr v2
and the require_osd_release setting not being set to nautilus. After
fixin
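For context, these are roughly the commands involved in that fix (run them only once all daemons really are on the release in question):

  ceph mon enable-msgr2
  ceph osd require-osd-release nautilus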
Hi All,
I really like the idea of warning users against using unsafe practices.
Wouldn't it make sense to warn against using min_size=1 instead of size=1?
I've seen data loss happen with size=2 min_size=1 when multiple failures
occur and writes have been done between the failures. Effectively t
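As a sketch of the safer combination on a replicated pool (the pool name is only an example):

  ceph osd pool set example-pool size 3
  ceph osd pool set example-pool min_size 2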
Hi Francois,
Have you already looked at the option "osd_delete_sleep"? It will not
speed up the process but it will give you some control over your cluster
performance.
Something like:
ceph tell osd.\* injectargs '--osd_delete_sleep 1'
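To verify the value landed, or to make it persistent (assuming a release with the central config database), something like:

  ceph tell osd.2 config get osd_delete_sleep
  ceph config set osd osd_delete_sleep 1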
kind regards,
Wout
42on
On 25-06-2020 09:57, Francois L
Hi Kári,
The backfilling process will prioritize those backfill requests that are
for degraded PGs or undersized PGs:
"
The next priority is backfill of degraded PGs and is a function of the
degradation. A backfill for a PG missing two replicas will have a
priority higher than a backfill
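If specific PGs really need to jump the queue, there are also the force flags (the PG id is only an example):

  ceph pg force-recovery 2.1f
  ceph pg force-backfill 2.1f
  ceph pg cancel-force-backfill 2.1f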
Hi Harald,
Your cluster has a lot of objects per osd/pg and the pg logs will grow
fast and large because of this. The pg_logs will keep growing as long as
your cluster's PGs are not active+clean. This means you are now in a
loop where you cannot get stable running OSDs because the pg_logs tak
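If memory pressure from the pg_logs becomes the limiting factor, their length can be capped via the central config; the values below are only illustrative and should be applied with care:

  ceph config set osd osd_max_pg_log_entries 3000
  ceph config set osd osd_min_pg_log_entries 1000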