[ceph-users] Re: ceph tell setting ignored?

2022-10-05 Thread Wout van Heeswijk
Hi Nicola, Maybe 'config diff' can be of use to you: ceph tell osd.2 config diff. It should tell you every value that is not 'default' and where the value(s) came from (File, mon, override). Wout -Original message- From: Nicola Mori Sent: Wednesday, 5 October 2022 12:33 To:
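A minimal sketch of how 'config diff' can be used to spot non-default settings (the osd.2 target and the option name are only examples):

  # show every setting that differs from the compiled-in default, and where it came from
  ceph tell osd.2 config diff
  # check the value a specific daemon is actually running with
  ceph tell osd.2 config get osd_max_backfills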

[ceph-users] ms_dispatcher of ceph-mgr 100% cpu on pacific 16.2.7

2022-09-15 Thread Wout van Heeswijk
Hi Everyone, We have a cluster whose manager is not working nicely. The mgrs are all very slow to respond. This initially caused them to continuously fail over. We've disabled most of the modules. We've set the following, which seemed to improve the situation a little bit, but the p
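For reference, a hedged sketch of how mgr modules can be listed and disabled while debugging a slow manager (the module name is only an example):

  # list enabled and available mgr modules
  ceph mgr module ls
  # disable a module suspected of stalling the dispatcher (example module name)
  ceph mgr module disable dashboard
  # fail over to a standby mgr to get a fresh process
  ceph mgr fail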

[ceph-users] Re: Nautilus Cluster Struggling to Come Back Online

2021-02-26 Thread Wout van Heeswijk
The issue has been found and is fixed in 15.2.3. Thanks for your response, Igor! Kind regards, Wout 42on From: Wout van Heeswijk Sent: Friday, 26 February 2021 16:10 To: ceph-users@ceph.io Subject: [ceph-users] Re: Nautilus Cluster Struggling to Come Back Online

[ceph-users] Re: Nautilus Cluster Struggling to Come Back Online

2021-02-26 Thread Wout van Heeswijk
For those interested in this issue: we've been seeing OSDs with corrupted WALs after they had a suicide timeout. I've updated the ticket created by William with some of our logs. https://tracker.ceph.com/issues/48827#note-16 We're using Ceph 15.2.2 in this cluster. Currently we are contemplatin

[ceph-users] Unable to cancel buckets from resharding queue

2021-01-12 Thread Wout van Heeswijk
We are experiencing some issues with the bucket resharding queue in Ceph Mimic at one of our customers. I suspect that some of the issues are related to upgrades from earlier versions of the cluster/radosgw. 1) When we cancel the resharding of a bucket, the bucket resharding entry is removed
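For context, a sketch of how the resharding queue can be inspected and entries cancelled (the bucket name is hypothetical):

  # list pending entries in the resharding queue
  radosgw-admin reshard list
  # check the resharding status of one bucket
  radosgw-admin reshard status --bucket=mybucket
  # cancel the scheduled reshard for that bucket
  radosgw-admin reshard cancel --bucket=mybucket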

[ceph-users] Re: PGs down

2020-12-15 Thread Wout van Heeswijk
Hi Igor, Are you referring to the bug reports: - https://tracker.ceph.com/issues/48276 | OSD Crash with ceph_assert(is_valid_io(off, len)) - https://tracker.ceph.com/issues/46800 | Octopus OSD died and fails to start with FAILED ceph_assert(is_valid_io(off, len)) If that is the case, do you th

[ceph-users] Re: high latency after maintenance]

2020-11-06 Thread Wout van Heeswijk
Hi Marcel, The peering process is the process used by Ceph OSDs, on a per-placement-group basis, to agree on the state of that placement group on each of the involved OSDs. In your case, 2/3 of the placement group metadata that needs to be agreed upon/checked is on the nodes that did not undergo main
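A short sketch of how the peering state of a slow placement group can be inspected (the PG id 2.1f is hypothetical):

  # show PGs that are not active+clean and why
  ceph pg dump_stuck unclean
  # query one placement group for its peering and acting-set details
  ceph pg 2.1f query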

[ceph-users] Re: Troubleshooting stuck unclean PGs?

2020-09-22 Thread Wout van Heeswijk
Hi Matt, Looks like you are on the right track. Kind regards, Wout 42on From: Matt Larson Sent: Tuesday, September 22, 2020 5:44 AM To: Wout van Heeswijk Cc: ceph-users@ceph.io Subject: Re: [ceph-users] Troubleshooting stuck unclean PGs? I tried this

[ceph-users] Re: Troubleshooting stuck unclean PGs?

2020-09-21 Thread Wout van Heeswijk
Hi Matt, The mon data can grow when PGs are stuck unclean. Don't restart the mons. You need to find out why your placement groups are "backfill_wait". Likely some of your OSDs are (near)full. If you have space elsewhere you can use the ceph balancer module or reweighting of OSDs to reba
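A sketch of the two approaches mentioned, balancer module versus manual reweighting (OSD id and weight are examples, not recommendations):

  # option 1: let the balancer even out utilisation
  ceph balancer mode upmap
  ceph balancer on
  ceph balancer status
  # option 2: manually lower the weight of the fullest OSDs
  ceph osd reweight 12 0.9
  # or let Ceph pick which OSDs to reweight based on utilisation
  ceph osd reweight-by-utilization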

[ceph-users] Re: Mount CEPH-FS on multiple hosts with concurrent access to the same data objects?

2020-09-21 Thread Wout van Heeswijk
Hi René, Yes, CephFS is a good filesystem for concurrent writing. When using CephFS with Ganesha you can even scale out, but mounting CephFS directly inside the VM will perform better; why don't you do that? Kind regards, Wout 42on From: René Bartsch Sent: Monday, Sept
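A sketch of mounting the same CephFS concurrently on several hosts or VMs with the kernel client (monitor address, client name and secret file are placeholders):

  # kernel client mount; the same filesystem can be mounted on many hosts at once
  mount -t ceph 192.168.0.10:6789:/ /mnt/cephfs -o name=vmclient,secretfile=/etc/ceph/vmclient.secret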

[ceph-users] Re: What is the advice, one disk per OSD, or multiple disks

2020-09-21 Thread Wout van Heeswijk
Just to expand on Robert's answer: if all devices are of the same class (hdd/ssd/nvme) then a one-to-one relationship is most likely the best choice. If you have very fast devices it might be good to have multiple OSDs on one device, at the cost of some complexity. If you have devices o
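As an illustration only, a sketch of how ceph-volume can create more than one OSD on a fast device (device path and count are examples):

  # create two OSDs on a single NVMe device
  ceph-volume lvm batch --osds-per-device 2 /dev/nvme0n1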

[ceph-users] Re: Octopus: Recovery and backfilling causes OSDs to crash after upgrading from nautilus to octopus

2020-07-06 Thread Wout van Heeswijk
te-5 kind regards, Wout 42on On 2020-07-06 14:20, Wout van Heeswijk wrote: An update about the progression of this issue: after a few hours of normal operation the problem is now back in full swing. About ten OSDs, different ones this time, have started crashing on segfaults again. kind regar

[ceph-users] Re: Octopus: Recovery and backfilling causes OSDs to crash after upgrading from nautilus to octopus

2020-07-06 Thread Wout van Heeswijk
An update about the progression of this issue: after a few hours of normal operation the problem is now back in full swing. About ten OSDs, different ones this time, have started crashing on segfaults again. kind regards, Wout 42on On 2020-07-06 09:23, Wout van Heeswijk wrote: Hi Dan,

[ceph-users] Re: Octopus: Recovery and backfilling causes OSDs to crash after upgrading from nautilus to octopus

2020-07-06 Thread Wout van Heeswijk
oads of memory. (see ceph config set osd bluestore_fsck_quick_fix_on_mount false in the release notes). -- Dan On Sun, Jul 5, 2020 at 2:43 PM Wout van Heeswijk wrote: Hi All, A customer of ours has upgraded the cluster from nautilus to octopus after experiencing issues with osds not being able to connect to each ot
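The setting referenced above, as it would be applied (a sketch based on the release-note wording quoted in the thread):

  # disable the on-mount quick-fix fsck that can consume large amounts of memory during the upgrade
  ceph config set osd bluestore_fsck_quick_fix_on_mount false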

[ceph-users] Re: Octopus: Recovery and backfilling causes OSDs to crash after upgrading from nautilus to octopus

2020-07-05 Thread Wout van Heeswijk
t; /var/log/kern.log root@st0:~# kind regards, Wout 42on On 2020-07-05 14:45, Lindsay Mathieson wrote: On 5/07/2020 10:43 pm, Wout van Heeswijk wrote: After unsetting the norecover and nobackfill flag some OSDs started crashing every few minutes. The OSD log, even with high debug settings,

[ceph-users] Octopus: Recovery and backfilling causes OSDs to crash after upgrading from nautilus to octopus

2020-07-05 Thread Wout van Heeswijk
Hi All, A customer of ours has upgraded the cluster from nautilus to octopus after experiencing issues with OSDs not being able to connect to each other or to clients/mons/mgrs. The connectivity issues were related to msgrV2 and the require_osd_release setting not being set to nautilus. After fixin
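For reference, a hedged sketch of the two settings mentioned as the cause of the connectivity issues:

  # enable the msgr v2 protocol on the monitors
  ceph mon enable-msgr2
  # record that all OSDs run at least nautilus, so nautilus-era features are switched on
  ceph osd require-osd-release nautilus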

[ceph-users] Nautilus 14.2.10 mon_warn_on_pool_no_redundancy

2020-06-29 Thread Wout van Heeswijk
Hi All, I really like the idea of warning users against unsafe practices. Wouldn't it make sense to warn against using min_size=1 instead of size=1? I've seen data loss happen with size=2 min_size=1 when multiple failures occur and writes have been done between the failures. Effectively t
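A minimal sketch of checking and raising min_size on a replicated pool (the pool name is an example):

  # inspect the replication settings of a pool
  ceph osd pool get mypool size
  ceph osd pool get mypool min_size
  # require at least two replicas to be available before accepting writes
  ceph osd pool set mypool min_size 2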

[ceph-users] Re: Removing pool in nautilus is incredibly slow

2020-06-25 Thread Wout van Heeswijk
Hi Francois, Have you already looked at the option "osd_delete_sleep"? It will not speed up the process but it will give you some control over your cluster performance. Something like: ceph tell osd.\* injectargs '--osd_delete_sleep 1' kind regards, Wout 42on On 25-06-2020 09:57, Francois L
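The same throttle can also be set persistently through the config store (a sketch; the 1-second value is just the example from the thread):

  # throttle object deletion so pool removal does not starve client I/O
  ceph config set osd osd_delete_sleep 1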

[ceph-users] Re: How to force backfill on undersized pgs ?

2020-06-18 Thread Wout van Heeswijk
Hi Kári, The backfilling process will prioritize those backfill requests that are for degraded PGs or undersized PGs: " The next priority is backfill of degraded PGs and is a function of the degradation. A backfill for a PG missing two replicas will have a priority higher than a backfill
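For completeness, a sketch of the commands that bump specific PGs to the front of the queue (the PG ids are hypothetical):

  # force recovery/backfill of specific placement groups ahead of the normal priority order
  ceph pg force-recovery 2.1f 2.3a
  ceph pg force-backfill 2.1f 2.3a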

[ceph-users] Re: OSDs taking too much memory, for pglog

2020-05-14 Thread Wout van Heeswijk
Hi Harald, Your cluster has a lot of objects per OSD/PG and the pg logs will grow fast and large because of this. The pg_logs will keep growing as long as your cluster's PGs are not active+clean. This means you are now in a loop where you cannot get stable running OSDs because the pg_logs tak
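A hedged sketch of the knobs usually involved when pg_logs consume too much memory (values, paths and PG id are examples, not recommendations):

  # cap the number of pg_log entries kept per PG
  ceph config set osd osd_max_pg_log_entries 3000
  ceph config set osd osd_min_pg_log_entries 500
  # offline, a single PG's log can be trimmed with ceph-objectstore-tool
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 --pgid 2.1f --op trim-pg-log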