[ceph-users] Re: Slow cluster / misplaced objects - Ceph 15.2.9

2021-02-26 Thread Frank Schilder
Hi David, we recently had the same/similar problem: a failing SFP transceiver. We got "long ping time" warnings and it took a while to find the source. Strange that you didn't have ping time warnings. Are your thresholds too high? I learned that our switches have flapping protection, it is call
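The thresholds Frank refers to are tunable in Ceph; a rough sketch of how they could be adjusted (option names per the Ceph docs, the values here are purely illustrative):

  # Warn on heartbeat pings above an absolute time in ms (0 = use the ratio instead)
  ceph config set global mon_warn_on_slow_ping_time 100
  # Or warn when pings exceed this fraction of osd_heartbeat_grace (20 s by default)
  ceph config set global mon_warn_on_slow_ping_ratio 0.05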

[ceph-users] Re: MDSs report damaged metadata

2021-02-26 Thread ricardo.re.azevedo
Thanks for the advice and info regarding the error. I tried `ceph tell mds.database-0 scrub start / recursive repair force` and it didn't help. Is there anything else I can try? Or manually fix the links? Best, Ricardo -Original Message- From: Patrick Donnelly Sent: Thursday, Febru
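Before retrying the repair, it may help to see what the MDS has actually recorded as damaged; a sketch using the standard MDS admin commands (the daemon name database-0 is taken from the thread):

  # List the damage table entries the MDS has recorded
  ceph tell mds.database-0 damage ls
  # Check whether the last scrub is still running and what it found
  ceph tell mds.database-0 scrub status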

[ceph-users] Re: Slow cluster / misplaced objects - Ceph 15.2.9

2021-02-26 Thread David Orman
Hi Martin, We've already got the collection in place, and we (in retrospect) see some errors on the sub-interface in question. We'll be adding alerting for this specific scenario as it was missed in more general alerting, and the bonded interfaces themselves don't show the errors - only the underl
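For anyone wanting to alert on this case, the per-leg counters are visible below the bond; a hedged example (interface names are placeholders):

  # Slave state and link-failure counts for the bond
  cat /proc/net/bonding/bond0
  # NIC-level error/drop counters on an individual leg
  ethtool -S enp1s0f0 | grep -Ei 'err|drop|crc'
  ip -s link show enp1s0f0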

[ceph-users] Re: Slow cluster / misplaced objects - Ceph 15.2.9

2021-02-26 Thread Martin Verges
Hello, within croit, we have network latency monitoring that would have shown you the packet loss. We therefore suggest installing something like SmokePing on your infrastructure to monitor the quality of your network. Why does it affect your cluster? The network is the central component of a
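Ceph itself also keeps recent heartbeat ping statistics per OSD, which can serve a similar purpose to an external SmokePing; a sketch, assuming the admin-socket command introduced around Nautilus (OSD id and threshold are examples):

  # Dump heartbeat ping times seen by osd.0, listing entries above 100 ms
  ceph daemon osd.0 dump_osd_network 100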

[ceph-users] Re: Slow cluster / misplaced objects - Ceph 15.2.9

2021-02-26 Thread David Orman
We figured this out - it was a leg of an LACP-based interface that was misbehaving. Once we dropped it, everything went back to normal. Does anybody know a good way to get a sense of what might be slowing down a cluster in this regard, with EC? We didn't see any indication of a single host as a pro
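One coarse way to narrow it down to a host or OSD is per-OSD latency data; a sketch of generic commands (nothing EC-specific, the OSD id is an example):

  # Per-OSD commit/apply latency; a single slow host tends to stand out
  ceph osd perf
  # Recent slow operations recorded by a suspect OSD
  ceph daemon osd.0 dump_historic_ops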

[ceph-users] Re: Nautilus Cluster Struggling to Come Back Online

2021-02-26 Thread Wout van Heeswijk
The issue was found and is fixed in 15.2.3. Thanks for your response Igor! Kind regards, Wout 42on From: Wout van Heeswijk Sent: Friday, 26 February 2021 16:10 To: ceph-users@ceph.io Subject: [ceph-users] Re: Nautilus Cluster Struggling to Come Back Online

[ceph-users] Re: Nautilus Cluster Struggling to Come Back Online

2021-02-26 Thread Wout van Heeswijk
For those interested in this issue: we've been seeing OSDs with corrupted WALs after they had a suicide timeout. I've updated the ticket created by William with some of our logs. https://tracker.ceph.com/issues/48827#note-16 We're using Ceph 15.2.2 in this cluster. Currently we are contemplatin

[ceph-users] "optimal" tunables on release upgrade

2021-02-26 Thread Matthew Vernon
Hi, Having been slightly caught out by tunables on my Octopus upgrade[0], can I just check that if I do "ceph osd crush tunables optimal" that will update the tunables on the cluster to the current "optimal" values (and move a lot of data around), but that this doesn't mean they'll change next
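To see what would actually change before committing, the current profile can be compared against the defaults; a sketch:

  # Show the tunables currently in effect on the cluster
  ceph osd crush show-tunables
  # Then, once the data movement is acceptable:
  ceph osd crush tunables optimal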

[ceph-users] Re: ceph version of new daemons deployed with orchestrator

2021-02-26 Thread Tobias Fischer
Hi Kenneth, check the config db to see which image is set: ceph config dump WHO MASK LEVEL OPTION VALUE RO global basic container_image docker.io/ceph/ceph:v15.2.9 * Probably you hav
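If newly deployed daemons should use a different image, the value can be changed in the config db; a hedged example reusing the image shown above:

  # Point the orchestrator at a specific image for daemons deployed from now on
  ceph config set global container_image docker.io/ceph/ceph:v15.2.9
  # Verify what is set
  ceph config dump | grep container_image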

[ceph-users] Re: MON slow ops and growing MON store

2021-02-26 Thread Janek Bevendorff
Since the full cluster restart and disabling logging to syslog, it's not a problem any more (for now). Unfortunately, just disabling clog_to_monitors didn't have the desired effect when I tried it yesterday. But I also believe that it is somehow related. I could not find any specific reason for
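For reference, the setting mentioned here can be toggled at runtime; a sketch (whether it is enough to stop the mon store growth is exactly what the thread is unsure about, and the store path varies by deployment):

  # Stop daemons from forwarding cluster log entries to the monitors
  ceph config set global clog_to_monitors false
  # Keep an eye on the mon store size on a monitor host (path is an example)
  du -sh /var/lib/ceph/mon/*/store.db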

[ceph-users] Re: MON slow ops and growing MON store

2021-02-26 Thread Mykola Golub
On Thu, Feb 25, 2021 at 08:58:01PM +0100, Janek Bevendorff wrote: > On the first MON, the command doesn’t even return, but I was able to > get a dump from the one I restarted most recently. The oldest ops > look like this: > > { > "description": "log(1000 entries from seq 17876
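The dump quoted above comes from the monitor's admin socket; a sketch of how to pull it on a mon host (the mon id is a placeholder, usually the short hostname):

  # In-flight and slow ops on a given monitor, via its admin socket
  ceph daemon mon.$(hostname -s) ops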

[ceph-users] Re: ceph version of new daemons deployed with orchestrator

2021-02-26 Thread Kenneth Waegeman
Hi Tobi, I didn't know about that config option, but that did the trick! Thank you! Kenneth On 26/02/2021 11:30, Tobias Fischer wrote: Hi Kenneth, check the config db which image is set: ceph config dump WHO MASK LEVEL OPTION VALUE

[ceph-users] ceph version of new daemons deployed with orchestrator

2021-02-26 Thread Kenneth Waegeman
Hi all, I am running a cluster managed by orchestrator/cephadm. I installed a new host for OSDs yesterday; the OSD daemons were automatically created using drivegroups service specs (https://docs.ceph.com/en/latest/cephadm/drivegroups/#drivegroups
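For context, a minimal OSD service spec of the kind that docs page describes might look like the sketch below (host pattern and device filter are illustrative, not taken from the thread):

  cat <<'EOF' > osd_spec.yml
  service_type: osd
  service_id: default_drive_group
  placement:
    host_pattern: '*'
  data_devices:
    all: true
  EOF
  # Let the orchestrator create OSDs on matching devices
  ceph orch apply osd -i osd_spec.yml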