Cc: Frank Schilder; ceph-users
Subject: Re: [ceph-users] Re: laggy OSDs and staling krbd IO after upgrade from
nautilus to octopus
Just a datapoint - we upgraded several large Mimic-born clusters straight
to 15.2.12 with the quick fsck disabled in ceph.conf, then did
require-osd-release, and finally did the omap conversion offline after the
cluster was upgraded using the bluestore tool while the OSDs were down (all
done in batches).
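For anyone who wants to replicate this, a rough sketch of that sequence (paths and the exact ceph-bluestore-tool subcommand may differ on your release; $ID stands for an OSD id):

  # in ceph.conf ([osd] section) or via the config db, keep the automatic conversion off:
  ceph config set osd bluestore_fsck_quick_fix_on_mount false

  # once every daemon runs Octopus:
  ceph osd require-osd-release octopus

  # then, per OSD and with the daemon stopped, do the omap conversion offline:
  systemctl stop ceph-osd@$ID
  ceph-bluestore-tool quick-fix --path /var/lib/ceph/osd/ceph-$ID   # 'repair' also works, just slower
  systemctl start ceph-osd@$ID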
Hi Frank,
Thank you very much for this! :)
>
> we just completed a third upgrade test. There are 2 ways to convert the
> OSDs:
>
> A) convert along with the upgrade (quick-fix-on-start=true)
> B) convert after setting require-osd-release=octopus (quick-fix-on-start=false until require-osd-release=octopus is set)
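In config terms the two variants look roughly like this (a sketch; I believe the option behind "quick-fix-on-start" is bluestore_fsck_quick_fix_on_mount, so verify the exact name on your release):

  [osd]
  # A) convert along with the upgrade:
  bluestore_fsck_quick_fix_on_mount = true

  # B) hold the conversion back until require-osd-release=octopus is set:
  # bluestore_fsck_quick_fix_on_mount = false
  # ... then after 'ceph osd require-osd-release octopus', flip it back to true
  #     (or convert offline with ceph-bluestore-tool)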
Hi,
I just checked and all OSDs have it set to true.
It also does not seem to be a problem with the snaptrim operation.
We just had two occasions in the last 7 days where nearly all OSDs logged a lot
of these messages (around 3k times in 20 minutes):
2022-09-12T20:27:19.146+0200 7f576de49700 -1 osd.9 786378 get_h...
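For reference, this is how such a per-OSD setting can be checked (a sketch; osd.9 and bluefs_buffered_io are just the examples from this thread):

  # effective value in the monitor config database:
  ceph config get osd bluefs_buffered_io

  # value the running daemon actually uses (run on the host where osd.9 lives):
  ceph daemon osd.9 config get bluefs_buffered_io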
I haven't read through this entire thread, so forgive me if this was already
mentioned:
What is the parameter "bluefs_buffered_io" set to on your OSDs? We once saw
a terrible slowdown on our OSDs during snaptrim events and setting
bluefs_buffered_io to true alleviated that issue. That was on a nautilus
cluster.
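If someone wants to try it, flipping that option looks roughly like this (a sketch; depending on the release it may need an OSD restart to take effect):

  ceph config set osd bluefs_buffered_io true
  ceph config get osd bluefs_buffered_io   # verify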
The cluster is SSD only with 2TB, 4TB and 8TB disks. I would expect that
this should be done fairly fast.
For now I will recreate every OSD in the cluster and check if this helps.
Do you experience slow OPS (so the cluster shows a message like "cluster
[WRN] Health check update: 679 slow ops, oldest ...")?
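For comparison, a quick way to look for that warning (a sketch; SLOW_OPS is the health check name on the releases I've used):

  ceph health detail | grep -A 3 SLOW_OPS
  # or follow the cluster log live:
  ceph -w | grep 'slow ops'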
>
> It might be possible that converting OSDs before setting require-osd-
> release=octopus leads to a broken state of the converted OSDs. I could
> not yet find a way out of this situation. We will soon perform a third
> upgrade test to test this hypothesis.
>
So when upgrading, one should keep quick-fix-on-start=false until require-osd-release=octopus is set?
Hi Frank,
we converted the OSDs directly during the upgrade:
1. installed the new Ceph version
2. restarted all OSD daemons
3. waited some time (took around 5-20 minutes)
4. all OSDs were online again.
So I would expect that the OSDs are all upgraded correctly.
I also checked when the trimming happens, and ...
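For what it's worth, the restart and the follow-up check looked roughly like this per host (a sketch; service names differ between package-based and cephadm deployments):

  # after installing the new packages on a host:
  systemctl restart ceph-osd.target

  # afterwards, confirm every daemon reports the new version:
  ceph versions
  ceph osd count-metadata ceph_version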
I checked the cluster for other snaptrim operations and they happen all
over the place, so for me it looks like they just happened to be running when
the issue occurred, but were not the driving factor.
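In case it helps, this is roughly how the snaptrim activity can be located (a sketch; the PG states are called snaptrim and snaptrim_wait on the versions I've checked):

  ceph pg ls snaptrim
  ceph pg ls snaptrim_wait
  # or a quick per-state count:
  ceph pg dump pgs_brief 2>/dev/null | awk '{print $2}' | sort | uniq -c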
On Tue, 13 Sept 2022 at 12:04, Boris Behrens wrote:
> Because someone mentioned that the attachments did not go through ...
Because someone mentioned that the attachments did not go through, I
created pastebin links:
monlog: https://pastebin.com/jiNPUrtL
osdlog: https://pastebin.com/dxqXgqDz
On Tue, 13 Sept 2022 at 11:43, Boris Behrens wrote:
> Hi, I need your help really badly.
>
> we are currently experiencing ...
Hi - I've been bitten by that too and checked, and that *did* happen, but I
turned swap back off a while ago.
Thanks for your quick reply :)
-Alex
On Mar 29, 2022, 6:26 PM -0400, Arnaud M wrote:
> Hello
>
> Is swap enabled on your host? Is swap used?
>
> For our cluster we tend to allocate enough RAM and disable swap ...
Hello
Is swap enabled on your host? Is swap used?
For our cluster we tend to allocate enough RAM and disable swap.
Maybe the reboot of your host re-activated swap?
Try to disable swap and see if it helps.
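A sketch of what "disable swap" usually means on a host (standard Linux commands, adjust for your distro):

  swapon --show   # is any swap device active?
  free -h         # how much is actually swapped out?
  swapoff -a      # disable swap for the running system
  # also comment out the swap entry in /etc/fstab so a reboot does not re-enable it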
All the best
Arnaud
On Tue, 29 Mar 2022 at 23:41, David Orman wrote:
> We're definitely dealing with something that sounds similar ...
We're definitely dealing with something that sounds similar, but hard to
state definitively without more detail. Do you have object lock/versioned
buckets in use (especially if one started being used around the time of the
slowdown)? Was this cluster always 16.2.7?
What is your pool configuration ...?
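To answer the pool and bucket questions, something along these lines should show it (a sketch; the s3api calls assume AWS CLI access to the bucket and <bucket> is a placeholder):

  # pool layout (replicated vs EC, sizes, pg counts):
  ceph osd pool ls detail

  # versioning / object lock on a specific bucket, via the S3 API:
  aws s3api get-bucket-versioning --bucket <bucket>
  aws s3api get-object-lock-configuration --bucket <bucket>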