Hello,
Our cluster has become unresponsive after a teammate worked on it. We
haven't been able to get the full story of what he did, so we don't fully
understand what is going on, and the only error we can find in any of the
logs is the following:
2024-08-20T03:12:34.183+ 7f3670246b80
This looks the same as https://tracker.ceph.com/issues/52280, which has
already been fixed. I just checked your ceph version, and it already
includes that fix.
So this should be a new case.
BTW, how did this happen? Were you doing a failover or something else?
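If it helps, the running versions can be double-checked with something like
the following (mds.<name> is just a placeholder for one of your MDS daemons):
ceph versions                  # versions of all running daemons, grouped by type
ceph tell mds.<name> version   # ask a single MDS directly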
Thanks
- Xiubo
On 8
The gibba upgrade is complete! All daemons upgraded successfully.
On Tue, Aug 20, 2024 at 10:44 AM Venky Shankar wrote:
> On Mon, Aug 19, 2024 at 4:46 PM Venky Shankar wrote:
> >
> > Hi Brad,
> >
> > On Fri, Aug 16, 2024 at 8:59 AM Brad Hubbard
> wrote:
> > >
> > > On Thu, Aug 15, 2024 at 11:5
Hi,
What pops out is the "handle_client_mkdir()"... Does this mean the MDS
crashed when a client was creating a new dir or snapshot? Any idea about
the steps?
Thank you,
Bogdan Velica
croit.io
On Tue, Aug 20, 2024 at 7:47 PM Tarrago, Eli (RIS-BCT) <
eli.tarr...@lexisnexisrisk.com> wrote:
> Her
Here is the backtrace from a ceph crash
ceph crash info
'2024-08-20T16:07:39.319197Z_8bcdf3df-f9b5-451a-b971-16f8190ab351'
{
    "assert_condition": "!p",
    "assert_file": "/build/ceph-18.2.4/src/mds/MDCache.cc",
    "assert_func": "void MDCache::add_inode(CInode*)",
    "assert_line": 251,
On Mon, Aug 19, 2024 at 4:46 PM Venky Shankar wrote:
>
> Hi Brad,
>
> On Fri, Aug 16, 2024 at 8:59 AM Brad Hubbard wrote:
> >
> > On Thu, Aug 15, 2024 at 11:50 AM Brad Hubbard wrote:
> > >
> > > On Tue, Aug 6, 2024 at 6:33 AM Yuri Weinstein wrote:
> > > >
> > > > Details of this release are sum
Good Morning Ceph Users,
I'm currently troubleshooting an issue and wanted to post here to get some
feedback. If there is no response, or the feedback is that this looks like a
bug, then I'll write up a bug report.
Cluster:
Reef 18.2.4
Ubuntu 20.04
ceph -s
cluster:
id: 93e49b2e-
Hi,
can you share more details? For example, the auth caps of your fuse
client (ceph auth export client.) and the exact command
that fails? Did it work before?
I just did that on a small test cluster (17.2.7) without an issue.
BTW, the warning "too many PGs per OSD (328 > max 250)" is serio
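If it helps to gather that, a rough sketch of the relevant commands
(client.fuse is only a placeholder for your fuse client's name):
ceph auth export client.fuse              # the caps the fuse client mounts with
ceph osd pool ls detail                   # pg_num per pool, behind the PG count
ceph config get mon mon_max_pg_per_osd    # the 250 threshold from the warning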
Hi (please don't drop the ML from your responses),
> All PGs of pool cephfs are affected and they are in all OSDs
then just pick a random one and check if anything stands out. I'm not
sure if you mentioned it already, did you also try restarting OSDs?
Oh, not yesterday. I do it now, then I
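For reference, a minimal sketch of restarting a single OSD (osd.11 is a
placeholder; which variant applies depends on how the cluster is deployed):
ceph orch daemon restart osd.11       # cephadm-managed clusters
sudo systemctl restart ceph-osd@11    # classic package install, run on the OSD host
ceph -s                               # then watch the PG states settle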
Hi Yuri,
ceph-volume approved https://jenkins.ceph.com/job/ceph-volume-test/601/
Regards,
--
Guillaume Abrioux
Software Engineer
From: Yuri Weinstein
Date: Monday, 5 August 2024 at 22:33
To: dev , ceph-users
Subject: [EXTERNAL] [ceph-users] squid 19.1.1 RC QE validation status
Details of this
PS,
here is the command:
[rook@rook-ceph-tools-5459f7cb5b-p55np /]$ ceph pg dump | grep snaptrim
| grep -v 'snaptrim_wait' | awk '{print $18}' | sort | uniq
dumped all
0
10
11
2
8
9
On 20.08.2024 at 10:25, MARTEL Arnaud wrote:
I had this problem once in the past and found that it was related to
Hello Arnaud,
I have all 6 OSDs in the list :-(.
Thanks for the idea, maybe it could help other users.
Regards,
Giovanna
On 20.08.2024 at 10:25, MARTEL Arnaud wrote:
ceph pg dump | grep snaptrim | grep -v 'snaptrim_wait'
--
Giovanna Ratini
Mail: rat...@dbvis.inf.uni-konstanz.de
Phone: +49 (0) 7
I had this problem once in the past and found that it was related to a
particular osd. To identify it, I ran the command "ceph pg dump | grep snaptrim
| grep -v 'snaptrim_wait'" and found that the osd displayed in the "UP_PRIMARY"
column was almost always the same.
So I restarted this osd and t
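A compact way to do that counting, assuming column 18 of the "ceph pg dump"
output is the UP_PRIMARY OSD on this release, as in the awk pipeline earlier
in the thread (the OSD id in the restart is a placeholder):
ceph pg dump | grep snaptrim | grep -v snaptrim_wait \
  | awk '{print $18}' | sort | uniq -c | sort -rn    # hits per UP_PRIMARY OSD
ceph orch daemon restart osd.<id>                    # restart the one that dominates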
Don't worry, it happens a lot and it also happens to me. ;-) Glad it
worked for you as well.
Quoting Benjamin Huth:
Thank you so much for the help! Thanks to the issue you linked and the
other guy you replied to with the same issue, I was able to edit the
config-key and get my orchestrator
Did you reduce the default values I mentioned? You could also look
into the historic_ops of the primary OSD for one affected PG:
ceph tell osd. dump_historic_ops_by_duration
But I'm not sure if that can actually help here. There are plenty of
places to look at, you could turn on debug logs o
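Roughly what that looks like, with 2.17 as a placeholder PG id and osd.11 as
its primary (the debug bump is optional and should be reverted afterwards):
ceph pg map 2.17                                  # up/acting sets; the primary is listed first
ceph tell osd.11 dump_historic_ops_by_duration    # slowest recent ops on that OSD
ceph config set osd.11 debug_osd 10               # temporarily raise OSD debug logging
ceph config rm osd.11 debug_osd                   # revert when done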