[ceph-users] Unable to recover cluster, error: unable to read magic from mon data

2024-08-20 Thread RIT Computer Science House
Hello, Our cluster has become unresponsive after a teammate's work on the cluster. We have not yet been able to get the full story of what he did, so we don't fully understand what is going on, and the only error we can find in any of the logs is the following: 2024-08-20T03:12:34.183+ 7f3670246b80
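When a mon fails with an error like this, a hedged first check (assuming a non-containerized mon with the default data path; the mon id and paths below are placeholders, not taken from the report) is to confirm the mon store is intact and, with the daemon stopped, try extracting the monmap from it:

# verify the mon data directory exists, is owned by the ceph user, and still contains store.db
ls -l /var/lib/ceph/mon/ceph-<mon-id>/

# with the mon daemon stopped, try extracting the monmap from its store
ceph-mon -i <mon-id> --extract-monmap /tmp/monmap
monmaptool --print /tmp/monmap

If the extraction itself fails, that points at a corrupted or incomplete mon store rather than a configuration problem.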

[ceph-users] Re: Cephfs mds node already exists crashes mds

2024-08-20 Thread Xiubo Li
This looks the same as https://tracker.ceph.com/issues/52280, which has already been fixed. I checked your ceph version, which already includes that fix, so this should be a new case. BTW, how did this happen? Were you doing a failover or something else? Thanks - Xiubo On 8

[ceph-users] Re: squid 19.1.1 RC QE validation status

2024-08-20 Thread Laura Flores
The gibba upgrade is complete! All daemons upgraded successfully. On Tue, Aug 20, 2024 at 10:44 AM Venky Shankar wrote: > On Mon, Aug 19, 2024 at 4:46 PM Venky Shankar wrote: > > > > Hi Brad, > > > > On Fri, Aug 16, 2024 at 8:59 AM Brad Hubbard > wrote: > > > > > > On Thu, Aug 15, 2024 at 11:5

[ceph-users] Re: Cephfs mds node already exists crashes mds

2024-08-20 Thread Bogdan Adrian Velica
Hi, what pops out is the "handle_client_mkdir()"... Does this mean the MDS crashed when a client was creating a new dir or snapshot? Any idea about the steps? Thank you, Bogdan Velica croit.io On Tue, Aug 20, 2024 at 7:47 PM Tarrago, Eli (RIS-BCT) <eli.tarr...@lexisnexisrisk.com> wrote: > Her

[ceph-users] Re: Cephfs mds node already exists crashes mds

2024-08-20 Thread Tarrago, Eli (RIS-BCT)
Here is the backtrace from a ceph crash: ceph crash info '2024-08-20T16:07:39.319197Z_8bcdf3df-f9b5-451a-b971-16f8190ab351' { "assert_condition": "!p", "assert_file": "/build/ceph-18.2.4/src/mds/MDCache.cc", "assert_func": "void MDCache::add_inode(CInode*)", "assert_line": 251,
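For anyone following along, the standard `ceph crash` subcommands used to pull and manage a report like the one above (the crash id shown is the one from this thread):

# list crashes that have not been acknowledged yet
ceph crash ls-new

# show the full metadata and backtrace for one crash
ceph crash info 2024-08-20T16:07:39.319197Z_8bcdf3df-f9b5-451a-b971-16f8190ab351

# once reported, archive it so it stops raising the RECENT_CRASH health warning
ceph crash archive <crash-id>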

[ceph-users] Re: squid 19.1.1 RC QE validation status

2024-08-20 Thread Venky Shankar
On Mon, Aug 19, 2024 at 4:46 PM Venky Shankar wrote: > > Hi Brad, > > On Fri, Aug 16, 2024 at 8:59 AM Brad Hubbard wrote: > > > > On Thu, Aug 15, 2024 at 11:50 AM Brad Hubbard wrote: > > > > > > On Tue, Aug 6, 2024 at 6:33 AM Yuri Weinstein wrote: > > > > > > > > Details of this release are sum

[ceph-users] Cephfs mds node already exists crashes mds

2024-08-20 Thread Tarrago, Eli (RIS-BCT)
Good Morning Ceph Users, I’m currently engaged in troubleshooting an issue and I wanted to post here to get some feedback. If there is no response or feedback that this looks like a bug, then I’ll write up a bug report. Cluster: Reef 18.2.4 Ubuntu 20.04 ceph -s cluster: id: 93e49b2e-

[ceph-users] Re: CephFS troubleshooting

2024-08-20 Thread Eugen Block
Hi, can you share more details? For example the auth caps of your fuse client (ceph auth export client.<name>) and the exact command that fails? Did it work before? I just did that on a small test cluster (17.2.7) without an issue. BTW, the warning "too many PGs per OSD (328 > max 250)" is serio
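For reference, a hedged sketch of the command being asked for (client.myfsuser is a placeholder, not the poster's actual client name):

# show the caps of the fuse client in question
ceph auth export client.myfsuser

# caps created via "ceph fs authorize" typically look something like:
#   caps mds = "allow rw fsname=myfs"
#   caps mon = "allow r fsname=myfs"
#   caps osd = "allow rw tag cephfs data=myfs"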

[ceph-users] Re: The snaptrim queue of PGs has not decreased for several days.

2024-08-20 Thread Eugen Block
Hi (please don't drop the ML from your responses). "All PGs of pool cephfs are affected and they are in all OSDs" - then just pick a random one and check if anything stands out. I'm not sure if you mentioned it already, did you also try restarting OSDs? Oh, not yesterday. I do it now, then I
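A hedged sketch of what "pick a random one and check" could look like in practice (the PG id is just an example placeholder):

# list PGs currently in the snaptrim state
ceph pg ls snaptrim

# query one of them and look at its state, stats and peering details
ceph pg 4.1f query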

[ceph-users] Re: squid 19.1.1 RC QE validation status

2024-08-20 Thread Guillaume ABRIOUX
Hi Yuri, ceph-volume approved https://jenkins.ceph.com/job/ceph-volume-test/601/ Regards, -- Guillaume Abrioux Software Engineer From: Yuri Weinstein Date: Monday, 5 August 2024 at 22:33 To: dev , ceph-users Subject: [EXTERNAL] [ceph-users] squid 19.1.1 RC QE validation status Details of this

[ceph-users] Re: The snaptrim queue of PGs has not decreased for several days.

2024-08-20 Thread Giovanna Ratini
PS, here is the command: [rook@rook-ceph-tools-5459f7cb5b-p55np /]$ ceph pg dump | grep snaptrim | grep -v 'snaptrim_wait' | awk '{print $18}' | sort | uniq dumped all 0 10 11 2 8 9 On 20.08.2024 at 10:25, MARTEL Arnaud wrote: I had this problem once in the past and found that it was related to
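Building on that pipeline, a small variant (same assumption that column $18 of `ceph pg dump` holds UP_PRIMARY) that also counts how often each primary OSD shows up, which makes an outlier easier to spot:

# count snaptrimming PGs per primary OSD; one OSD dominating the list is suspicious
ceph pg dump 2>/dev/null | grep snaptrim | grep -v snaptrim_wait \
  | awk '{print $18}' | sort | uniq -c | sort -rn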

[ceph-users] Re: The snaptrim queue of PGs has not decreased for several days.

2024-08-20 Thread Giovanna Ratini
Hello Arnaud, I have all 6 OSDs in the list :-(. Thanks for the idea, maybe it could help other users. Regards, Giovanna On 20.08.2024 at 10:25, MARTEL Arnaud wrote: ceph pg dump | grep snaptrim | grep -v 'snaptrim_wait' -- Giovanna Ratini Mail: rat...@dbvis.inf.uni-konstanz.de Phone: +49 (0) 7

[ceph-users] Re: The snaptrim queue of PGs has not decreased for several days.

2024-08-20 Thread MARTEL Arnaud
I had this problem once in the past and found that it was related to a particular osd. To identify it, I ran the command "ceph pg dump | grep snaptrim | grep -v 'snaptrim_wait'" and found that the osd displayed in the "UP_PRIMARY" column was almost always the same. So I restarted this osd and t
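For completeness, a hedged sketch of the restart step Arnaud describes (osd.17 is a placeholder id; the first form assumes a cephadm-managed cluster, the second a classic systemd deployment; on a Rook cluster you would instead restart the corresponding OSD pod and let the operator recreate it):

# cephadm-managed cluster
ceph orch daemon restart osd.17

# classic (non-containerized) deployment, on the host carrying the OSD
systemctl restart ceph-osd@17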

[ceph-users] Re: Bug with Cephadm module osd service preventing orchestrator start

2024-08-20 Thread Eugen Block
Don't worry, it happens a lot and it also happens to me. ;-) Glad it worked for you as well. Quoting Benjamin Huth: Thank you so much for the help! Thanks to the issue you linked and the other guy you replied to with the same issue, I was able to edit the config-key and get my orchestrator
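In case someone hits a similar orchestrator-state problem, the general config-key workflow looks roughly like this (the key name is left as a placeholder because it was not spelled out in this digest; be careful when editing mgr module state by hand):

# find the cephadm-related keys
ceph config-key ls | grep cephadm

# dump the suspect key, edit the copy, then write it back
ceph config-key get <key> > /tmp/value.json
ceph config-key set <key> -i /tmp/value.json

# fail over the active mgr so the module reloads the stored state
ceph mgr fail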

[ceph-users] Re: The snaptrim queue of PGs has not decreased for several days.

2024-08-20 Thread Eugen Block
Did you reduce the default values I mentioned? You could also look into the historic_ops of the primary OSD for one affected PG: ceph tell osd.<id> dump_historic_ops_by_duration But I'm not sure if that can actually help here. There are plenty of places to look at, you could turn on debug logs o
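To make the historic_ops suggestion concrete, a hedged example (osd.12 is a placeholder id; the jq filter assumes the usual JSON layout of the dump, an "ops" array with "description" and "duration" fields):

# show the slowest recent ops on the PG's primary OSD, as recorded by the daemon
ceph tell osd.12 dump_historic_ops_by_duration | jq '.ops[] | {description, duration}'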