[ceph-users] Ceph Orchestrator (cephadm) stopped doing something

2022-11-28 Thread Volker Racho
Hi, ceph orch commands are no longer being executed in my cephadm-managed cluster (17.2.3) and I don't see why. The cluster is healthy and overall working, except for the orchestrator part. For instance, when I run `ceph orch redeploy ingress.rgw.default`, I see the command in the audit logs, cephadm also lo
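
For reference, a few generic cephadm diagnostics that are commonly used when the orchestrator accepts commands but never acts on them (none of these are quoted from the original message):

  ceph orch status       # is the cephadm backend reported as available?
  ceph mgr module ls     # confirm the cephadm mgr module is enabled
  ceph log last cephadm  # recent cephadm entries from the cluster log
  ceph mgr fail          # fail over to a standby mgr, which often unblocks a stuck orchestrator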

[ceph-users] Re: PGs stuck down

2022-11-28 Thread Yanko Davila
Hi Dale, Can you please post the ceph status? I’m no expert, but I would make sure that the datacenter you intend to keep operating (while the connection gets reestablished) has two active monitors. Thanks. Yanko. > On Nov 29, 2022, at 7:20 AM, Wolfpaw - Dale Corse wrote: > > Hi All, > > > > We
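
For reference, monitor quorum and overall cluster state can be checked with standard commands like these (not specific to Dale's cluster):

  ceph status                              # overall health, mon quorum, PG states
  ceph quorum_status --format json-pretty  # which monitors are currently in quorum
  ceph mon stat                            # monitor ranks and a quorum summary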

[ceph-users] PGs stuck down

2022-11-28 Thread Wolfpaw - Dale Corse
Hi All, We had a fiber cut tonight between 2 data centers, and a ceph cluster didn't do very well :( We ended up with 98% of PGs down. This setup has 2 data centers defined, with 4 copies across both, and a minimum size of 1. We have 1 mon/mgr in each DC, with one in a 3rd data cente
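
For anyone following along, the replication settings and the reason PGs are down can be inspected with something like the following (pool name and PG id are placeholders):

  ceph osd pool get <pool> size      # number of replicas (4 in this setup)
  ceph osd pool get <pool> min_size  # replicas required before a PG serves I/O
  ceph pg dump_stuck inactive        # list PGs that are not active
  ceph pg <pgid> query               # per-PG detail, including what it is blocked by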

[ceph-users] Re: MDS stuck ops

2022-11-28 Thread Venky Shankar
Hi Frank, On Tue, Nov 29, 2022 at 12:32 AM Frank Schilder wrote: > > Hi Reed, > > I sometimes had stuck MDS ops as well, making the journal trim stop and the > metadata pool slowly run full. It's usually a race condition in the MDS > ops queue, and re-scheduling the ops in the MDS queue reso

[ceph-users] Re: MDS stuck ops

2022-11-28 Thread Venky Shankar
Hi Reed, On Tue, Nov 29, 2022 at 3:13 AM Reed Dier wrote: > > So, ironically, I did try some of these approaches here. > > I first moved the nearfull goalpost to see if that made a difference; it did > for client writes, but not for the metadata to unstick. > > I did some hunting for so

[ceph-users] Re: filesystem became read only after Quincy upgrade

2022-11-28 Thread Xiubo Li
On 28/11/2022 23:21, Adrien Georget wrote: Hi Xiubo, I did a journal reset today followed by a session reset, and then the MDS was able to start without switching to readonly mode. An MDS scrub was also useful to repair some bad inode backtraces. Thanks again for your help with this issue! Coo

[ceph-users] Re: MDS stuck ops

2022-11-28 Thread Frank Schilder
Hi Reed, forget what I wrote about pinning; you use only 1 MDS, so it won't change anything. I think the problem you are facing is with the standby-replay daemon mode. I used that in the past too, but found that it actually didn't help with fail-over speed to begin with. On top of that, the
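
If, as Frank suggests, standby-replay is doing more harm than good, it can be disabled per file system with a standard command (file system name is a placeholder):

  ceph fs set <fs> allow_standby_replay false  # standby-replay daemons become normal standbys
  ceph fs status <fs>                          # verify the resulting MDS layout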

[ceph-users] Re: MDS stuck ops

2022-11-28 Thread Reed Dier
So, ironically, I did try some of these approaches here. I first moved the nearfull goalpost to see if that made a difference; it did for client writes, but not for the metadata to unstick. I did some hunting for hung/waiting processes on some of the client nodes, and was able to
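
"Moving the nearfull goalpost" presumably means raising the cluster's nearfull ratio; a sketch of how that is typically done, plus one way to spot blocked requests on a kernel client (values and paths are illustrative):

  ceph osd set-nearfull-ratio 0.9    # raise the nearfull threshold (default is 0.85)
  ceph osd dump | grep full          # confirm the ratios currently in effect
  cat /sys/kernel/debug/ceph/*/osdc  # on a kernel client: outstanding OSD requests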

[ceph-users] Re: MDS stuck ops

2022-11-28 Thread Frank Schilder
Hi Reed, I sometimes had stuck MDS ops as well, making the journal trim stop and the metadata pool slowly run full. It's usually a race condition in the MDS ops queue, and re-scheduling the ops in the MDS queue resolves it. To achieve that, I usually try, in escalating order: - Find the clie
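
The usual way to identify the stuck ops and the client sessions behind them uses the MDS admin socket; a sketch (daemon name and client id are placeholders, and eviction is very much a last resort):

  ceph daemon mds.<name> dump_ops_in_flight         # stuck ops and the client/session they belong to
  ceph daemon mds.<name> session ls                 # map client ids to hosts and mounts
  ceph tell mds.<name> client evict id=<client-id>  # evict the offending client if nothing else helps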

[ceph-users] Re: MDS stuck ops

2022-11-28 Thread Reed Dier
Hi Venky, Thanks for responding. > A good chunk of those are waiting for the directory to finish > fragmentation (split). I think those ops are not progressing since > fragmentation involves creating more objects in the metadata pool. > Update ops will involve appending to the mds journal consum
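
For context, directory fragmentation in the MDS is governed by a few configuration options; they can be inspected like this (no particular values are being recommended here):

  ceph config get mds mds_bal_split_size         # dirents in a fragment before it is split
  ceph config get mds mds_bal_merge_size         # threshold below which fragments are merged
  ceph config get mds mds_bal_fragment_size_max  # hard cap before new entries are rejected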

[ceph-users] Re: MDS stuck ops

2022-11-28 Thread Venky Shankar
On Mon, Nov 28, 2022 at 10:19 PM Reed Dier wrote: > > Hopefully someone will be able to point me in the right direction here: > > Cluster is Octopus/15.2.17 on Ubuntu 20.04. > All are kernel cephfs clients, either 5.4.0-131-generic or 5.15.0-52-generic. > Cluster is nearful, and more storage is co

[ceph-users] MDS stuck ops

2022-11-28 Thread Reed Dier
Hopefully someone will be able to point me in the right direction here: Cluster is Octopus/15.2.17 on Ubuntu 20.04. All are kernel cephfs clients, either 5.4.0-131-generic or 5.15.0-52-generic. Cluster is nearful, and more storage is coming, but still 2-4 weeks out from delivery. > HEALTH_WARN 1
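
For a report like this, the usual commands for digging into the warning and the MDS state are (generic, not quoted from Reed's mail):

  ceph health detail  # expand HEALTH_WARN into the individual warnings
  ceph fs status      # MDS state, client count, metadata/data pool usage
  ceph df             # how close the pools actually are to the nearfull ratio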

[ceph-users] Re: filesystem became read only after Quincy upgrade

2022-11-28 Thread Adrien Georget
Hi Xiubo, I did a journal reset today followed by a session reset, and then the MDS was able to start without switching to readonly mode. An MDS scrub was also useful to repair some bad inode backtraces. Thanks again for your help with this issue! Cheers, Adrien Le 26/11/2022 à 05:08, Xiubo Li a
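
The steps Adrien mentions map roughly onto the CephFS disaster-recovery tooling; a sketch of the usual invocations, with the file system name as a placeholder (these are destructive and should only be run following the disaster-recovery documentation, with the MDS stopped and the journal exported first):

  cephfs-journal-tool --rank=<fs>:0 journal export backup.bin  # back up the journal before touching it
  cephfs-journal-tool --rank=<fs>:0 journal reset              # the "journal reset" step
  cephfs-table-tool all reset session                          # the "session reset" step
  ceph tell mds.<fs>:0 scrub start / recursive,repair          # scrub to repair bad inode backtraces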

[ceph-users] Re: Ceph networking

2022-11-28 Thread Anthony D'Atri
I’ve never done it myself, but the network config options for public/private should take a comma-separated list of CIDR blocks. The client/public should be fine. For the backend/private/replication network, that is likely overkill. Are your OSDs SSDs or HDDs? If you do go this route, be sure

[ceph-users] Re: CephFS Snapshot Mirroring slow due to repeating attribute sync

2022-11-28 Thread Venky Shankar
Hi Mathias, (apologies for the super late reply - I was getting back from a long vacation and missed seeing this). I updated the tracker ticket. Let's move the discussion there... On Mon, Nov 28, 2022 at 7:46 PM Venky Shankar wrote: > > On Tue, Aug 23, 2022 at 10:01 PM Kuhring, Mathias > wrote

[ceph-users] Re: Ceph networking

2022-11-28 Thread Stephen Smith6
The “Network Configuration Reference” is always a good place to start: https://docs.ceph.com/en/latest/rados/configuration/network-config-ref/ Multiple client networks are possible (see the “public_network” configuration option). I believe you’d configure 2 “public_network”s: 1. For actual

[ceph-users] Re: CephFS Snapshot Mirroring slow due to repeating attribute sync

2022-11-28 Thread Venky Shankar
On Tue, Aug 23, 2022 at 10:01 PM Kuhring, Mathias wrote: > > Dear Ceph developers and users, > > We are using ceph version 17.2.1 > (ec95624474b1871a821a912b8c3af68f8f8e7aa1) quincy (stable). > We are using cephadm since version 15 octopus. > > We mirror several CephFS directories from our main cl
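
For context, per-directory CephFS snapshot mirroring is configured roughly like this (file system name and path are placeholders; the peer bootstrap between the two clusters is omitted, see the cephfs-mirroring docs):

  ceph mgr module enable mirroring         # on both clusters
  ceph fs snapshot mirror enable <fs>      # enable mirroring for the file system
  ceph fs snapshot mirror add <fs> <path>  # add a directory to be mirrored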

[ceph-users] Ceph networking

2022-11-28 Thread Jan Marek
Hello, I have a CEPH cluster with 3 MONs and 6 OSD nodes with 72 OSDs. I would like to have multiple client and backend networks. I now have 2x 10Gbps and 2x 25Gbps NICs in the nodes, and my idea is to have: - 2 client networks, for example 192.168.1.0/24 on 10Gbps NICs and 192.168.2.0/24 on 25Gbps N
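
As the replies point out, multiple client networks are expressed as a comma-separated public_network; a purely illustrative ceph.conf sketch for the addressing described above (the cluster_network value is invented for the example):

  [global]
  public_network  = 192.168.1.0/24, 192.168.2.0/24  # both client networks (10G and 25G)
  cluster_network = 10.0.0.0/24                     # optional separate replication network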

[ceph-users] Re: ceph-volume lvm zap destroyes up+in OSD

2022-11-28 Thread Frank Schilder
Thanks, also for finding the related tracker issue! It looks like a fix has already been approved. Hope it shows up in the next release. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Eugen Block Sent: 28 Novemb

[ceph-users] Re: ceph-volume lvm zap destroyes up+in OSD

2022-11-28 Thread Eugen Block
Hi, it seems like this tracker issue [1] already covers your question. I'll update the issue and add a link to our thread. [1] https://tracker.ceph.com/issues/57767 Quoting Frank Schilder: Hi Eugen, can you confirm that the silent corruption also happens on a collocated OSDc (everythin
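
Until a fixed release is out, a quick sanity check before zapping a device is to confirm that nothing on it still belongs to an up+in OSD (device path and OSD id are placeholders):

  ceph-volume lvm list /dev/sdX  # which OSD id(s), if any, live on this device
  ceph osd tree | grep osd.<id>  # is that OSD still up and in?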

[ceph-users] Re: What to expect on rejoining a host to cluster?

2022-11-28 Thread Eneko Lacunza
Hi Matt, Also, make sure that the rejoining host has the correct time. I have seen clusters go down when rejoining hosts that had been down for maintenance for several weeks and came back with datetime deltas of some months (no idea why that happened, I arrived with the firefighting team ;-) ) Chee
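
A couple of quick checks before rejoining a host, to catch the clock-skew problem Eneko describes (generic commands; chrony is assumed as the time daemon):

  timedatectl status     # on the rejoining host: is NTP synchronization active?
  chronyc tracking       # offset from the configured time sources
  ceph time-sync-status  # per-mon clock skew as seen by the cluster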