[ceph-users] Re: [Ceph incident] PG stuck in peering.

2024-09-26 Thread Frank Schilder
Hi Loan, thanks for the detailed post-mortem to the list! I misread your first message, unfortunately. On our cluster we also had issues with 1-2 PGs being stuck in peering, resulting in blocked IO and warnings piling up. We identified the "bad" OSD by shutting one member-OSD down at a time and
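
A minimal sketch of that narrowing-down process, assuming a hypothetical PG 1.2f and OSDs running as systemd services (unit names vary by deployment; on cephadm hosts they include the cluster fsid):

ceph pg dump_stuck inactive            # list PGs stuck peering, with their up/acting sets
ceph pg 1.2f query | less              # look for "blocked_by" in the recovery_state section
systemctl stop ceph-osd@17             # stop one member OSD of the acting set at a time
ceph pg 1.2f query | grep -A2 state    # check whether the PG finishes peering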

[ceph-users] Re: [EXTERNAL] Re: Backup strategies for rgw s3

2024-09-26 Thread Alex Hussein-Kershaw (HE/HIM)
We have been using rclone (rclone.org) to copy all the data to a filesystem nightly to provide an S3 backup mechanism. It has Ceph support out of the box (added by one of my colleagues a few years ago).
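
For reference, a hedged sketch of what such a nightly copy can look like, assuming an rclone remote named "ceph" pointing at the RGW endpoint and a local backup path; the remote and path names are illustrative, not taken from this thread:

rclone sync ceph: /backup/s3 --fast-list --transfers 16 --log-file /var/log/rclone-s3-backup.log

Syncing from the root of an S3 remote walks all buckets; a single bucket (ceph:bucketname) can be synced the same way.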

[ceph-users] Re: cephfs +inotify = caps problem?

2024-09-26 Thread Frédéric Nass
Hi Burkhard, this is a known issue. We ran into it a few months back using VS Code containers working on CephFS under Kubernetes. Tweaking the settings.json file as suggested by Dietmar here [1] did the trick for us. Regards, Frédéric. [1] https://lists.ceph.io/hyperkitty/list/ceph-users@ce
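
As an illustration only (the exact keys are in Dietmar's post referenced as [1] and are not reproduced here), the assumed shape of the tweak is to stop VS Code's file watcher from placing inotify watches on the CephFS tree, e.g. a settings.json fragment along these lines:

{
  "files.watcherExclude": {
    "**": true
  }
}

This effectively disables workspace file watching, so the client no longer holds inotify watches (and the associated caps) on every file in the workspace.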

[ceph-users] Re: Ceph orchestrator not refreshing device list

2024-09-26 Thread Eugen Block
Hi, it seems a bit unnecessary to rebuild OSDs just to get them managed. If you apply a spec file that targets your hosts/OSDs, they will appear as managed. So when you need to replace a drive, you could already use the orchestrator to remove and zap it. That works just
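
A minimal sketch of what "apply a spec file" can look like; the host and device filters below are placeholders, not the poster's actual layout. An osd-spec.yaml such as:

service_type: osd
service_id: default_osds
placement:
  host_pattern: '*'
spec:
  data_devices:
    all: true

applied with:

ceph orch apply -i osd-spec.yaml --dry-run    # preview the result, then re-run without --dry-run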

[ceph-users] Re: Ceph orchestrator not refreshing device list

2024-09-26 Thread Eugen Block
Right, if you need encryption, a rebuild is required. Your procedure has already worked 4 times, so I'd say nothing seems wrong with that per se. Regarding the stuck device list, do you see the mgr logging anything suspicious? Especially when you say that it only returns output after a fail
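
A few hedged checks along those lines (names are placeholders; the journalctl unit name depends on the cluster fsid):

ceph orch device ls --refresh      # force a re-scan instead of waiting for the cached inventory
ceph mgr fail                      # fail over to the standby mgr
ceph log last cephadm              # recent cephadm entries from the cluster log
journalctl -u 'ceph-*@mgr.*' -f    # follow the active mgr's daemon log on its host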

[ceph-users] v19.2.0 Squid released

2024-09-26 Thread Laura Flores
We're very happy to announce the first stable release of the Squid series. We express our gratitude to all members of the Ceph community who contributed by proposing pull requests, testing this release, providing feedback, and offering valuable suggestions. Highlights: RADOS * BlueStore has been

[ceph-users] ceph can list volumes from a pool but can not remove the volume

2024-09-26 Thread bryansoong21
We have a volume in our cluster:
[r...@ceph-1.lab-a ~]# rbd ls volume-ssd
volume-8a30615b-1c91-4e44-8482-3c7d15026c28
[r...@ceph-1.lab-a ~]# rbd rm volume-ssd/volume-8a30615b-1c91-4e44-8482-3c7d15026c28
Removing image: 0% complete...failed.
rbd: error opening image volume-8a30615b-1c91-4e44-8482

[ceph-users] Re: ceph can list volumes from a pool but can not remove the volume

2024-09-26 Thread Anthony D'Atri
https://docs.ceph.com/en/reef/rbd/rbd-snapshot/ should give you everything you need. Sounds like maybe you have snapshots / clones that have left the parent lingering as a tombstone? Start with:
rbd children volume-ssd/volume-8a30615b-1c91-4e44-8482-3c7d15026c28
rbd info volume-
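
Continuing that line of investigation, a hedged follow-up using standard rbd tooling (image name taken from the earlier message):

rbd snap ls volume-ssd/volume-8a30615b-1c91-4e44-8482-3c7d15026c28
rbd status volume-ssd/volume-8a30615b-1c91-4e44-8482-3c7d15026c28    # any watchers still attached?
# if snapshots exist, are unneeded and are not protected, purge them and retry the removal
rbd snap purge volume-ssd/volume-8a30615b-1c91-4e44-8482-3c7d15026c28
rbd rm volume-ssd/volume-8a30615b-1c91-4e44-8482-3c7d15026c28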

[ceph-users] Quincy: osd_pool_default_crush_rule being ignored?

2024-09-26 Thread Florian Haas
Hello everyone, my cluster has two CRUSH rules: the default replicated_rule (rule_id 0) and another rule named rack-aware (rule_id 1). Now, if I'm not misreading the config reference, I should be able to make all future-created pools use the rack-aware rule by setting osd_pool_defau
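
For context, a quick way to exercise the setting being described (rule ID 1 corresponds to the rack-aware rule above; the pool name is just an example):

ceph config set global osd_pool_default_crush_rule 1
ceph osd pool create testpool 32
ceph osd pool get testpool crush_rule    # expected: rack-aware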

[ceph-users] Re: Ceph Dashboard TLS

2024-09-26 Thread matthew
Yeap, that was my issue (forgot to open up port 8443 in the firewall). Thanks for the help! PS: Oh, and you *can* use ECC TLS certs, if anyone wanted to know.
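
For anyone hitting the same symptom, a hedged example of opening the dashboard port with firewalld (assuming the default port 8443 and that firewalld is in use):

firewall-cmd --permanent --add-port=8443/tcp
firewall-cmd --reload
ceph mgr services    # confirm the URL/port the dashboard is actually listening on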

[ceph-users] Cephalocon 2024 Developer Summit & New Users Workshop!

2024-09-26 Thread Neha Ojha
Dear Ceph Community, We are happy to announce two key events, the Ceph Developer Summit and the Ceph New Users Workshop (limited capacity), the day before Cephalocon 2024. More details and registration information are now live on our Ceph website

[ceph-users] Re: Mds daemon damaged - assert failed

2024-09-26 Thread Eugen Block
It could be a bug, sure, but I haven't searched the tracker for long; maybe there is an existing report. I'd leave it to the devs to comment on that. But the assert alone isn't of much help (to me); more MDS logs could help track this down. Quoting "Kyriazis, George": On Sep 25, 2024, at 1:05
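
A hedged way to collect more detailed MDS logs for a report like this (the daemon name is a placeholder; high debug levels are very verbose, so revert them afterwards):

ceph config set mds.<name> debug_mds 20
ceph config set mds.<name> debug_ms 1
# reproduce the assert, collect the MDS daemon log, then revert:
ceph config rm mds.<name> debug_mds
ceph config rm mds.<name> debug_ms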

[ceph-users] Re: Quincy: osd_pool_default_crush_rule being ignored?

2024-09-26 Thread Eugen Block
Hm, I don't know much about ceph-ansible. Did you check whether any config is set for a specific daemon, which would override the global setting? For example, 'ceph config show-with-defaults mon.' for each mon, and then also check 'ceph config dump | grep rule'. I would also prob
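
Spelled out, the checks suggested above look roughly like this (mon ID is a placeholder):

ceph config show-with-defaults mon.<id> | grep osd_pool_default_crush_rule
ceph config dump | grep -i rule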

[ceph-users] Re: Ceph orchestrator not refreshing device list

2024-09-26 Thread Bob Gibson
Thanks for your reply, Eugen. I’m fairly new to cephadm, so I wasn’t aware that we could manage the drives without rebuilding them. However, we thought we’d take advantage of this opportunity to also encrypt the drives, and that does require a rebuild. I have a theory on why the orchestrator is c

[ceph-users] Re: CephFS snaptrim bug?

2024-09-26 Thread Linkriver Technology
Hello, We recently upgraded to Quincy (17.2.7) and I can see in the ceph logs many messages of the form:
1713256584.3135679 osd.28 (osd.28) 66398 : cluster 4 osd.28 found snap mapper error on pg 7.284 oid 7:214b503b:::100125de9b8.:5c snaps in mapper: {}, oi: {5a} ...repaired
1713256584.31

[ceph-users] Re: Quincy: osd_pool_default_crush_rule being ignored?

2024-09-26 Thread Florian Haas
On 25/09/2024 09:05, Eugen Block wrote:
> Hi, for me this worked in a 17.2.7 cluster just fine
Huh, interesting!
> (except for erasure-coded pools).
Okay, *that* bit is expected. https://docs.ceph.com/en/quincy/rados/configuration/pool-pg-config-ref/#confval-osd_pool_default_crush_rule does

[ceph-users] cephadm bootstrap ignoring --skip-firewalld

2024-09-26 Thread Kozakis, Anestis
As I mentioned in my earlier e-mail, I'm new to Ceph and trying to set up automation to deploy, configure, and manage a Ceph cluster. We configure our firewall rules through SaltStack. I am passing the --skip-firewalld option to the cephadm bootstrap command, but cephadm seems to ignore the option
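
For reference, a hedged example of how the flag is normally passed (the IP is a placeholder):

cephadm bootstrap --mon-ip 192.0.2.10 --skip-firewalld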