[ceph-users] Re: OSD_UNREACHABLE After Upgrade to 17.2.8 – Issue with Public Network Detection

2025-03-25 Thread Frédéric Nass
- On 25 Mar 25, at 10:59, Илья Безруков rbe...@gmail.com wrote: > Hello Janne, > > We only have a single network configured for our OSDs: > > ```sh > ceph config get osd public_network 172.20.180.0/24 > > ceph config get osd cluster_network 172.20.180.0

[ceph-users] Re: Reef: highly-available NFS with keepalive_only

2025-03-25 Thread Malte Stroem
Hi Eugen, try deploying the NFS service like this: https://docs.ceph.com/en/latest/mgr/nfs/ Some only had success deploying it via the dashboard. Best, Malte On 25.03.25 13:02, Eugen Block wrote: Hi, I'm re-evaluating NFS again, testing on a virtual cluster with 18.2.4. For now, I don't ne

[ceph-users] Re: OSD_UNREACHABLE After Upgrade to 17.2.8 – Issue with Public Network Detection

2025-03-25 Thread Илья Безруков
Hello Janne, We only have a single network configured for our OSDs:

```sh
$ ceph config get osd public_network
172.20.180.0/24
$ ceph config get osd cluster_network
172.20.180.0/24
```

However, in the output of ceph health detail, we see multiple networks being checked
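For readers skimming the thread, a minimal sketch of the kind of check being discussed, comparing the configured public_network with the addresses the cluster actually records for the OSDs (the 172.20.180.0/24 subnet comes from the thread; setting the value globally is only a suggestion to try, not a confirmed fix):

```sh
# Where is public_network defined, and what addresses do the OSDs report?
ceph config get osd public_network        # value set in the osd section
ceph config dump | grep -i network        # shows which sections define it
ceph osd dump | grep '^osd\.'             # each OSD's public/cluster addresses
ceph health detail | grep -iA2 unreachable

# If public_network is only defined for the osd section, one thing to try
# (an assumption, not a confirmed fix for this report) is to also set it globally:
# ceph config set global public_network 172.20.180.0/24
```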

[ceph-users] Re: OSD_UNREACHABLE After Upgrade to 17.2.8 – Issue with Public Network Detection

2025-03-25 Thread Janne Johansson
> > After upgrading our Ceph cluster from 17.2.7 to 17.2.8 using `cephadm`, all > > OSDs are reported as unreachable with the following error: > > > > HEALTH_ERR 32 osds(s) are not reachable > > [ERR] OSD_UNREACHABLE: 32 osds(s) are not reachable > > osd.0's public address is not in '172.20.180

[ceph-users] Re: Reef: highly-available NFS with keepalive_only

2025-03-25 Thread Eugen Block
Yeah, it seems to work without the "keepalive-only" flag, at least from a first test. So keepalive-only is not working properly, it seems? Should I create a tracker for that, or am I misunderstanding its purpose? Quoting Malte Stroem: Hi Eugen, try omitting --ingress-mode keepalive-on

[ceph-users] Re: ceph-ansible LARGE OMAP in RGW pool

2025-03-25 Thread Frédéric Nass
Hi Danish, Can you specify the version of Ceph used and whether versioning is enabled on this bucket? There are two ways to clean up orphan entries in a bucket index that I'm aware of: - One (the preferable way) is to rely on the radosgw-admin command to check and hopefully fix the issue, cleaning
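As a rough sketch of that first approach (the bucket name "mybucket" is a placeholder, and the exact flags available differ between releases, so check radosgw-admin help on your version):

```sh
# Inspect the bucket index first (read-only), then attempt a fix.
radosgw-admin bucket check --bucket=mybucket
radosgw-admin bucket check --bucket=mybucket --check-objects --fix

# Bucket stats can help confirm index shard counts and object totals
# when chasing LARGE OMAP warnings on the index pool.
radosgw-admin bucket stats --bucket=mybucket
```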

[ceph-users] Re: Reef: highly-available NFS with keepalive_only

2025-03-25 Thread Eugen Block
Okay, so I don't see anything in the keepalive logs about the daemons communicating with each other. The config files are almost identical, with no difference in priority, but there is one in unicast_peer: ceph03 has no entry at all for unicast_peer, ceph02 has only ceph03 in there, while ceph01 has both of the others

[ceph-users] Re: reef 18.2.5 QE validation status

2025-03-25 Thread Brad Hubbard
On Tue, Mar 25, 2025 at 7:40 AM Yuri Weinstein wrote: > > Details of this release are summarized here: > > https://tracker.ceph.com/issues/70563#note-1 > Release Notes - TBD > LRC upgrade - TBD > > Seeking approvals/reviews for: > > smoke - Laura approved? Approved, issues are https://tracker.cep

[ceph-users] Re: Experience with 100G Ceph in Proxmox

2025-03-25 Thread Konold, Martin
On 2025-03-20 15:15, Chris Palmer wrote: Hi, * Ceph cluster 19.2.1 with 3 nodes, 4 x SATA disks with shared NVMe DB/WAL, single 10g NICs * Proxmox 8.3.5 cluster with 2 nodes (separate nodes to Ceph), single 10g NICs, single 1g NICs for corosync * Test VM was using KRBD R3 pool on HDD
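The exact benchmark command from the original post is not shown in this preview; as a generic baseline of the sort often used for such comparisons (the pool name "testpool" is a placeholder), something like rados bench gives a first impression:

```sh
# 30-second 4 MiB write benchmark with 16 concurrent ops, then a read pass.
rados bench -p testpool 30 write -b 4194304 -t 16 --no-cleanup
rados bench -p testpool 30 seq -t 16
rados -p testpool cleanup    # remove the benchmark objects afterwards
```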

[ceph-users] Re: Question about cluster expansion

2025-03-25 Thread Alan Murrell
On Mon, 2025-03-24 at 15:35 -0700, Anthony D'Atri wrote: > So probably all small-block RBD? Correct. I am using RBD pools. > Since you’re calling them thin, I’m thinking that they’re probably > E3.S.  U.3 is the size of a conventional 2.5” SFF SSD or HDD. Hrm, my terminology is probably confusi

[ceph-users] Re: OSD failed: still recovering

2025-03-25 Thread Anthony D'Atri
> > OK, good to know about the 5% misplaced objects report 😊 > > I just checked 'ceph -s' and the misplaced objects is showing 1.948%, but I > suspect I will see this up to 5% or so later on 😊 If you see a place in the docs where it would help to note the balancer phenomenon and mistakenly th
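For reference, the roughly 5% ceiling discussed here appears to correspond to the mgr option target_max_misplaced_ratio, which defaults to 0.05; it can be inspected like this:

```sh
# The balancer will not push the misplaced ratio beyond this value (default 0.05).
ceph config get mgr target_max_misplaced_ratio
ceph balancer status
```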

[ceph-users] Re: Reef: highly-available NFS with keepalive_only

2025-03-25 Thread Malte Stroem
Hi Eugen, yes, for me it's kind of a "test setting" for small setups. The docs say: Setting --ingress-mode keepalive-only deploys a simplified ingress service that provides a virtual IP with the nfs server directly binding to that virtual IP and leaves out any sort of load balancing or traffic red

[ceph-users] Re: Question about cluster expansion

2025-03-25 Thread Anthony D'Atri
>> Since you’re calling them thin, I’m thinking that they’re probably >> E3.S. U.3 is the size of a conventional 2.5” SFF SSD or HDD. > > Hrm, my terminology is probably confusing. According to the specs of > the servers, they are U.3 slots. Ah. I forget sometimes that there are both 7mm a

[ceph-users] Re: ceph-ansible LARGE OMAP in RGW pool

2025-03-25 Thread Danish Khan
Dear Frédéric, Unfortunately, I am still using the *Octopus* version and these commands come up as unrecognized. Versioning is also not enabled on the bucket. I tried running: radosgw-admin bucket check --bucket= --fix which ran for a few minutes, giving a lot of output, which contained the below lines fo

[ceph-users] Re: OSD failed: still recovering

2025-03-25 Thread Frédéric Nass
Hi Alan, - On 25 Mar 25, at 16:47, Alan Murrell a...@t-net.ca wrote: > OK, so just an update that the recovery did finally complete, and I am pretty > sure that the "inconsistent" PGs were PGs that the failed OSD was part of. > Running 'ceph pg repair' has them sorted out, along with the 6

[ceph-users] Re: Reef: highly-available NFS with keepalive_only

2025-03-25 Thread Adam King
> > I just tried it with 3 keepalive daemons and one nfs daemon, it > doesn't really work because all three hosts have the virtual IP > assigned, preventing my client from mounting. So this doesn't really > work as a workaround, it seems. That's a bit surprising. The keepalive daemons are meant t

[ceph-users] Re: Reef: highly-available NFS with keepalive_only

2025-03-25 Thread Eugen Block
Thanks, Adam. I just tried it with 3 keepalive daemons and one nfs daemon, it doesn't really work because all three hosts have the virtual IP assigned, preventing my client from mounting. So this doesn't really work as a workaround, it seems. I feel like the proper solution would be to inc

[ceph-users] Re: OSD failed: still recovering

2025-03-25 Thread Alan Murrell
OK, so just an update that the recovery did finally complete, and I am pretty sure that the "inconsistent" PGs were PGs that the failed OSD was part of. Running 'ceph pg repair' has them sorted out, along with the 600+ "scrub errors" I had. I was able to remove the OSD from the cluster, and a
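For reference, a sketch of the inspect-then-repair cycle described here (the pool name "rbd" and PG id "2.1a" are placeholders):

```sh
# List PGs flagged inconsistent, inspect one of them, then request a repair.
ceph health detail | grep -i inconsistent
rados list-inconsistent-pg rbd
rados list-inconsistent-obj 2.1a
ceph pg repair 2.1a
```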

[ceph-users] Re: ceph-ansible LARGE OMAP in RGW pool

2025-03-25 Thread Danish Khan
Hi Frédéric, Thank you for replying. I followed the steps mentioned in https://tracker.ceph.com/issues/62845 and was able to trim all the errors. Everything seemed to be working fine until the same error appeared again. I am still assuming the main culprit of this issue is one missing object an

[ceph-users] Re: Reef: highly-available NFS with keepalive_only

2025-03-25 Thread Adam King
Which daemons get moved around like that is controlled by https://github.com/ceph/ceph/blob/main/src/pybind/mgr/cephadm/utils.py#L30, which appears to include only nfs and haproxy, so maybe this keepalive-only case was missed in that sense. I do think that you could alter the placement of the ingre
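One way to adjust the placement with cephadm, assuming the service is named nfs.ebl-nfs-cephfs as elsewhere in the thread, is to export the ingress spec, edit it, and re-apply it (a sketch, not a confirmed workaround for this case):

```sh
# Export the current ingress spec, pin it to a single host, then re-apply.
ceph orch ls ingress --export > ingress.yaml
# edit ingress.yaml, e.g. under placement: list only one host such as ceph01
ceph orch apply -i ingress.yaml
```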

[ceph-users] Re: Experience with 100G Ceph in Proxmox

2025-03-25 Thread Chris Palmer
I completely agree that the test I did is not suitable for testing Ceph performance. I merely ran the same command as the OP and obtained very different results. I suspect the performance difference is much more due to things like network, OS config, memory constraints, etc. But that needs a ri

[ceph-users] Re: Experience with 100G Ceph in Proxmox

2025-03-25 Thread Marc
Sounds weird to me. Don't you have some element in the network that is limited to 5140, and above that it starts to fix up fragmentation or so? I remember asking the data center to enable 9000 and they never did, and they were also experimenting with some software-defined network. I will bet th
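A quick way to check whether large frames actually make it end to end (addresses are examples) is to send non-fragmentable ICMP payloads sized just under the MTU in question:

```sh
# 8972 = 9000 - 20 (IP header) - 8 (ICMP header); -M do forbids fragmentation.
ping -M do -s 8972 -c 4 172.20.180.11
# Probe right at the suspected ~5140-byte limit mentioned above:
ping -M do -s 5112 -c 4 172.20.180.11
```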

[ceph-users] Re: Experience with 100G Ceph in Proxmox

2025-03-25 Thread Chris Palmer
No, that was peer-to-peer, controlled testing. The results were different with different NIC chipsets, even on the same machines through the same switch, and even without a switch. I have to say some of these were cheaper NICs. With better ones there are fewer problems. But you don't know until
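Not from the thread itself, but iperf3 is a common tool for this kind of peer-to-peer NIC check between two nodes (hostnames are examples):

```sh
# On the receiving node:
iperf3 -s
# On the sending node: 4 parallel streams for 30 seconds.
iperf3 -c ceph02 -P 4 -t 30
```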

[ceph-users] Re: Kafka notification, bad certificate

2025-03-25 Thread Malte Stroem
Hello Frédéric, thank you very much. Yes, that's what I saw, too. Thank you for your feedback and acknowledgement. Best, Malte On 20.03.25 10:00, Frédéric Nass wrote: Hi Malte, Yeah, I just wanted to make you aware of this separate Kafka bug in Quincy and Reef v18.2.4. Regarding your issue, if

[ceph-users] Re: Kafka notification, bad certificate

2025-03-25 Thread Malte Stroem
Hello Yuval, yes, I would really like to help here. We're running Reef but can upgrade immediately. Contact me if you need help. Best, Malte On 22.03.25 18:46, Yuval Lifshitz wrote: Hi, As noted above, I already started implementing mtls support. Currently blocked on adding an mtls test

[ceph-users] Reef: highly-available NFS with keepalive_only

2025-03-25 Thread Eugen Block
Hi, I'm re-evaluating NFS again, testing on a virtual cluster with 18.2.4. For now, I don't need haproxy, so I use "keepalive_only: true" as described in the docs [0]. I first create the ingress service, wait for it to start, then create the nfs cluster. I've added the specs at the bottom.
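The specs mentioned here are not included in this preview; as a minimal sketch of a keepalive-only ingress spec along the lines of the documented example (service id, hosts, and virtual IP are taken from later messages in the thread; the other field names are assumed from [0]):

```sh
# Write the ingress spec and apply it with cephadm; the matching nfs service
# is created afterwards, as described above.
cat > nfs-ingress.yaml <<'EOF'
service_type: ingress
service_id: nfs.ebl-nfs-cephfs
placement:
  hosts:
    - ceph01
    - ceph02
    - ceph03
  count: 1
spec:
  backend_service: nfs.ebl-nfs-cephfs
  virtual_ip: 192.168.168.114/24
  keepalive_only: true
EOF
ceph orch apply -i nfs-ingress.yaml
```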

[ceph-users] Re: Reef: highly-available NFS with keepalive_only

2025-03-25 Thread Malte Stroem
Hi Eugen, try omitting --ingress-mode keepalive-only, like this: ceph nfs cluster create ebl-nfs-cephfs "1 ceph01 ceph02 ceph03" --ingress --virtual_ip "192.168.168.114/24" Best, Malte On 25.03.25 13:25, Eugen Block wrote: Thanks for your quick response. The specs I pasted are actually the

[ceph-users] Re: Reef: highly-available NFS with keepalive_only

2025-03-25 Thread Eugen Block
Thanks for your quick response. The specs I pasted are actually the result of deploying an NFS cluster like this: ceph nfs cluster create ebl-nfs-cephfs "1 ceph01 ceph02 ceph03" --ingress --virtual_ip 192.168.168.114 --ingress-mode keepalive-only I can try redeploying it via dashboard, but I