Hi Eugen,

I’m not sure if this helps, and I would greatly appreciate any suggestions for improving our setup, but so far we’ve had good luck with our service deployed using:

ceph nfs cluster create cephfs "label:_admin" --ingress --virtual_ip virtual_ip
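(For anyone who wants to reproduce the placement change described just below: one way to do it is to export the spec the orchestrator generated, edit it, and re-apply it. A rough sketch, with the file name being just an example:

ceph orch ls --service_name=nfs.cephfs --export > nfs.cephfs.yaml
# edit the placement section, e.g. set it to "label: osd"
ceph orch apply -i nfs.cephfs.yaml
)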
And then we manually updated the nfs.cephfs service this created to place the nfs daemons on our OSD nodes. This gives us the following:

———
service_type: ingress
service_id: nfs.cephfs
service_name: ingress.nfs.cephfs
placement:
  label: _admin
spec:
  backend_service: nfs.cephfs
  first_virtual_router_id: 50
  frontend_port: 2049
  monitor_port: 9049
  virtual_ip: virtual_ip/prefix

service_type: nfs
service_id: cephfs
service_name: nfs.cephfs
placement:
  label: osd
spec:
  port: 12049
———

Given that we have 5 dedicated management / admin nodes and 5 separate OSD nodes, we then have:

———
[root@cephadmin1 ~]# ceph orch ls --service_name=ingress.nfs.cephfs
NAME                PORTS                 RUNNING  REFRESHED  AGE  PLACEMENT
ingress.nfs.cephfs  virtual_ip:2049,9049  10/10    8m ago     10w  label:_admin

[root@cephadmin1 ~]# ceph orch ls --service_name=nfs.cephfs
NAME        PORTS    RUNNING  REFRESHED  AGE  PLACEMENT
nfs.cephfs  ?:12049  5/5      8m ago     7w   label:osd
———

At least during testing, failover seemed to work properly. We’re still very new to Ceph, however, so we would greatly appreciate knowing if anyone sees any problems with this setup or has suggestions for improvement. For example, we’re still unsure whether it would be better to have the ingress.nfs.cephfs and nfs.cephfs services running on the same nodes, whether one or both should be running on the dedicated OSD nodes, etc.

Thanks!
Devin

> On Mar 25, 2025, at 11:18 AM, Eugen Block <ebl...@nde.ag> wrote:
>
> Thanks, Adam.
> I just tried it with 3 keepalive daemons and one nfs daemon, it doesn't really work because all three hosts have the virtual IP assigned, preventing my client from mounting. So this doesn't really work as a workaround, it seems. I feel like the proper solution would be to include keepalive in the list of RESCHEDULE_FROM_OFFLINE_HOSTS_TYPES.
>
> Zitat von Adam King <adk...@redhat.com>:
>
>> Which daemons get moved around like that is controlled by https://github.com/ceph/ceph/blob/main/src/pybind/mgr/cephadm/utils.py#L30, which appears to only include nfs and haproxy, so maybe this keepalive-only case was missed in that sense. I do think that you could alter the placement of the ingress service to just match all the hosts and it ought to work, though. The only reason we require specifying a set of hosts with a count field lower than the number of hosts matching the placement for nfs is that ganesha has no transparent state migration. Keepalive, on the other hand, should work fine with "extra" daemons deployed in order to be highly available.
>>
>> On Tue, Mar 25, 2025 at 10:06 AM Malte Stroem <malte.str...@gmail.com> wrote:
>>
>>> Hi Eugen,
>>>
>>> yes, for me it's kind of a "test setting" for small setups.
>>>
>>> Doc says:
>>>
>>> Setting --ingress-mode keepalive-only deploys a simplified ingress service that provides a virtual IP with the nfs server directly binding to that virtual IP and leaves out any sort of load balancing or traffic redirection. This setup will restrict users to deploying only 1 nfs daemon as multiple cannot bind to the same port on the virtual IP.
>>>
>>> Best,
>>> Malte
>>>
>>>
>>> On 25.03.25 13:46, Eugen Block wrote:
>>> > Yeah, it seems to work without the "keepalive-only" flag, at least from a first test. So keepalive-only is not working properly, it seems? Should I create a tracker for that or am I misunderstanding its purpose?
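For what it's worth, a quick way to watch what is happening during this kind of failover test is to check where the orchestrator has placed the keepalived and nfs daemons and which host currently holds the virtual IP. A small sketch using standard commands (the IP below is the one from Eugen's spec further down):

ceph orch ps --daemon_type keepalived
ceph orch ps --daemon_type nfs
# on each host, see whether it currently holds the virtual IP:
ip -brief addr show | grep 192.168.168.114

If keepalived is not failed over, the host holding the virtual IP and the host running the nfs daemon will differ, which matches what Eugen describes.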
>>> >
>>> > Zitat von Malte Stroem <malte.str...@gmail.com>:
>>> >
>>> >> Hi Eugen,
>>> >>
>>> >> try omitting
>>> >>
>>> >> --ingress-mode keepalive-only
>>> >>
>>> >> like this
>>> >>
>>> >> ceph nfs cluster create ebl-nfs-cephfs "1 ceph01 ceph02 ceph03" --ingress --virtual_ip "192.168.168.114/24"
>>> >>
>>> >> Best,
>>> >> Malte
>>> >>
>>> >> On 25.03.25 13:25, Eugen Block wrote:
>>> >>> Thanks for your quick response. The specs I pasted are actually the result of deploying a nfs cluster like this:
>>> >>>
>>> >>> ceph nfs cluster create ebl-nfs-cephfs "1 ceph01 ceph02 ceph03" --ingress --virtual_ip 192.168.168.114 --ingress-mode keepalive-only
>>> >>>
>>> >>> I can try redeploying it via the dashboard, but I don't have a lot of confidence that it will work differently with a failover.
>>> >>>
>>> >>> Zitat von Malte Stroem <malte.str...@gmail.com>:
>>> >>>
>>> >>>> Hi Eugen,
>>> >>>>
>>> >>>> try deploying the NFS service like this:
>>> >>>>
>>> >>>> https://docs.ceph.com/en/latest/mgr/nfs/
>>> >>>>
>>> >>>> Some had only success deploying it via the dashboard.
>>> >>>>
>>> >>>> Best,
>>> >>>> Malte
>>> >>>>
>>> >>>> On 25.03.25 13:02, Eugen Block wrote:
>>> >>>>> Hi,
>>> >>>>>
>>> >>>>> I'm re-evaluating NFS again, testing on a virtual cluster with 18.2.4. For now, I don't need haproxy, so I use "keepalive_only: true" as described in the docs [0]. I first create the ingress service, wait for it to start, then create the nfs cluster. I've added the specs at the bottom.
>>> >>>>>
>>> >>>>> I can mount the export with the virtual IP. Then I just shut down the VM where the nfs service was running; the orchestrator successfully starts a nfs daemon elsewhere, but the keepalive daemon is not failed over. So mounting or accessing the export is impossible, of course. And after I power up the offline host again, nothing is "repaired": keepalive and nfs run on different servers until I intervene manually. This doesn't seem to work as expected. Is this a known issue (I couldn't find anything on the tracker)? I have my doubts, but maybe it works better with haproxy? Or am I missing something in my configuration?
>>> >>>>> I haven't tried with a newer release yet. I'd appreciate any comments.
>>> >>>>>
>>> >>>>> Thanks,
>>> >>>>> Eugen
>>> >>>>>
>>> >>>>> ---snip---
>>> >>>>> service_type: ingress
>>> >>>>> service_id: nfs.ebl-nfs-cephfs
>>> >>>>> service_name: ingress.nfs.ebl-nfs-cephfs
>>> >>>>> placement:
>>> >>>>>   count: 1
>>> >>>>>   hosts:
>>> >>>>>   - ceph01
>>> >>>>>   - ceph02
>>> >>>>>   - ceph03
>>> >>>>> spec:
>>> >>>>>   backend_service: nfs.ebl-nfs-cephfs
>>> >>>>>   first_virtual_router_id: 50
>>> >>>>>   keepalive_only: true
>>> >>>>>   monitor_port: 9049
>>> >>>>>   virtual_ip: 192.168.168.114/24
>>> >>>>>
>>> >>>>> service_type: nfs
>>> >>>>> service_id: ebl-nfs-cephfs
>>> >>>>> service_name: nfs.ebl-nfs-cephfs
>>> >>>>> placement:
>>> >>>>>   count: 1
>>> >>>>>   hosts:
>>> >>>>>   - ceph01
>>> >>>>>   - ceph02
>>> >>>>>   - ceph03
>>> >>>>> spec:
>>> >>>>>   port: 2049
>>> >>>>>   virtual_ip: 192.168.168.114
>>> >>>>> ---snip---
>>> >>>>>
>>> >>>>> [0] https://docs.ceph.com/en/reef/cephadm/services/nfs/#nfs-with-virtual-ip-but-no-haproxy

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io