Hello Devin,

An important additional detail is missing: which OS is used as a client?
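For a report like this, something along these lines would capture the missing client-side detail (a sketch assuming a Linux client; `findmnt` is part of util-linux, and the mount options show which NFS version the client actually negotiated):

```shell
# Quick client-side survey to include in a report:
uname -sr                                    # kernel version
cat /etc/os-release 2>/dev/null | head -2    # distribution name and release
# List NFS mounts with their negotiated options (vers=, proto=, timeo=, ...);
# prints nothing if no NFS filesystem is currently mounted:
findmnt -t nfs4,nfs -o TARGET,SOURCE,OPTIONS || true
```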
And yes, my default recommendation would be to move the NFS server out of
the Ceph cluster.

On Wed, Apr 23, 2025 at 6:29 AM Devin A. Bougie <devin.bou...@cornell.edu> wrote:
>
> Hello,
>
> We’ve found that if we lose one of the nfs.cephfs service daemons in our
> cephadm 19.2.2 cluster, all NFS traffic is blocked until either:
> - the down nfs.cephfs daemon is restarted
> - or we reconfigure the placement of the nfs.cephfs service to not use the
>   affected host. After this, the ingress.nfs.cephfs service is automatically
>   reconfigured and everything resumes
>
> Our current setup follows the "HIGH-AVAILABILITY NFS" documentation, which
> gives us an ingress.nfs.cephfs service with the haproxy and keepalived
> daemons and an nfs.cephfs service for the actual NFS daemons. This service
> was deployed using:
>
> ceph nfs cluster create cephfs "label:_admin" --ingress --virtual_ip virtual_ip
>
> And then we updated the ingress.nfs.cephfs service to only deploy a single
> service (which in this case, results in two daemons on a single host).
>
> This gives us the following:
> ———
> [root@cephman1 ~]# ceph orch ls --service_name=ingress.nfs.cephfs --export
> service_type: ingress
> service_id: nfs.cephfs
> service_name: ingress.nfs.cephfs
> placement:
>   count: 1
>   label: _admin
> spec:
>   backend_service: nfs.cephfs
>   first_virtual_router_id: 50
>   frontend_port: 2049
>   monitor_port: 9049
>   virtual_ip: 128.84.45.48/22
>
> [root@cephman1 ~]# ceph orch ls --service_name=nfs.cephfs --export
> service_type: nfs
> service_id: cephfs
> service_name: nfs.cephfs
> placement:
>   label: _admin
> spec:
>   port: 12049
> ———
>
> Can anyone show us the config for a true “HA” NFS service where they can lose
> any single host without impacting access to the NFS export from clients? I
> would expect to be able to lose the host running the ingress.nfs.cephfs
> service, and have it automatically restarted on a different host.
> Likewise, I would expect to be able to lose an nfs.cephfs daemon without
> impacting access to the export.
>
> Or should we be taking a completely different approach and move our NFS
> service out of Ceph and into our pacemaker / corosync cluster?
>
> Sorry if this sounds redundant to questions I’ve previously asked, but we’ve
> reconfigured things a little and it feels like we’re getting closer with
> each attempt?
>
> Many thanks,
> Devin
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

--
Alexander Patrakov
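For comparison, the specs quoted above could in principle be spread across hosts like this (a sketch only, not a verified HA configuration: the `count: 2` values and the reuse of the `_admin` label are assumptions to adapt to your cluster; the ingress spec deploys a haproxy/keepalived pair per host, with keepalived failing the virtual IP over between them):

```yaml
# Applied with: ceph orch apply -i <file>
service_type: ingress
service_id: nfs.cephfs
placement:
  count: 2          # two haproxy/keepalived pairs on distinct hosts
  label: _admin
spec:
  backend_service: nfs.cephfs
  first_virtual_router_id: 50
  frontend_port: 2049
  monitor_port: 9049
  virtual_ip: 128.84.45.48/22
---
service_type: nfs
service_id: cephfs
placement:
  count: 2          # at least two NFS (ganesha) daemons behind haproxy
  label: _admin
spec:
  port: 12049
```

Whether haproxy then stops sending traffic to a dead backend quickly enough for clients depends on its health checks and on client-side NFS retry behavior, which is part of what the thread is asking about.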