Hi Devin,

- It takes a minimum of 2 hosts, and we use these specifications to get a working NFS HA cluster (keepalived VRRP + HAProxy load balancer):

---------------------------
service_type: ingress
[...]
placement:
  count: 2
  [...]
spec:
  enable_haproxy_protocol: true
  [...]
---------------------------

with

---------------------------
service_type: nfs
[...]
placement:
  count: 2
  [...]
spec:
  enable_haproxy_protocol: true
  [...]
---------------------------
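For reference, a fully expanded pair of specs along these lines could look like the following sketch (the label, router id, ports, and VIP are placeholders taken from the spec fields quoted later in this thread; adapt them to your cluster):

---------------------------
service_type: ingress
service_id: nfs.cephfs
placement:
  count: 2
  label: _admin              # example placement; adapt as needed
spec:
  backend_service: nfs.cephfs
  enable_haproxy_protocol: true
  first_virtual_router_id: 50
  frontend_port: 2049
  monitor_port: 9049
  virtual_ip: 10.0.0.10/24   # placeholder VIP
---
service_type: nfs
service_id: cephfs
placement:
  count: 2
  label: _admin
spec:
  enable_haproxy_protocol: true
  port: 12049
---------------------------

With count: 2 on both services, losing any single host leaves one haproxy/keepalived pair and one NFS daemon running, so the VIP fails over and the surviving backend keeps serving.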


- To use/test only the VIP, without load balancing:
  - replace "enable_haproxy_protocol: true" with "keepalive_only: true" in the ingress service
  - remove "enable_haproxy_protocol: true" from the nfs service
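Putting those two changes together, the keepalive-only variant of the ingress spec would be something like this sketch (placement and VIP are placeholders; with no haproxy deployed, the frontend/monitor ports are no longer needed):

---------------------------
service_type: ingress
service_id: nfs.cephfs
placement:
  count: 2                   # example placement; adapt as needed
spec:
  backend_service: nfs.cephfs
  keepalive_only: true       # deploy keepalived only, no haproxy
  virtual_ip: 10.0.0.10/24   # placeholder VIP
---------------------------

In this mode keepalived simply moves the VIP between hosts, and clients talk to the NFS daemon directly rather than through a balancer.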


Yann



Le 23/04/2025 à 00:19, Devin A. Bougie a écrit :
Hello,

We’ve found that if we lose one of the nfs.cephfs service daemons in our 
cephadm 19.2.2 cluster, all NFS traffic is blocked until either:
- the down nfs.cephfs daemon is restarted
- or we reconfigure the placement of the nfs.cephfs service to avoid the 
affected host.  After this, the ingress.nfs.cephfs service is automatically 
reconfigured and everything resumes.

Our current setup follows the "HIGH-AVAILABILITY NFS" documentation, which 
gives us an ingress.nfs.cephfs service with the haproxy and keepalived daemons and 
an nfs.cephfs service for the actual nfs daemons.  This service was deployed using:
ceph nfs cluster create cephfs "label:_admin" --ingress --virtual_ip virtual_ip

And then we updated the ingress.nfs.cephfs service to deploy only a single 
instance (which, in this case, results in two daemons on a single host).

This gives us the following:
———
[root@cephman1 ~]# ceph orch ls --service_name=ingress.nfs.cephfs --export
service_type: ingress
service_id: nfs.cephfs
service_name: ingress.nfs.cephfs
placement:
   count: 1
   label: _admin
spec:
   backend_service: nfs.cephfs
   first_virtual_router_id: 50
   frontend_port: 2049
   monitor_port: 9049
   virtual_ip: 128.84.45.48/22

[root@cephman1 ~]# ceph orch ls --service_name=nfs.cephfs --export
service_type: nfs
service_id: cephfs
service_name: nfs.cephfs
placement:
   label: _admin
spec:
   port: 12049
———

Can anyone show us the config for a true “HA” nfs service where they can lose 
any single host without impacting access to the NFS export from clients?  I 
would expect to be able to lose the host running the ingress.nfs.cephfs 
service, and have it automatically restarted on a different host.  Likewise, I 
would expect to be able to lose an nfs.cephfs daemon without impacting access to 
the export.

Or should we be taking a completely different approach and move our NFS service 
out of Ceph and into our pacemaker / corosync cluster?

Sorry if this sounds redundant to questions I’ve previously asked, but we’ve 
reconfigured things a little and it feels like we’re getting closer with each 
attempt?

Many thanks,
Devin
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
