Hello Devin,

An important additional detail is missing: which OS is used as a client?

And yes, my default recommendation would be to move the NFS server out
of the Ceph cluster.
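
That said, if you want to keep trying inside Ceph: in your spec below, the ingress service has "count: 1", so haproxy and keepalived only ever exist on one host, and there is nothing to fail the virtual IP over to. A sketch of what I would try instead (same service names and ports as your output; the "count: 2" placement is my assumption, not something you posted):

```yaml
# Hypothetical ingress spec: with count: 2, cephadm deploys
# haproxy + keepalived on two of the labeled hosts, so keepalived
# can move the virtual IP to the surviving host if one dies.
service_type: ingress
service_id: nfs.cephfs
placement:
  count: 2
  label: _admin
spec:
  backend_service: nfs.cephfs
  first_virtual_router_id: 50
  frontend_port: 2049
  monitor_port: 9049
  virtual_ip: 128.84.45.48/22
```

Applied with something like "ceph orch apply -i ingress.yaml". This only covers the ingress layer, of course; whether a dead nfs.cephfs backend is drained from haproxy quickly enough is a separate question.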

On Wed, Apr 23, 2025 at 6:29 AM Devin A. Bougie
<devin.bou...@cornell.edu> wrote:
>
> Hello,
>
> We’ve found that if we lose one of the nfs.cephfs service daemons in our
> cephadm 19.2.2 cluster, all NFS traffic is blocked until either:
> - the down nfs.cephfs daemon is restarted,
> - or we reconfigure the placement of the nfs.cephfs service to exclude the
> affected host.  After this, the ingress.nfs.cephfs service is automatically
> reconfigured and everything resumes.
>
> Our current setup follows the "HIGH-AVAILABILITY NFS” documentation, which 
> gives us an ingress.nfs.cephfs service with the haproxy and keepalived 
> daemons and an nfs.cephfs service for the actual nfs daemons.  This service 
> was deployed using:
> ceph nfs cluster create cephfs "label:_admin" --ingress --virtual_ip 
> virtual_ip
>
> And then we updated the ingress.nfs.cephfs service to deploy only a single
> instance (which, in this case, results in two daemons on a single host).
>
> This gives us the following:
> ———
> [root@cephman1 ~]# ceph orch ls --service_name=ingress.nfs.cephfs --export
> service_type: ingress
> service_id: nfs.cephfs
> service_name: ingress.nfs.cephfs
> placement:
>   count: 1
>   label: _admin
> spec:
>   backend_service: nfs.cephfs
>   first_virtual_router_id: 50
>   frontend_port: 2049
>   monitor_port: 9049
>   virtual_ip: 128.84.45.48/22
>
> [root@cephman1 ~]# ceph orch ls --service_name=nfs.cephfs --export
> service_type: nfs
> service_id: cephfs
> service_name: nfs.cephfs
> placement:
>   label: _admin
> spec:
>   port: 12049
> ———
>
> Can anyone show us the config for a true “HA” NFS service, where any single
> host can be lost without impacting client access to the NFS export?  I
> would expect to be able to lose the host running the ingress.nfs.cephfs
> service and have it automatically restarted on a different host.  Likewise,
> I would expect to be able to lose an nfs.cephfs daemon without impacting
> access to the export.
>
> Or should we be taking a completely different approach and move our NFS 
> service out of Ceph and into our pacemaker / corosync cluster?
>
> Sorry if this sounds redundant to questions I’ve previously asked, but we’ve
> reconfigured things a little, and it feels like we’re getting closer with
> each attempt.
>
> Many thanks,
> Devin
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



-- 
Alexander Patrakov