Thanks, Alexander! We’re running fully updated AlmaLinux 9.5 on both the servers and the clients.
I thought we’d give the Ceph NFS service a try, but we certainly have more experience with Pacemaker/Corosync (and standalone NFS servers). I guess we’ll go that route unless anyone else has any ideas.

One more quick update: after some offline exchanges we removed the count limit and now have multiple ingress.nfs.cephfs service instances. That hasn’t changed the behavior, however, with respect to losing one of the backend nfs.cephfs daemons.

———
[root@cephman1 ~]# ceph orch ls --service_name=ingress.nfs.cephfs --export
service_type: ingress
service_id: nfs.cephfs
service_name: ingress.nfs.cephfs
placement:
  label: _admin
spec:
  backend_service: nfs.cephfs
  first_virtual_router_id: 50
  frontend_port: 2049
  monitor_port: 9049
  virtual_ip: virtual_ip/prefix

[root@cephman1 ~]# ceph orch ls --service_name=nfs.cephfs --export
service_type: nfs
service_id: cephfs
service_name: nfs.cephfs
placement:
  label: _admin
spec:
  port: 12049
———

Thanks again,
Devin

> On Apr 22, 2025, at 7:33 PM, Alexander Patrakov <patra...@gmail.com> wrote:
>
> Hello Devin,
>
> An important additional detail is missing: which OS is used as a client?
>
> And yes, my default recommendation would be to move the NFS server out
> of the Ceph cluster.
>
> On Wed, Apr 23, 2025 at 6:29 AM Devin A. Bougie
> <devin.bou...@cornell.edu> wrote:
>>
>> Hello,
>>
>> We’ve found that if we lose one of the nfs.cephfs service daemons in our
>> cephadm 19.2.2 cluster, all NFS traffic is blocked until either:
>> - the down nfs.cephfs daemon is restarted
>> - or we reconfigure the placement of the nfs.cephfs service to not use the
>> affected host. After this, the ingress.nfs.cephfs service is automatically
>> reconfigured and everything resumes.
>>
>> Our current setup follows the "HIGH-AVAILABILITY NFS" documentation, which
>> gives us an ingress.nfs.cephfs service with the haproxy and keepalived
>> daemons and an nfs.cephfs service for the actual NFS daemons.
>> This service was deployed using:
>> ceph nfs cluster create cephfs "label:_admin" --ingress --virtual_ip virtual_ip
>>
>> And then we updated the ingress.nfs.cephfs service to only deploy a single
>> service (which in this case results in two daemons on a single host).
>>
>> This gives us the following:
>> ———
>> [root@cephman1 ~]# ceph orch ls --service_name=ingress.nfs.cephfs --export
>> service_type: ingress
>> service_id: nfs.cephfs
>> service_name: ingress.nfs.cephfs
>> placement:
>>   count: 1
>>   label: _admin
>> spec:
>>   backend_service: nfs.cephfs
>>   first_virtual_router_id: 50
>>   frontend_port: 2049
>>   monitor_port: 9049
>>   virtual_ip: virtual_ip/prefix
>>
>> [root@cephman1 ~]# ceph orch ls --service_name=nfs.cephfs --export
>> service_type: nfs
>> service_id: cephfs
>> service_name: nfs.cephfs
>> placement:
>>   label: _admin
>> spec:
>>   port: 12049
>> ———
>>
>> Can anyone show us the config for a true “HA” NFS service where they can
>> lose any single host without impacting access to the NFS export from
>> clients? I would expect to be able to lose the host running the
>> ingress.nfs.cephfs service and have it automatically restarted on a
>> different host. Likewise, I would expect to be able to lose an nfs.cephfs
>> daemon without impacting access to the export.
>>
>> Or should we be taking a completely different approach and move our NFS
>> service out of Ceph and into our Pacemaker/Corosync cluster?
>>
>> Sorry if this sounds redundant to questions I’ve previously asked, but we’ve
>> reconfigured things a little and it feels like we’re getting closer with
>> each attempt.
>>
>> Many thanks,
>> Devin
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>
> --
> Alexander Patrakov

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
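[Editor’s note] For readers weighing the Pacemaker/Corosync option discussed above: a standalone kernel NFS server exporting a CephFS mount is typically modeled as an ordered Pacemaker resource group. The following is only a minimal sketch, not a config from this thread; every name, IP, monitor address, and path is a placeholder, and the agent parameters should be checked against the resource-agents package on your distribution:

```shell
# Hedged sketch of a Pacemaker resource group for a standalone NFS server
# backed by a CephFS kernel mount. All names, IPs, and paths below are
# illustrative placeholders.

# CephFS kernel mount managed by the cluster (ocf:heartbeat:Filesystem).
pcs resource create cephfs_mnt ocf:heartbeat:Filesystem \
    device="mon1,mon2,mon3:/" directory="/export/cephfs" fstype="ceph" \
    options="name=nfsclient,secretfile=/etc/ceph/nfsclient.secret"

# Kernel NFS daemon; nfs_shared_infodir holds NFSv4 client recovery state
# so it follows the service on failover.
pcs resource create nfsd ocf:heartbeat:nfsserver \
    nfs_shared_infodir="/export/nfsinfo"

# Floating service IP that clients mount against.
pcs resource create nfs_vip ocf:heartbeat:IPaddr2 \
    ip=192.0.2.10 cidr_netmask=24

# Keep everything on one node and start in order: mount -> nfsd -> VIP.
pcs resource group add nfs_group cephfs_mnt nfsd nfs_vip
```

With the exports defined on every cluster node, clients mount only the floating IP, so availability depends on Pacemaker moving the group rather than on any single NFS host surviving.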