> I just tried it with 3 keepalive daemons and one nfs daemon, it
> doesn't really work because all three hosts have the virtual IP
> assigned, preventing my client from mounting. So this doesn't really
> work as a workaround, it seems.
That's a bit surprising. The keepalive daemons are meant to communicate with
each other and have only one of them maintain the VIP while the others remain
in a standby state. Do you see anything in the logs about them communicating
with each other? Also, in the keepalived conf for each keepalive daemon, do
you see one set with a higher "priority" than the others?

On Tue, Mar 25, 2025 at 11:18 AM Eugen Block <ebl...@nde.ag> wrote:

> Thanks, Adam.
> I just tried it with 3 keepalive daemons and one nfs daemon, it
> doesn't really work because all three hosts have the virtual IP
> assigned, preventing my client from mounting. So this doesn't really
> work as a workaround, it seems. I feel like the proper solution would
> be to include keepalive in the list of
> RESCHEDULE_FROM_OFFLINE_HOSTS_TYPES.
>
> Zitat von Adam King <adk...@redhat.com>:
>
> > Which daemons get moved around like that is controlled by
> > https://github.com/ceph/ceph/blob/main/src/pybind/mgr/cephadm/utils.py#L30,
> > which appears to only include nfs and haproxy, so maybe this
> > keepalive-only case was missed in that sense. I do think that you could
> > alter the placement of the ingress service to just match all the hosts
> > and it ought to work, though. The only reason we require specifying a
> > set of hosts with a count field lower than the number of hosts matching
> > the placement for nfs is that ganesha has no transparent state
> > migration. Keepalive, on the other hand, should work fine with "extra"
> > daemons deployed in order to be highly available.
> >
> > On Tue, Mar 25, 2025 at 10:06 AM Malte Stroem <malte.str...@gmail.com>
> > wrote:
> >
> >> Hi Eugen,
> >>
> >> yes, for me it's kind of a "test setting" for small setups.
> >>
> >> The doc says:
> >>
> >> Setting --ingress-mode keepalive-only deploys a simplified ingress
> >> service that provides a virtual IP with the nfs server directly binding
> >> to that virtual IP and leaves out any sort of load balancing or traffic
> >> redirection. This setup will restrict users to deploying only 1 nfs
> >> daemon as multiple cannot bind to the same port on the virtual IP.
> >>
> >> Best,
> >> Malte
> >>
> >> On 25.03.25 13:46, Eugen Block wrote:
> >> > Yeah, it seems to work without the "keepalive-only" flag, at least
> >> > from a first test. So keepalive-only is not working properly, it
> >> > seems? Should I create a tracker for that, or am I misunderstanding
> >> > its purpose?
> >> >
> >> > Zitat von Malte Stroem <malte.str...@gmail.com>:
> >> >
> >> >> Hi Eugen,
> >> >>
> >> >> try omitting
> >> >>
> >> >> --ingress-mode keepalive-only
> >> >>
> >> >> like this:
> >> >>
> >> >> ceph nfs cluster create ebl-nfs-cephfs "1 ceph01 ceph02 ceph03" --ingress --virtual_ip "192.168.168.114/24"
> >> >>
> >> >> Best,
> >> >> Malte
> >> >>
> >> >> On 25.03.25 13:25, Eugen Block wrote:
> >> >>> Thanks for your quick response. The specs I pasted are actually the
> >> >>> result of deploying an nfs cluster like this:
> >> >>>
> >> >>> ceph nfs cluster create ebl-nfs-cephfs "1 ceph01 ceph02 ceph03" --ingress --virtual_ip 192.168.168.114 --ingress-mode keepalive-only
> >> >>>
> >> >>> I can try redeploying it via the dashboard, but I don't have a lot
> >> >>> of confidence that it will work differently with a failover.
> >> >>>
> >> >>> Zitat von Malte Stroem <malte.str...@gmail.com>:
> >> >>>
> >> >>>> Hi Eugen,
> >> >>>>
> >> >>>> try deploying the NFS service like this:
> >> >>>>
> >> >>>> https://docs.ceph.com/en/latest/mgr/nfs/
> >> >>>>
> >> >>>> Some had only success deploying it via the dashboard.
> >> >>>>
> >> >>>> Best,
> >> >>>> Malte
> >> >>>>
> >> >>>> On 25.03.25 13:02, Eugen Block wrote:
> >> >>>>> Hi,
> >> >>>>>
> >> >>>>> I'm re-evaluating NFS again, testing on a virtual cluster with
> >> >>>>> 18.2.4. For now I don't need haproxy, so I use "keepalive_only:
> >> >>>>> true" as described in the docs [0]. I first create the ingress
> >> >>>>> service, wait for it to start, then create the nfs cluster. I've
> >> >>>>> added the specs at the bottom.
> >> >>>>>
> >> >>>>> I can mount the export with the virtual IP. Then I just shut down
> >> >>>>> the VM where the nfs service was running; the orchestrator
> >> >>>>> successfully starts an nfs daemon elsewhere, but the keepalive
> >> >>>>> daemon is not failed over. So mounting or accessing the export is
> >> >>>>> impossible, of course. And after I power up the offline host
> >> >>>>> again, nothing is "repaired": keepalive and nfs run on different
> >> >>>>> servers until I intervene manually. This doesn't seem to work as
> >> >>>>> expected. Is this a known issue (I couldn't find anything on the
> >> >>>>> tracker)? I have my doubts, but maybe it works better with
> >> >>>>> haproxy? Or am I missing something in my configuration?
> >> >>>>> I haven't tried with a newer release yet. I'd appreciate any
> >> >>>>> comments.
> >> >>>>>
> >> >>>>> Thanks,
> >> >>>>> Eugen
> >> >>>>>
> >> >>>>> ---snip---
> >> >>>>> service_type: ingress
> >> >>>>> service_id: nfs.ebl-nfs-cephfs
> >> >>>>> service_name: ingress.nfs.ebl-nfs-cephfs
> >> >>>>> placement:
> >> >>>>>   count: 1
> >> >>>>>   hosts:
> >> >>>>>   - ceph01
> >> >>>>>   - ceph02
> >> >>>>>   - ceph03
> >> >>>>> spec:
> >> >>>>>   backend_service: nfs.ebl-nfs-cephfs
> >> >>>>>   first_virtual_router_id: 50
> >> >>>>>   keepalive_only: true
> >> >>>>>   monitor_port: 9049
> >> >>>>>   virtual_ip: 192.168.168.114/24
> >> >>>>>
> >> >>>>> service_type: nfs
> >> >>>>> service_id: ebl-nfs-cephfs
> >> >>>>> service_name: nfs.ebl-nfs-cephfs
> >> >>>>> placement:
> >> >>>>>   count: 1
> >> >>>>>   hosts:
> >> >>>>>   - ceph01
> >> >>>>>   - ceph02
> >> >>>>>   - ceph03
> >> >>>>> spec:
> >> >>>>>   port: 2049
> >> >>>>>   virtual_ip: 192.168.168.114
> >> >>>>> ---snip---
> >> >>>>>
> >> >>>>> [0] https://docs.ceph.com/en/reef/cephadm/services/nfs/#nfs-with-virtual-ip-but-no-haproxy

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
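
For reference, regarding Adam's question about the "priority" setting: the
keepalived.conf on each host contains a vrrp_instance section roughly along
the lines below. This is a simplified, hand-written sketch, not the exact
file cephadm generates; the interface name (eth0), the password, and the
per-host addresses 192.168.168.111-113 are made up for illustration, while
the virtual_router_id and virtual IP come from the spec quoted above. The
point is that exactly one instance should hold the highest priority, with
the peers at lower priorities in a backup role. If the daemons cannot
exchange VRRP advertisements (wrong peer addresses, blocked traffic, or all
instances ending up with the same state/priority), each node elects itself
master and all of them claim the VIP, which matches the symptom described
above.

---snip---
# sketch of one host's keepalived.conf section; the other two hosts should
# differ in state/priority and in unicast_src_ip/unicast_peer
vrrp_instance VI_0 {
  state MASTER              # BACKUP on the other two hosts
  priority 100              # e.g. 90 and 80 on the other two hosts
  interface eth0            # hypothetical interface name
  virtual_router_id 50      # first_virtual_router_id from the ingress spec
  advert_int 1
  authentication {
    auth_type PASS
    auth_pass examplepass   # cephadm generates its own password
  }
  unicast_src_ip 192.168.168.111   # this host (hypothetical address)
  unicast_peer {
    192.168.168.112                # the other keepalive hosts (hypothetical)
    192.168.168.113
  }
  virtual_ipaddress {
    192.168.168.114/24 dev eth0    # the virtual_ip from the spec
  }
}
---snip---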
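
Regarding Adam's suggestion to let the ingress placement simply match all
the hosts so that a keepalive daemon runs everywhere (with only one of them
holding the VIP at a time): the ingress spec from the original post could be
adjusted roughly as below, leaving the nfs service itself at count: 1. This
is an untested sketch derived from the specs quoted above, applied with
something like "ceph orch apply -i ingress.yaml".

---snip---
service_type: ingress
service_id: nfs.ebl-nfs-cephfs
placement:
  # no count field: a keepalive daemon is deployed on every listed host
  hosts:
  - ceph01
  - ceph02
  - ceph03
spec:
  backend_service: nfs.ebl-nfs-cephfs
  first_virtual_router_id: 50
  keepalive_only: true
  monitor_port: 9049
  virtual_ip: 192.168.168.114/24
---snip---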
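
And the change Eugen proposes would amount to adding the keepalive daemon
type to the constant in src/pybind/mgr/cephadm/utils.py linked above. Sketch
only; the exact daemon-type string and the current contents of the list
should be checked against the source.

---snip---
# src/pybind/mgr/cephadm/utils.py (sketch of the proposed change; per the
# thread, the list currently contains only the nfs and haproxy types)
RESCHEDULE_FROM_OFFLINE_HOSTS_TYPES = ['haproxy', 'nfs', 'keepalived']
---snip---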