On Sat, Jan 7, 2023 at 6:04 PM Benjamin Kaduk <bjkf...@gmail.com> wrote:
>
> On Sat, Jan 7, 2023 at 1:50 PM Rick Macklem <rmack...@freebsd.org> wrote:
>>
>> The branch main has been updated by rmacklem:
>>
>> URL:
>> https://cgit.FreeBSD.org/src/commit/?id=c33509d49a6fdcf86ef280a78f428d3cb7012c4a
>>
>> commit c33509d49a6fdcf86ef280a78f428d3cb7012c4a
>> Author: Rick Macklem <rmack...@freebsd.org>
>> AuthorDate: 2023-01-07 21:49:25 +0000
>> Commit: Rick Macklem <rmack...@freebsd.org>
>> CommitDate: 2023-01-07 21:49:25 +0000
>>
>> gssd: Fix handling of the gssname=<name> NFS mount option
>>
>> If an NFS mount using "sec=krb5[ip],gssname=<name>" is
>> done, the gssd daemon fails. There is a long delay
>> (several seconds) in the gss_acquire_cred() call and then
>> it returns success, but the credentials returned are
>> junk.
>>
>> I have no idea how long this has been broken, due to some
>> change in the Heimdal gssapi library call, but I suspect
>> it has been quite some time.
>>
>> Anyhow, it turns out that replacing the "desired_name"
>> argument with GSS_C_NO_NAME fixes the problem.
>> Replacing the argument should not be a problem, since the
>> TGT for the host based initiator credential in the default
>> keytab file should be the only TGT in the gssd's credential
>> cache (which is not the one for uid 0).
>>
>> I will try and determine if FreeBSD13 and/or FreeBSD12
>> needs this same fix and will MFC if they need the fix.
>>
>> This problem only affected Kerberized NFS mounts when the
>> "gssname" mount option was used. Other Kerberized NFS
>> mount cases already used GSS_C_NO_NAME and work ok.
>> A workaround if you do not have this patch is to do a
>> "kinit -k host/FQDN" as root on the machine, followed by
>> the Kerberized NFS mount without the "gssname" mount
>> option.
>>
>
>
> Hi Rick,
>
> This doesn't seem like a good long-term fix.
> If we're going to have a gssname argument, we should actually make
> it take effect, rather than silently ignoring it, which is what using
> GSS_C_NO_NAME does (it indicates the use of "any credential", which ends
> up meaning the default credential when used on a GSS initiator).
>
> It should be possible to inspect the "junk" credential from
> gss_acquire_cred() and learn more about what happened (perhaps a
> non-Kerberos mechanism was picked, or the name was in the wrong format)
> using various gss_inquire_*() calls, as a diagnostic measure.
> Unfortunately I don't anticipate having a huge amount of time
> to put into it anytime soon...
>

I found the underlying problem. The upcall RPC from the kernel was
timing out at 25sec and the gss_acquire_cred() call had not completed
by then. (It was close. gss_acquire_cred() took about 27sec.) The kernel
code would then assume that the gssd(8) daemon had gone away and close
the upcall socket. This caused the gssd(8) daemon to terminate, due to
a SIGPIPE signal.
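
The failure sequence described above (caller times out, closes the
socket, the daemon's late reply raises SIGPIPE) can be reproduced in
miniature. What follows is only a sketch under stated assumptions, not
the gssd(8) or kernel code: the names upcall() and DAEMON are
hypothetical, a socketpair stands in for the upcall socket, a sleep
stands in for the slow gss_acquire_cred() call, and the times are
scaled down from 25sec/27sec to fractions of a second.

```python
import signal
import socket
import subprocess
import sys

# Hypothetical stand-in for the daemon. Python ignores SIGPIPE at startup,
# so the child restores the default disposition to behave like a C daemon.
# It reads a request, sleeps (the "slow call"), then writes the reply.
DAEMON = r"""
import signal, socket, sys, time
signal.signal(signal.SIGPIPE, signal.SIG_DFL)
sock = socket.socket(fileno=int(sys.argv[1]))
sock.recv(16)
time.sleep(float(sys.argv[2]))
sock.sendall(b"reply")
"""


def upcall(timeout, work_time):
    """Send one request and wait at most `timeout` seconds for the reply.

    Returns (reply, status): reply is None if the caller timed out, and
    status is the child's exit status (negative means killed by a signal).
    """
    caller, daemon_end = socket.socketpair()
    proc = subprocess.Popen(
        [sys.executable, "-c", DAEMON,
         str(daemon_end.fileno()), str(work_time)],
        pass_fds=[daemon_end.fileno()],
    )
    daemon_end.close()  # the child now holds the only copy of this end
    caller.settimeout(timeout)
    caller.sendall(b"request")
    try:
        reply = caller.recv(16)
    except socket.timeout:
        reply = None  # gave up waiting, as the kernel side did
    caller.close()  # on timeout this closes the socket under the daemon
    return reply, proc.wait()


if __name__ == "__main__":
    # Timeout shorter than the work: the late write hits a closed socket.
    print(upcall(timeout=0.2, work_time=1.0))
    # Timeout longer than the work: the reply arrives normally.
    print(upcall(timeout=5.0, work_time=0.1))
```

With the short timeout the child should be killed by SIGPIPE (a
negative exit status) and no reply is seen; with the longer timeout the
reply arrives and the child exits normally.
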
Increasing the timeout makes it work.

I am now "on the fence" w.r.t. leaving this patch in. As I noted, I think
it is safe to do, since the credential cache used by the gssd(8) daemon
should only have a TGT for the host-based client credential. Without the
patch, the mount takes almost 30sec instead of a fraction of a second with
the patch (assuming the timeout has been increased, which turns out to be
needed for the case where a user's TGT has expired and they attempt to
access the mount).

If you really think it should be reverted, I can do that.

Thanks for your comments, rick
ps: I will be committing a change to increase the timeout.

> Thanks,
>
> Ben