On Sat, Nov 18, 2023 at 2:27 PM Mike Karels <m...@karels.net> wrote:
>
> On 18 Nov 2023, at 15:58, Rick Macklem wrote:
>
> > On Sat, Nov 18, 2023 at 8:09 AM Rick Macklem <rick.mack...@gmail.com> wrote:
> >>
> >> On Fri, Nov 17, 2023 at 8:19 PM Mike Karels <m...@karels.net> wrote:
> >>>
> >>> On 17 Nov 2023, at 22:14, Mike Karels wrote:
> >>>
> >>>> On 17 Nov 2023, at 21:24, Rick Macklem wrote:
> >>>>
> >>>>> Most of the changes in stable/13 that are not in releng/13.2
> >>>>> are the "make it work in a jail" stuff. Unfortunately, they are
> >>>>> a large # of changes (mostly trivial edits adding vnet macros),
> >>>>> but it also includes export check changes.
> >>>>>
> >>>>> I have attached a trivial patch that I think disables the export
> >>>>> checks for jails. If either of you can try it and see if it fixes
> >>>>> the problem, that would be great.
> >>>>> (Note that this is only for testing, although it probably does not
> >>>>>  matter unless you are running nfsd(8) in vnet jails.)
> >>>>
> >>>> Yes, I can see snapshots with the patch.  This system is just a test
> >>>> system that doesn't normally run ZFS or NFS, so no problem messing
> >>>> with permissions.  It's a bhyve VM, so I just added a small disk and
> >>>> enabled ZFS for testing.
> >>>
> >>> btw, you might try to get mm@ or maybe mav@ to help out from the ZFS
> >>> side.  It must be doing something differently inside a snapshot than
> >>> outside, maybe with file handles or something like that.
> >> Yes. I've added freebsd-current@ (although Garrett is not on it, he is
> >> cc'd) and these guys specifically...
> >>
> >> So, here's what appears to be the problem...
> >> Commit 88175af (in main and stable/13, but not 13.2) added checks for
> >> nfsd(8) running in jails by filling in mnt_exjail with a reference to the 
> >> cred
> >> used when the file system is exported.
> >> When mnt_exjail is found NULL, the current nfsd code assumes that there
> >> is no access allowed for the mount.
> >>
> >> My vague understanding is that when a ZFS snapshot is accessed, it is
> >> "pseudo-mounted" by zfsctl_snapdir_lookup() and I am guessing that
> >> mnt_exjail is NULL as a result.
> >> Since I do not know the ZFS code and don't even have an easy way to
> >> test this (thankfully Mike can test easily), I do not know what to do from
> >> here?
> >>
> >> Is there a "struct mount" constructed for this pseudo mount
> >> (or it actually appears to be the lookup of ".." that fails, so it
> >> might be the parent of the snapshot subdir?)?
> >>
> >> One thought is that I can check to see if the mount pointer is in the
> >> mountlist (I don't think the snapshot's mount is in the mountlist) and
> >> avoid the jail test for this case.  This would assume that snapshots are
> >> always within the file system(s) exported via that jail (which includes
> >> the case of prison0, of course), so that they do not need a separate
> >> jail check.
> >>
> >> If this doesn't work, there will need to be some sort of messing about
> >> in ZFS to set mnt_exjail for these.
> > Ok, so now onto the hard part...
> > Thanks to Mike and others, I did create a snapshot under .zfs and I can
> > see the problem. It is that mnt_exjail == NULL.
> > Now, is there a way that this "struct mount" can be recognized as "special"
> > for snapshots, so I can avoid the mnt_exjail == NULL test?
> > (I had hoped that "mp->mnt_list.tqe_prev" would be NULL, but that is not
> >  the case.)
>
> Dumb question, is the mount point (mp presumably) different between the
> snapshot and the main file system?
Not a dump question and the answer is rather interesting...
It is "sometimes" or "usually" according to my printf().
It seems that when you first "cd <snapshot-name"" you get a different mp
where mnt_exjail == NULL.. Then when you look at directories within the
snapshot, you get the mp of the file system that .zfs exists in, which does
have mnt_exjail set non-NULL.

There is this snippet of code in zfsctl_snapdir_lookup():
/*
* Fix up the root vnode mounted on .zfs/snapshot/<snapname>.
*
* This is where we lie about our v_vfsp in order to
* make .zfs/snapshot/<snapname> accessible over NFS
* without requiring manual mounts of <snapname>.
*/
ASSERT3P(VTOZ(*vpp)->z_zfsvfs, !=, zfsvfs);
VTOZ(*vpp)->z_zfsvfs->z_parent = zfsvfs;

/* Clear the root flag (set via VFS_ROOT) as well. */
(*vpp)->v_vflag &= ~VV_ROOT;
which seems to set the mp to that of the parent, but it
seems this does not happen for the initial lookup of
the <snapname>?

I'll note that there is code before this in
zfsctl_snapdir_lookup() for handling cases
like "." and ".." that return without doing this.

Now, why does this work without the mnt_exjail
check (as in 13.2)?
I am not quite sure, but there is this "cheat" in the
NFS server (it has been there for years, maybe decades):
    /*
     * Allow a Lookup, Getattr, GetFH, Secinfo on an
     * non-exported directory if
     * nfs_rootfhset. Do I need to allow any other Ops?
     * (You can only have a non-exported vpnes if
     *  nfs_rootfhset is true. See nfsd_fhtovp())
     * Allow AUTH_SYS to be used for file systems
     * exported GSS only for certain Ops, to allow
     * clients to do mounts more easily.
     */
    if (nfsv4_opflag[op].needscfh && vp) {
if (!NFSVNO_EXPORTED(&vpnes) &&
    op != NFSV4OP_LOOKUP &&
    op != NFSV4OP_GETATTR &&
    op != NFSV4OP_GETFH &&
    op != NFSV4OP_ACCESS &&
    op != NFSV4OP_READLINK &&
    op != NFSV4OP_SECINFO &&
    op != NFSV4OP_SECINFONONAME)
nd->nd_repstat = NFSERR_NOFILEHANDLE;
This allows certain operations to be done on
non-exported file systems and I think that is enough
to allow this to work when mnt_exjail is not checked.
(Note that NFSV4OP_LOOKUPP is not in the list,
 which might explain why it is the one that fails for
 Garrett. I don't think it can be added to this list
 safely, since that would allow a client to move above
 the exported file system into "uncharted territory".)

>  Just curious.  Also, what is mnt_exjail
> normally set to for file systems not in a jail?
mnt_exjail is set to the credentials of the thread/process
that exported the file system (usually mountd(8)).
When not in a jail, cr_prison for these credentials
points to prison0.

Btw, I checked and the "other mp that has mnt_exjail == NULL
is in the mountlist, so the idea of checking "not in mountlist"
is a dead end.

I am looking for something "unique" about this other mp,
but haven't found anything yet.
Alternately, it might be necessary to add code to
zfsctl_snapdir_lookup() to "cheat and change the mp"
in more cases, such as "." and ".." lookups?

rick
ps: I added all the cc's back in because I want the
      ZFS folk to hopefully chime in.

>
>                 Mike
>
> > Do I need to search mountlist for it?
> >
> > rick
> > ps: The hack patch attached should fix the problem, but can only be
> >       safely used if mountd/nfsd are not run in any jails.
> >
> >>
> >> I will try and get a test setup going here, which leads me to..
> >> how do I create a ZFS snapshot? (I do have a simple ZFS pool running
> >> on a test machine, but I've never done a snapshot.)
> >>
> >> Although this problem is not in 13.2, it will have shipped in 14.0.
> >>
> >> Any help with be appreciated, rick
> >>
> >>>
> >>>                 Mike
> >>>>
> >>>>> rick
> >>>>>
> >>>>> On Fri, Nov 17, 2023 at 6:14 PM Mike Karels <m...@karels.net> wrote:
> >>>>>>
> >>>>>> CAUTION: This email originated from outside of the University of 
> >>>>>> Guelph. Do not click links or open attachments unless you recognize 
> >>>>>> the sender and know the content is safe. If in doubt, forward 
> >>>>>> suspicious emails to ith...@uoguelph.ca.
> >>>>>>
> >>>>>>
> >>>>>> Rick, have you been following this thread on freebsd-stable?  I have 
> >>>>>> been able
> >>>>>> to reproduce this using a 13-stable server from Oct 7 and a 15-current 
> >>>>>> system
> >>>>>> that is up to date using NFSv3.  I did not reproduce with a 13.2 
> >>>>>> server.  The
> >>>>>> client was running 13.2.  Any ideas?  A full bisect seems fairly 
> >>>>>> painful, but
> >>>>>> maybe you have an idea of points to try.  Fortunately, these are all 
> >>>>>> test
> >>>>>> systems that I can reboot at will.
> >>>>>>
> >>>>>>                 Mike
> >>>>>>
> >>>>>> Forwarded message:
> >>>>>>
> >>>>>>> From: Garrett Wollman <woll...@bimajority.org>
> >>>>>>> To: Mike Karels <m...@karels.net>
> >>>>>>> Cc: freebsd-sta...@freebsd.org
> >>>>>>> Subject: Re: NFS exports of ZFS snapshots broken
> >>>>>>> Date: Fri, 17 Nov 2023 17:35:04 -0500
> >>>>>>>
> >>>>>>> <<On Fri, 17 Nov 2023 15:57:42 -0600, Mike Karels <m...@karels.net> 
> >>>>>>> said:
> >>>>>>>
> >>>>>>>> I have not run into this, so I tried it just now.  I had no problem.
> >>>>>>>> The server is 13.2, fully patched, the client is up-to-date -current,
> >>>>>>>> and the mount is v4.
> >>>>>>>
> >>>>>>> On my 13.2 client and 13-stable server, I see:
> >>>>>>>
> >>>>>>>  25034 ls       CALL  
> >>>>>>> open(0x237d32f9a000,0x120004<O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC>)
> >>>>>>>  25034 ls       NAMI  "/mnt/tools/.zfs/snapshot/weekly-2023-45"
> >>>>>>>  25034 ls       RET   open 4
> >>>>>>>  25034 ls       CALL  fcntl(0x4,F_ISUNIONSTACK,0x0)
> >>>>>>>  25034 ls       RET   fcntl 0
> >>>>>>>  25034 ls       CALL  
> >>>>>>> getdirentries(0x4,0x237d32faa000,0x1000,0x237d32fa7028)
> >>>>>>>  25034 ls       RET   getdirentries -1 errno 5 Input/output error
> >>>>>>>  25034 ls       CALL  close(0x4)
> >>>>>>>  25034 ls       RET   close 0
> >>>>>>>  25034 ls       CALL  exit(0)
> >>>>>>>
> >>>>>>> Certainly a libc bug here that getdirentries(2) returning [EIO]
> >>>>>>> results in ls(1) returning EXIT_SUCCESS, but the [EIO] error is
> >>>>>>> consistent across both FreeBSD and Linux clients.
> >>>>>>>
> >>>>>>> Looking at this from the RPC side:
> >>>>>>>
> >>>>>>>       (PUTFH, GETATTR, LOOKUP(snapshotname), GETFH, GETATTR)
> >>>>>>>               [NFS4_OK for all ops]
> >>>>>>>       (PUTFH, GETATTR)
> >>>>>>>               [NFS4_OK, NFS4_OK]
> >>>>>>>       (PUTFH, ACCESS(0x3f), GETATTR)
> >>>>>>>               [NFS4_OK, NFS4_OK, rights = 0x03, NFS4_OK]
> >>>>>>>       (PUTFH, GETATTR, LOOKUPP, GETFH, GETATTR)
> >>>>>>>               [NFS4_OK, NFS4_OK, NFS4ERR_NOFILEHANDLE]
> >>>>>>>
> >>>>>>> and at this point the [EIO] is returned.
> >>>>>>>
> >>>>>>> It seems that clients always do a LOOKUPP before calling READDIR, and
> >>>>>>> this is failing when the subject file handle is the snapshot.  The
> >>>>>>> client is perfectly able to *traverse into* the snapshot: if I try to
> >>>>>>> list a subdirectory I know exists in the snapshot, the client is able 
> >>>>>>> to
> >>>>>>> LOOKUP(dirname) just fine, but LOOKUPP still fails with
> >>>>>>> NFS4ERR_NOFILEHANDLE *on the subndirectory*.
> >>>>>>>
> >>>>>>> -GAWollman
> >>>>>>

Reply via email to