On Sat, Nov 18, 2023 at 4:43 PM Mike Karels <m...@karels.net> wrote:
>
> On 18 Nov 2023, at 17:23, Rick Macklem wrote:
>
> > On Sat, Nov 18, 2023 at 2:27 PM Mike Karels <m...@karels.net> wrote:
> >>
> >> On 18 Nov 2023, at 15:58, Rick Macklem wrote:
> >>
> >>> On Sat, Nov 18, 2023 at 8:09 AM Rick Macklem <rick.mack...@gmail.com> wrote:
> >>>>
> >>>> On Fri, Nov 17, 2023 at 8:19 PM Mike Karels <m...@karels.net> wrote:
> >>>>>
> >>>>> On 17 Nov 2023, at 22:14, Mike Karels wrote:
> >>>>>
> >>>>>> On 17 Nov 2023, at 21:24, Rick Macklem wrote:
> >>>>>>
> >>>>>>> Most of the changes in stable/13 that are not in releng/13.2
> >>>>>>> are the "make it work in a jail" stuff. Unfortunately, they are
> >>>>>>> a large # of changes (mostly trivial edits adding vnet macros),
> >>>>>>> but it also includes export check changes.
> >>>>>>>
> >>>>>>> I have attached a trivial patch that I think disables the export
> >>>>>>> checks for jails. If either of you can try it and see if it fixes
> >>>>>>> the problem, that would be great.
> >>>>>>> (Note that this is only for testing, although it probably does not
> >>>>>>>  matter unless you are running nfsd(8) in vnet jails.)
> >>>>>>
> >>>>>> Yes, I can see snapshots with the patch.  This system is just a test
> >>>>>> system that doesn't normally run ZFS or NFS, so no problem messing
> >>>>>> with permissions.  It's a bhyve VM, so I just added a small disk and
> >>>>>> enabled ZFS for testing.
> >>>>>
> >>>>> btw, you might try to get mm@ or maybe mav@ to help out from the ZFS
> >>>>> side.  It must be doing something differently inside a snapshot than
> >>>>> outside, maybe with file handles or something like that.
> >>>> Yes. I've added freebsd-current@ (although Garrett is not on it, he is
> >>>> cc'd) and these guys specifically...
> >>>>
> >>>> So, here's what appears to be the problem...
> >>>> Commit 88175af (in main and stable/13, but not 13.2) added checks for
> >>>> nfsd(8) running in jails by filling in mnt_exjail with a reference
> >>>> to the cred used when the file system is exported.
> >>>> When mnt_exjail is found NULL, the current nfsd code assumes that there
> >>>> is no access allowed for the mount.
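
To make that concrete, the check is roughly of the following shape
(illustration only: the function name here is made up, and this is not
the actual code from that commit):

    #include <sys/param.h>
    #include <sys/jail.h>
    #include <sys/mount.h>
    #include <sys/ucred.h>

    /*
     * Sketch of the export/jail check: a mount with no mnt_exjail is
     * treated as "not exported", otherwise the prison recorded at
     * export time must match the prison of the credential now asking.
     */
    static bool
    nfsd_mount_visible(struct mount *mp, struct ucred *cred)
    {
        bool ok;

        MNT_ILOCK(mp);
        ok = (mp->mnt_exjail != NULL &&
            mp->mnt_exjail->cr_prison == cred->cr_prison);
        MNT_IUNLOCK(mp);
        return (ok);
    }
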
> >>>>
> >>>> My vague understanding is that when a ZFS snapshot is accessed, it is
> >>>> "pseudo-mounted" by zfsctl_snapdir_lookup() and I am guessing that
> >>>> mnt_exjail is NULL as a result.
> >>>> Since I do not know the ZFS code and don't even have an easy way to
> >>>> test this (thankfully Mike can test easily), I do not know what to
> >>>> do from here.
> >>>>
> >>>> Is there a "struct mount" constructed for this pseudo mount
> >>>> (or it actually appears to be the lookup of ".." that fails, so it
> >>>> might be the parent of the snapshot subdir?)?
> >>>>
> >>>> One thought is that I can check to see if the mount pointer is in the
> >>>> mountlist (I don't think the snapshot's mount is in the mountlist) and
> >>>> avoid the jail test for this case.  This would assume that snapshots are
> >>>> always within the file system(s) exported via that jail (which includes
> >>>> the case of prison0, of course), so that they do not need a separate
> >>>> jail check.
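
To make that idea concrete, the test would look something like this
(illustration only, with a made-up function name; and as noted further
down in the thread, the snapshot's mp does show up in the mountlist,
so this turns out to be a dead end):

    #include <sys/param.h>
    #include <sys/lock.h>
    #include <sys/mount.h>
    #include <sys/mutex.h>

    /*
     * Sketch: walk the kernel's global mountlist to see whether this
     * struct mount was mounted "normally".
     */
    static bool
    mount_on_mountlist(struct mount *mp)
    {
        struct mount *tmp;
        bool found = false;

        mtx_lock(&mountlist_mtx);
        TAILQ_FOREACH(tmp, &mountlist, mnt_list) {
            if (tmp == mp) {
                found = true;
                break;
            }
        }
        mtx_unlock(&mountlist_mtx);
        return (found);
    }
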
> >>>>
> >>>> If this doesn't work, there will need to be some sort of messing about
> >>>> in ZFS to set mnt_exjail for these.
> >>> Ok, so now onto the hard part...
> >>> Thanks to Mike and others, I did create a snapshot under .zfs and I can
> >>> see the problem. It is that mnt_exjail == NULL.
> >>> Now, is there a way that this "struct mount" can be recognized
> >>> as "special" for snapshots, so I can avoid the mnt_exjail == NULL test?
> >>> (I had hoped that "mp->mnt_list.tqe_prev" would be NULL, but that is not
> >>>  the case.)
> >>
> >> Dumb question, is the mount point (mp presumably) different between the
> >> snapshot and the main file system?
> > Not a dumb question, and the answer is rather interesting...
> > It is "sometimes" or "usually" according to my printf().
> > It seems that when you first "cd <snapshot-name>" you get a different mp
> > where mnt_exjail == NULL. Then when you look at directories within the
> > snapshot, you get the mp of the file system that .zfs exists in, which
> > does have mnt_exjail set non-NULL.
> >
> > There is this snippet of code in zfsctl_snapdir_lookup():
> >     /*
> >      * Fix up the root vnode mounted on .zfs/snapshot/<snapname>.
> >      *
> >      * This is where we lie about our v_vfsp in order to
> >      * make .zfs/snapshot/<snapname> accessible over NFS
> >      * without requiring manual mounts of <snapname>.
> >      */
> >     ASSERT3P(VTOZ(*vpp)->z_zfsvfs, !=, zfsvfs);
> >     VTOZ(*vpp)->z_zfsvfs->z_parent = zfsvfs;
> >
> >     /* Clear the root flag (set via VFS_ROOT) as well. */
> >     (*vpp)->v_vflag &= ~VV_ROOT;
> > which seems to set the mp to that of the parent, but apparently
> > this does not happen for the initial lookup of the <snapname>?
> >
> > I'll note that there is code before this in
> > zfsctl_snapdir_lookup() for handling cases
> > like "." and ".." that return without doing this.
> >
> > Now, why does this work without the mnt_exjail
> > check (as in 13.2)?
> > I am not quite sure, but there is this "cheat" in the
> > NFS server (it has been there for years, maybe decades):
> >     /*
> >      * Allow a Lookup, Getattr, GetFH, Secinfo on an
> >      * non-exported directory if
> >      * nfs_rootfhset. Do I need to allow any other Ops?
> >      * (You can only have a non-exported vpnes if
> >      *  nfs_rootfhset is true. See nfsd_fhtovp())
> >      * Allow AUTH_SYS to be used for file systems
> >      * exported GSS only for certain Ops, to allow
> >      * clients to do mounts more easily.
> >      */
> >     if (nfsv4_opflag[op].needscfh && vp) {
> >         if (!NFSVNO_EXPORTED(&vpnes) &&
> >             op != NFSV4OP_LOOKUP &&
> >             op != NFSV4OP_GETATTR &&
> >             op != NFSV4OP_GETFH &&
> >             op != NFSV4OP_ACCESS &&
> >             op != NFSV4OP_READLINK &&
> >             op != NFSV4OP_SECINFO &&
> >             op != NFSV4OP_SECINFONONAME)
> >                 nd->nd_repstat = NFSERR_NOFILEHANDLE;
> > This allows certain operations to be done on
> > non-exported file systems and I think that is enough
> > to allow this to work when mnt_exjail is not checked.
> > (Note that NFSV4OP_LOOKUPP is not in the list,
> >  which might explain why it is the one that fails for
> >  Garrett. I don't think it can be added to this list
> >  safely, since that would allow a client to move above
> >  the exported file system into "uncharted territory".)
> >
> >>  Just curious.  Also, what is mnt_exjail
> >> normally set to for file systems not in a jail?
> > mnt_exjail is set to the credentials of the thread/process
> > that exported the file system (usually mountd(8)).
> > When not in a jail, cr_prison for these credentials
> > points to prison0.
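
In other words, at export time something like the following happens
(simplified sketch with a made-up function name, not the literal
mountd/nfsd code path):

    #include <sys/param.h>
    #include <sys/mount.h>
    #include <sys/proc.h>
    #include <sys/ucred.h>

    /*
     * Sketch: record a reference to the exporting thread's credential
     * in the mount so its prison can be checked later.
     */
    static void
    export_record_cred(struct mount *mp, struct thread *td)
    {
        MNT_ILOCK(mp);
        if (mp->mnt_exjail == NULL)
            mp->mnt_exjail = crhold(td->td_ucred);
        MNT_IUNLOCK(mp);
    }
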
> >
> > Btw, I checked and the "other mp" that has mnt_exjail == NULL
> > is in the mountlist, so the idea of checking "not in mountlist"
> > is a dead end.
> >
> > I am looking for something "unique" about this other mp,
> > but haven't found anything yet.
> > Alternately, it might be necessary to add code to
> > zfsctl_snapdir_lookup() to "cheat and change the mp"
> > in more cases, such as "." and ".." lookups?
>
> It seems to me that if ZFS is creating an additional mount structure,
> it should be responsible for setting it up correctly.  That could
> involve a vfs-level routine to do some of the cloning.  In any case,
> it seems to me that mnt_exjail should be set properly, e.g. by duping
> the one in the original mount structure.  Probably ZFS is the only
> file system type that would need this added.
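
To sketch the idea (illustration only; the name is made up and this is
not the attached patch), such a vfs-level helper might look roughly like:

    #include <sys/param.h>
    #include <sys/mount.h>
    #include <sys/ucred.h>

    /*
     * Sketch: copy the mnt_exjail credential reference from the "real"
     * mount into the snapshot's pseudo mount, if it has none yet.
     */
    static void
    vfs_exjail_copy(struct mount *frommp, struct mount *tomp)
    {
        struct ucred *cr = NULL;

        MNT_ILOCK(frommp);
        if (frommp->mnt_exjail != NULL)
            cr = crhold(frommp->mnt_exjail);
        MNT_IUNLOCK(frommp);
        if (cr == NULL)
            return;
        MNT_ILOCK(tomp);
        if (tomp->mnt_exjail == NULL) {
            tomp->mnt_exjail = cr;
            cr = NULL;
        }
        MNT_IUNLOCK(tomp);
        if (cr != NULL)
            crfree(cr);
    }
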
I've created a patch that I think does this. It seemed to test ok for my case.
It's in D42672 on reviews.freebsd.org.
I have also attached it here (this diff doesn't use -U999999, so it might be
easier to apply).

Hopefully this fixes the problem. Sorry for the breakage.

rick

>
>                 Mike
>
> > rick
> > ps: I added all the cc's back in because I want the
> >       ZFS folk to hopefully chime in.
> >
> >>
> >>                 Mike
> >>
> >>> Do I need to search mountlist for it?
> >>>
> >>> rick
> >>> ps: The hack patch attached should fix the problem, but can only be
> >>>       safely used if mountd/nfsd are not run in any jails.
> >>>
> >>>>
> >>>> I will try and get a test setup going here, which leads me to..
> >>>> how do I create a ZFS snapshot? (I do have a simple ZFS pool running
> >>>> on a test machine, but I've never done a snapshot.)
> >>>>
> >>>> Although this problem is not in 13.2, it will have shipped in 14.0.
> >>>>
> >>>> Any help will be appreciated, rick
> >>>>
> >>>>>
> >>>>>                 Mike
> >>>>>>
> >>>>>>> rick
> >>>>>>>
> >>>>>>> On Fri, Nov 17, 2023 at 6:14 PM Mike Karels <m...@karels.net> wrote:
> >>>>>>>>
> >>>>>>>> Rick, have you been following this thread on freebsd-stable?  I have
> >>>>>>>> been able to reproduce this using a 13-stable server from Oct 7 and
> >>>>>>>> a 15-current system that is up to date using NFSv3.  I did not
> >>>>>>>> reproduce with a 13.2 server.  The client was running 13.2.  Any
> >>>>>>>> ideas?  A full bisect seems fairly painful, but maybe you have an
> >>>>>>>> idea of points to try.  Fortunately, these are all test systems
> >>>>>>>> that I can reboot at will.
> >>>>>>>>
> >>>>>>>>                 Mike
> >>>>>>>>
> >>>>>>>> Forwarded message:
> >>>>>>>>
> >>>>>>>>> From: Garrett Wollman <woll...@bimajority.org>
> >>>>>>>>> To: Mike Karels <m...@karels.net>
> >>>>>>>>> Cc: freebsd-sta...@freebsd.org
> >>>>>>>>> Subject: Re: NFS exports of ZFS snapshots broken
> >>>>>>>>> Date: Fri, 17 Nov 2023 17:35:04 -0500
> >>>>>>>>>
> >>>>>>>>> <<On Fri, 17 Nov 2023 15:57:42 -0600, Mike Karels <m...@karels.net> said:
> >>>>>>>>>
> >>>>>>>>>> I have not run into this, so I tried it just now.  I had no problem.
> >>>>>>>>>> The server is 13.2, fully patched, the client is up-to-date -current,
> >>>>>>>>>> and the mount is v4.
> >>>>>>>>>
> >>>>>>>>> On my 13.2 client and 13-stable server, I see:
> >>>>>>>>>
> >>>>>>>>>  25034 ls       CALL  open(0x237d32f9a000,0x120004<O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC>)
> >>>>>>>>>  25034 ls       NAMI  "/mnt/tools/.zfs/snapshot/weekly-2023-45"
> >>>>>>>>>  25034 ls       RET   open 4
> >>>>>>>>>  25034 ls       CALL  fcntl(0x4,F_ISUNIONSTACK,0x0)
> >>>>>>>>>  25034 ls       RET   fcntl 0
> >>>>>>>>>  25034 ls       CALL  getdirentries(0x4,0x237d32faa000,0x1000,0x237d32fa7028)
> >>>>>>>>>  25034 ls       RET   getdirentries -1 errno 5 Input/output error
> >>>>>>>>>  25034 ls       CALL  close(0x4)
> >>>>>>>>>  25034 ls       RET   close 0
> >>>>>>>>>  25034 ls       CALL  exit(0)
> >>>>>>>>>
> >>>>>>>>> Certainly a libc bug here that getdirentries(2) returning [EIO]
> >>>>>>>>> results in ls(1) returning EXIT_SUCCESS, but the [EIO] error is
> >>>>>>>>> consistent across both FreeBSD and Linux clients.
> >>>>>>>>>
> >>>>>>>>> Looking at this from the RPC side:
> >>>>>>>>>
> >>>>>>>>>       (PUTFH, GETATTR, LOOKUP(snapshotname), GETFH, GETATTR)
> >>>>>>>>>               [NFS4_OK for all ops]
> >>>>>>>>>       (PUTFH, GETATTR)
> >>>>>>>>>               [NFS4_OK, NFS4_OK]
> >>>>>>>>>       (PUTFH, ACCESS(0x3f), GETATTR)
> >>>>>>>>>               [NFS4_OK, NFS4_OK, rights = 0x03, NFS4_OK]
> >>>>>>>>>       (PUTFH, GETATTR, LOOKUPP, GETFH, GETATTR)
> >>>>>>>>>               [NFS4_OK, NFS4_OK, NFS4ERR_NOFILEHANDLE]
> >>>>>>>>>
> >>>>>>>>> and at this point the [EIO] is returned.
> >>>>>>>>>
> >>>>>>>>> It seems that clients always do a LOOKUPP before calling READDIR,
> >>>>>>>>> and this is failing when the subject file handle is the snapshot.
> >>>>>>>>> The client is perfectly able to *traverse into* the snapshot: if I
> >>>>>>>>> try to list a subdirectory I know exists in the snapshot, the client
> >>>>>>>>> is able to LOOKUP(dirname) just fine, but LOOKUPP still fails with
> >>>>>>>>> NFS4ERR_NOFILEHANDLE *on the subdirectory*.
> >>>>>>>>>
> >>>>>>>>> -GAWollman
> >>>>>>>>

Attachment: zfssnap.patch
Description: Binary data
