On Sat, Nov 18, 2023 at 4:43 PM Mike Karels <m...@karels.net> wrote:
>
> On 18 Nov 2023, at 17:23, Rick Macklem wrote:
>
> > On Sat, Nov 18, 2023 at 2:27 PM Mike Karels <m...@karels.net> wrote:
> >>
> >> On 18 Nov 2023, at 15:58, Rick Macklem wrote:
> >>
> >>> On Sat, Nov 18, 2023 at 8:09 AM Rick Macklem <rick.mack...@gmail.com> wrote:
> >>>>
> >>>> On Fri, Nov 17, 2023 at 8:19 PM Mike Karels <m...@karels.net> wrote:
> >>>>>
> >>>>> On 17 Nov 2023, at 22:14, Mike Karels wrote:
> >>>>>
> >>>>>> On 17 Nov 2023, at 21:24, Rick Macklem wrote:
> >>>>>>
> >>>>>>> Most of the changes in stable/13 that are not in releng/13.2
> >>>>>>> are the "make it work in a jail" stuff. Unfortunately, they are
> >>>>>>> a large # of changes (mostly trivial edits adding vnet macros),
> >>>>>>> but it also includes export check changes.
> >>>>>>>
> >>>>>>> I have attached a trivial patch that I think disables the export
> >>>>>>> checks for jails. If either of you can try it and see if it fixes
> >>>>>>> the problem, that would be great.
> >>>>>>> (Note that this is only for testing, although it probably does not
> >>>>>>> matter unless you are running nfsd(8) in vnet jails.)
> >>>>>>
> >>>>>> Yes, I can see snapshots with the patch. This system is just a test
> >>>>>> system that doesn't normally run ZFS or NFS, so no problem messing
> >>>>>> with permissions. It's a bhyve VM, so I just added a small disk and
> >>>>>> enabled ZFS for testing.
> >>>>>
> >>>>> btw, you might try to get mm@ or maybe mav@ to help out from the ZFS
> >>>>> side. It must be doing something differently inside a snapshot than
> >>>>> outside, maybe with file handles or something like that.
> >>>> Yes. I've added freebsd-current@ (although Garrett is not on it, he is
> >>>> cc'd) and these guys specifically...
> >>>>
> >>>> So, here's what appears to be the problem...
> >>>> Commit 88175af (in main and stable/13, but not 13.2) added checks for
> >>>> nfsd(8) running in jails by filling in mnt_exjail with a reference to
> >>>> the cred used when the file system is exported.
> >>>> When mnt_exjail is found NULL, the current nfsd code assumes that
> >>>> no access is allowed for the mount.
> >>>>
> >>>> My vague understanding is that when a ZFS snapshot is accessed, it is
> >>>> "pseudo-mounted" by zfsctl_snapdir_lookup() and I am guessing that
> >>>> mnt_exjail is NULL as a result.
> >>>> Since I do not know the ZFS code and don't even have an easy way to
> >>>> test this (thankfully Mike can test easily), I do not know what to do
> >>>> from here.
> >>>>
> >>>> Is there a "struct mount" constructed for this pseudo mount
> >>>> (or it actually appears to be the lookup of ".." that fails, so it
> >>>> might be the parent of the snapshot subdir)?
> >>>>
> >>>> One thought is that I can check to see if the mount pointer is in the
> >>>> mountlist (I don't think the snapshot's mount is in the mountlist) and
> >>>> avoid the jail test for this case. This would assume that snapshots are
> >>>> always within the file system(s) exported via that jail (which includes
> >>>> the case of prison0, of course), so that they do not need a separate
> >>>> jail check.
> >>>>
> >>>> If this doesn't work, there will need to be some sort of messing about
> >>>> in ZFS to set mnt_exjail for these.
> >>> Ok, so now onto the hard part...
> >>> Thanks to Mike and others, I did create a snapshot under .zfs and I can
> >>> see the problem. It is that mnt_exjail == NULL.
> >>> Now, is there a way that this "struct mount" can be recognized as
> >>> "special" for snapshots, so I can avoid the mnt_exjail == NULL test?
> >>> (I had hoped that "mp->mnt_list.tqe_prev" would be NULL, but that is
> >>> not the case.)
> >>
> >> Dumb question, is the mount point (mp presumably) different between the
> >> snapshot and the main file system?
> > Not a dumb question, and the answer is rather interesting...
> > It is "sometimes" or "usually" according to my printf().
> > It seems that when you first "cd <snapshot-name>" you get a different mp
> > where mnt_exjail == NULL. Then when you look at directories within the
> > snapshot, you get the mp of the file system that .zfs exists in, which
> > does have mnt_exjail set non-NULL.
> >
> > There is this snippet of code in zfsctl_snapdir_lookup():
> >     /*
> >      * Fix up the root vnode mounted on .zfs/snapshot/<snapname>.
> >      *
> >      * This is where we lie about our v_vfsp in order to
> >      * make .zfs/snapshot/<snapname> accessible over NFS
> >      * without requiring manual mounts of <snapname>.
> >      */
> >     ASSERT3P(VTOZ(*vpp)->z_zfsvfs, !=, zfsvfs);
> >     VTOZ(*vpp)->z_zfsvfs->z_parent = zfsvfs;
> >
> >     /* Clear the root flag (set via VFS_ROOT) as well. */
> >     (*vpp)->v_vflag &= ~VV_ROOT;
> > which seems to set the mp to that of the parent, but it
> > seems this does not happen for the initial lookup of
> > the <snapname>?
> >
> > I'll note that there is code before this in zfsctl_snapdir_lookup()
> > for handling cases like "." and ".." that return without doing this.
> >
> > Now, why does this work without the mnt_exjail check (as in 13.2)?
> > I am not quite sure, but there is this "cheat" in the
> > NFS server (it has been there for years, maybe decades):
> >     /*
> >      * Allow a Lookup, Getattr, GetFH, Secinfo on a
> >      * non-exported directory if nfs_rootfhset.
> >      * Do I need to allow any other Ops?
> >      * (You can only have a non-exported vpnes if
> >      *  nfs_rootfhset is true. See nfsd_fhtovp().)
> >      * Allow AUTH_SYS to be used for file systems
> >      * exported GSS only for certain Ops, to allow
> >      * clients to do mounts more easily.
> >      */
> >     if (nfsv4_opflag[op].needscfh && vp) {
> >         if (!NFSVNO_EXPORTED(&vpnes) &&
> >             op != NFSV4OP_LOOKUP &&
> >             op != NFSV4OP_GETATTR &&
> >             op != NFSV4OP_GETFH &&
> >             op != NFSV4OP_ACCESS &&
> >             op != NFSV4OP_READLINK &&
> >             op != NFSV4OP_SECINFO &&
> >             op != NFSV4OP_SECINFONONAME)
> >             nd->nd_repstat = NFSERR_NOFILEHANDLE;
> > This allows certain operations to be done on
> > non-exported file systems, and I think that is enough
> > to allow this to work when mnt_exjail is not checked.
> > (Note that NFSV4OP_LOOKUPP is not in the list,
> > which might explain why it is the one that fails for
> > Garrett. I don't think it can be added to this list
> > safely, since that would allow a client to move above
> > the exported file system into "uncharted territory".)
> >
> >> Just curious. Also, what is mnt_exjail
> >> normally set to for file systems not in a jail?
> > mnt_exjail is set to the credentials of the thread/process
> > that exported the file system (usually mountd(8)).
> > When not in a jail, cr_prison for these credentials
> > points to prison0.
> >
> > Btw, I checked, and the other mp that has mnt_exjail == NULL
> > is in the mountlist, so the idea of checking "not in mountlist"
> > is a dead end.
> >
> > I am looking for something "unique" about this other mp,
> > but haven't found anything yet.
> > Alternately, it might be necessary to add code to
> > zfsctl_snapdir_lookup() to "cheat and change the mp"
> > in more cases, such as "." and ".." lookups?
>
> It seems to me that if ZFS is creating an additional mount structure,
> it should be responsible for setting it up correctly. That could
> involve a vfs-level routine to do some of the cloning. In any case,
> it seems to me that mnt_exjail should be set properly, e.g. by duping
> the one in the original mount structure. Probably ZFS is the only
> file system type that would need this added.

I've created a patch that I think does this. It seemed to test ok for my case.
It's in D42672 on reviews.freebsd.org. I have also attached it here (this diff doesn't use -U999999, so it might be easier to apply).
Hopefully this fixes the problem. Sorry for the breakage.

rick

>
> Mike
>
> > rick
> > ps: I added all the cc's back in because I want the
> > ZFS folk to hopefully chime in.
> >
> >>
> >> Mike
> >>
> >>> Do I need to search mountlist for it?
> >>>
> >>> rick
> >>> ps: The hack patch attached should fix the problem, but can only be
> >>> safely used if mountd/nfsd are not run in any jails.
> >>>
> >>>>
> >>>> I will try and get a test setup going here, which leads me to...
> >>>> how do I create a ZFS snapshot? (I do have a simple ZFS pool running
> >>>> on a test machine, but I've never done a snapshot.)
> >>>>
> >>>> Although this problem is not in 13.2, it will have shipped in 14.0.
> >>>>
> >>>> Any help will be appreciated, rick
> >>>>
> >>>>>
> >>>>> Mike
> >>>>>>
> >>>>>>> rick
> >>>>>>>
> >>>>>>> On Fri, Nov 17, 2023 at 6:14 PM Mike Karels <m...@karels.net> wrote:
> >>>>>>>>
> >>>>>>>> CAUTION: This email originated from outside of the University of
> >>>>>>>> Guelph. Do not click links or open attachments unless you recognize
> >>>>>>>> the sender and know the content is safe. If in doubt, forward
> >>>>>>>> suspicious emails to ith...@uoguelph.ca.
> >>>>>>>>
> >>>>>>>> Rick, have you been following this thread on freebsd-stable? I have
> >>>>>>>> been able to reproduce this using a 13-stable server from Oct 7 and
> >>>>>>>> a 15-current system that is up to date, using NFSv3. I did not
> >>>>>>>> reproduce with a 13.2 server. The client was running 13.2. Any
> >>>>>>>> ideas? A full bisect seems fairly painful, but maybe you have an
> >>>>>>>> idea of points to try. Fortunately, these are all test systems
> >>>>>>>> that I can reboot at will.
> >>>>>>>>
> >>>>>>>> Mike
> >>>>>>>>
> >>>>>>>> Forwarded message:
> >>>>>>>>
> >>>>>>>>> From: Garrett Wollman <woll...@bimajority.org>
> >>>>>>>>> To: Mike Karels <m...@karels.net>
> >>>>>>>>> Cc: freebsd-sta...@freebsd.org
> >>>>>>>>> Subject: Re: NFS exports of ZFS snapshots broken
> >>>>>>>>> Date: Fri, 17 Nov 2023 17:35:04 -0500
> >>>>>>>>>
> >>>>>>>>> <<On Fri, 17 Nov 2023 15:57:42 -0600, Mike Karels <m...@karels.net> said:
> >>>>>>>>>
> >>>>>>>>>> I have not run into this, so I tried it just now. I had no problem.
> >>>>>>>>>> The server is 13.2, fully patched, the client is up-to-date -current,
> >>>>>>>>>> and the mount is v4.
> >>>>>>>>>
> >>>>>>>>> On my 13.2 client and 13-stable server, I see:
> >>>>>>>>>
> >>>>>>>>> 25034 ls  CALL  open(0x237d32f9a000,0x120004<O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC>)
> >>>>>>>>> 25034 ls  NAMI  "/mnt/tools/.zfs/snapshot/weekly-2023-45"
> >>>>>>>>> 25034 ls  RET   open 4
> >>>>>>>>> 25034 ls  CALL  fcntl(0x4,F_ISUNIONSTACK,0x0)
> >>>>>>>>> 25034 ls  RET   fcntl 0
> >>>>>>>>> 25034 ls  CALL  getdirentries(0x4,0x237d32faa000,0x1000,0x237d32fa7028)
> >>>>>>>>> 25034 ls  RET   getdirentries -1 errno 5 Input/output error
> >>>>>>>>> 25034 ls  CALL  close(0x4)
> >>>>>>>>> 25034 ls  RET   close 0
> >>>>>>>>> 25034 ls  CALL  exit(0)
> >>>>>>>>>
> >>>>>>>>> Certainly a libc bug here that getdirentries(2) returning [EIO]
> >>>>>>>>> results in ls(1) returning EXIT_SUCCESS, but the [EIO] error is
> >>>>>>>>> consistent across both FreeBSD and Linux clients.
> >>>>>>>>>
> >>>>>>>>> Looking at this from the RPC side:
> >>>>>>>>>
> >>>>>>>>> (PUTFH, GETATTR, LOOKUP(snapshotname), GETFH, GETATTR)
> >>>>>>>>>     [NFS4_OK for all ops]
> >>>>>>>>> (PUTFH, GETATTR)
> >>>>>>>>>     [NFS4_OK, NFS4_OK]
> >>>>>>>>> (PUTFH, ACCESS(0x3f), GETATTR)
> >>>>>>>>>     [NFS4_OK, NFS4_OK, rights = 0x03, NFS4_OK]
> >>>>>>>>> (PUTFH, GETATTR, LOOKUPP, GETFH, GETATTR)
> >>>>>>>>>     [NFS4_OK, NFS4_OK, NFS4ERR_NOFILEHANDLE]
> >>>>>>>>>
> >>>>>>>>> and at this point the [EIO] is returned.
> >>>>>>>>>
> >>>>>>>>> It seems that clients always do a LOOKUPP before calling READDIR,
> >>>>>>>>> and this is failing when the subject file handle is the snapshot.
> >>>>>>>>> The client is perfectly able to *traverse into* the snapshot: if I
> >>>>>>>>> try to list a subdirectory I know exists in the snapshot, the
> >>>>>>>>> client is able to LOOKUP(dirname) just fine, but LOOKUPP still
> >>>>>>>>> fails with NFS4ERR_NOFILEHANDLE *on the subdirectory*.
> >>>>>>>>>
> >>>>>>>>> -GAWollman
> >>>>>>>>
zfssnap.patch