On Sat, Nov 18, 2023 at 2:27 PM Mike Karels <m...@karels.net> wrote: > > On 18 Nov 2023, at 15:58, Rick Macklem wrote: > > > On Sat, Nov 18, 2023 at 8:09 AM Rick Macklem <rick.mack...@gmail.com> wrote: > >> > >> On Fri, Nov 17, 2023 at 8:19 PM Mike Karels <m...@karels.net> wrote: > >>> > >>> On 17 Nov 2023, at 22:14, Mike Karels wrote: > >>> > >>>> On 17 Nov 2023, at 21:24, Rick Macklem wrote: > >>>> > >>>>> Most of the changes in stable/13 that are not in releng/13.2 > >>>>> are the "make it work in a jail" stuff. Unfortunately, they are > >>>>> a large # of changes (mostly trivial edits adding vnet macros), > >>>>> but it also includes export check changes. > >>>>> > >>>>> I have attached a trivial patch that I think disables the export > >>>>> checks for jails. If either of you can try it and see if it fixes > >>>>> the problem, that would be great. > >>>>> (Note that this is only for testing, although it probably does not > >>>>> matter unless you are running nfsd(8) in vnet jails.) > >>>> > >>>> Yes, I can see snapshots with the patch. This system is just a test > >>>> system that doesn't normally run ZFS or NFS, so no problem messing > >>>> with permissions. It's a bhyve VM, so I just added a small disk and > >>>> enabled ZFS for testing. > >>> > >>> btw, you might try to get mm@ or maybe mav@ to help out from the ZFS > >>> side. It must be doing something differently inside a snapshot than > >>> outside, maybe with file handles or something like that. > >> Yes. I've added freebsd-current@ (although Garrett is not on it, he is > >> cc'd) and these guys specifically... > >> > >> So, here's what appears to be the problem... > >> Commit 88175af (in main and stable/13, but not 13.2) added checks for > >> nfsd(8) running in jails by filling in mnt_exjail with a reference to the > >> cred > >> used when the file system is exported. > >> When mnt_exjail is found NULL, the current nfsd code assumes that there > >> is no access allowed for the mount. > >> > >> My vague understanding is that when a ZFS snapshot is accessed, it is > >> "pseudo-mounted" by zfsctl_snapdir_lookup() and I am guessing that > >> mnt_exjail is NULL as a result. > >> Since I do not know the ZFS code and don't even have an easy way to > >> test this (thankfully Mike can test easily), I do not know what to do from > >> here? > >> > >> Is there a "struct mount" constructed for this pseudo mount > >> (or it actually appears to be the lookup of ".." that fails, so it > >> might be the parent of the snapshot subdir?)? > >> > >> One thought is that I can check to see if the mount pointer is in the > >> mountlist (I don't think the snapshot's mount is in the mountlist) and > >> avoid the jail test for this case. This would assume that snapshots are > >> always within the file system(s) exported via that jail (which includes > >> the case of prison0, of course), so that they do not need a separate > >> jail check. > >> > >> If this doesn't work, there will need to be some sort of messing about > >> in ZFS to set mnt_exjail for these. > > Ok, so now onto the hard part... > > Thanks to Mike and others, I did create a snapshot under .zfs and I can > > see the problem. It is that mnt_exjail == NULL. > > Now, is there a way that this "struct mount" can be recognized as "special" > > for snapshots, so I can avoid the mnt_exjail == NULL test? > > (I had hoped that "mp->mnt_list.tqe_prev" would be NULL, but that is not > > the case.) > > Dumb question, is the mount point (mp presumably) different between the > snapshot and the main file system? Not a dump question and the answer is rather interesting... It is "sometimes" or "usually" according to my printf(). It seems that when you first "cd <snapshot-name"" you get a different mp where mnt_exjail == NULL.. Then when you look at directories within the snapshot, you get the mp of the file system that .zfs exists in, which does have mnt_exjail set non-NULL.
There is this snippet of code in zfsctl_snapdir_lookup(): /* * Fix up the root vnode mounted on .zfs/snapshot/<snapname>. * * This is where we lie about our v_vfsp in order to * make .zfs/snapshot/<snapname> accessible over NFS * without requiring manual mounts of <snapname>. */ ASSERT3P(VTOZ(*vpp)->z_zfsvfs, !=, zfsvfs); VTOZ(*vpp)->z_zfsvfs->z_parent = zfsvfs; /* Clear the root flag (set via VFS_ROOT) as well. */ (*vpp)->v_vflag &= ~VV_ROOT; which seems to set the mp to that of the parent, but it seems this does not happen for the initial lookup of the <snapname>? I'll note that there is code before this in zfsctl_snapdir_lookup() for handling cases like "." and ".." that return without doing this. Now, why does this work without the mnt_exjail check (as in 13.2)? I am not quite sure, but there is this "cheat" in the NFS server (it has been there for years, maybe decades): /* * Allow a Lookup, Getattr, GetFH, Secinfo on an * non-exported directory if * nfs_rootfhset. Do I need to allow any other Ops? * (You can only have a non-exported vpnes if * nfs_rootfhset is true. See nfsd_fhtovp()) * Allow AUTH_SYS to be used for file systems * exported GSS only for certain Ops, to allow * clients to do mounts more easily. */ if (nfsv4_opflag[op].needscfh && vp) { if (!NFSVNO_EXPORTED(&vpnes) && op != NFSV4OP_LOOKUP && op != NFSV4OP_GETATTR && op != NFSV4OP_GETFH && op != NFSV4OP_ACCESS && op != NFSV4OP_READLINK && op != NFSV4OP_SECINFO && op != NFSV4OP_SECINFONONAME) nd->nd_repstat = NFSERR_NOFILEHANDLE; This allows certain operations to be done on non-exported file systems and I think that is enough to allow this to work when mnt_exjail is not checked. (Note that NFSV4OP_LOOKUPP is not in the list, which might explain why it is the one that fails for Garrett. I don't think it can be added to this list safely, since that would allow a client to move above the exported file system into "uncharted territory".) > Just curious. Also, what is mnt_exjail > normally set to for file systems not in a jail? mnt_exjail is set to the credentials of the thread/process that exported the file system (usually mountd(8)). When not in a jail, cr_prison for these credentials points to prison0. Btw, I checked and the "other mp that has mnt_exjail == NULL is in the mountlist, so the idea of checking "not in mountlist" is a dead end. I am looking for something "unique" about this other mp, but haven't found anything yet. Alternately, it might be necessary to add code to zfsctl_snapdir_lookup() to "cheat and change the mp" in more cases, such as "." and ".." lookups? rick ps: I added all the cc's back in because I want the ZFS folk to hopefully chime in. > > Mike > > > Do I need to search mountlist for it? > > > > rick > > ps: The hack patch attached should fix the problem, but can only be > > safely used if mountd/nfsd are not run in any jails. > > > >> > >> I will try and get a test setup going here, which leads me to.. > >> how do I create a ZFS snapshot? (I do have a simple ZFS pool running > >> on a test machine, but I've never done a snapshot.) > >> > >> Although this problem is not in 13.2, it will have shipped in 14.0. > >> > >> Any help with be appreciated, rick > >> > >>> > >>> Mike > >>>> > >>>>> rick > >>>>> > >>>>> On Fri, Nov 17, 2023 at 6:14 PM Mike Karels <m...@karels.net> wrote: > >>>>>> > >>>>>> CAUTION: This email originated from outside of the University of > >>>>>> Guelph. Do not click links or open attachments unless you recognize > >>>>>> the sender and know the content is safe. If in doubt, forward > >>>>>> suspicious emails to ith...@uoguelph.ca. > >>>>>> > >>>>>> > >>>>>> Rick, have you been following this thread on freebsd-stable? I have > >>>>>> been able > >>>>>> to reproduce this using a 13-stable server from Oct 7 and a 15-current > >>>>>> system > >>>>>> that is up to date using NFSv3. I did not reproduce with a 13.2 > >>>>>> server. The > >>>>>> client was running 13.2. Any ideas? A full bisect seems fairly > >>>>>> painful, but > >>>>>> maybe you have an idea of points to try. Fortunately, these are all > >>>>>> test > >>>>>> systems that I can reboot at will. > >>>>>> > >>>>>> Mike > >>>>>> > >>>>>> Forwarded message: > >>>>>> > >>>>>>> From: Garrett Wollman <woll...@bimajority.org> > >>>>>>> To: Mike Karels <m...@karels.net> > >>>>>>> Cc: freebsd-sta...@freebsd.org > >>>>>>> Subject: Re: NFS exports of ZFS snapshots broken > >>>>>>> Date: Fri, 17 Nov 2023 17:35:04 -0500 > >>>>>>> > >>>>>>> <<On Fri, 17 Nov 2023 15:57:42 -0600, Mike Karels <m...@karels.net> > >>>>>>> said: > >>>>>>> > >>>>>>>> I have not run into this, so I tried it just now. I had no problem. > >>>>>>>> The server is 13.2, fully patched, the client is up-to-date -current, > >>>>>>>> and the mount is v4. > >>>>>>> > >>>>>>> On my 13.2 client and 13-stable server, I see: > >>>>>>> > >>>>>>> 25034 ls CALL > >>>>>>> open(0x237d32f9a000,0x120004<O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC>) > >>>>>>> 25034 ls NAMI "/mnt/tools/.zfs/snapshot/weekly-2023-45" > >>>>>>> 25034 ls RET open 4 > >>>>>>> 25034 ls CALL fcntl(0x4,F_ISUNIONSTACK,0x0) > >>>>>>> 25034 ls RET fcntl 0 > >>>>>>> 25034 ls CALL > >>>>>>> getdirentries(0x4,0x237d32faa000,0x1000,0x237d32fa7028) > >>>>>>> 25034 ls RET getdirentries -1 errno 5 Input/output error > >>>>>>> 25034 ls CALL close(0x4) > >>>>>>> 25034 ls RET close 0 > >>>>>>> 25034 ls CALL exit(0) > >>>>>>> > >>>>>>> Certainly a libc bug here that getdirentries(2) returning [EIO] > >>>>>>> results in ls(1) returning EXIT_SUCCESS, but the [EIO] error is > >>>>>>> consistent across both FreeBSD and Linux clients. > >>>>>>> > >>>>>>> Looking at this from the RPC side: > >>>>>>> > >>>>>>> (PUTFH, GETATTR, LOOKUP(snapshotname), GETFH, GETATTR) > >>>>>>> [NFS4_OK for all ops] > >>>>>>> (PUTFH, GETATTR) > >>>>>>> [NFS4_OK, NFS4_OK] > >>>>>>> (PUTFH, ACCESS(0x3f), GETATTR) > >>>>>>> [NFS4_OK, NFS4_OK, rights = 0x03, NFS4_OK] > >>>>>>> (PUTFH, GETATTR, LOOKUPP, GETFH, GETATTR) > >>>>>>> [NFS4_OK, NFS4_OK, NFS4ERR_NOFILEHANDLE] > >>>>>>> > >>>>>>> and at this point the [EIO] is returned. > >>>>>>> > >>>>>>> It seems that clients always do a LOOKUPP before calling READDIR, and > >>>>>>> this is failing when the subject file handle is the snapshot. The > >>>>>>> client is perfectly able to *traverse into* the snapshot: if I try to > >>>>>>> list a subdirectory I know exists in the snapshot, the client is able > >>>>>>> to > >>>>>>> LOOKUP(dirname) just fine, but LOOKUPP still fails with > >>>>>>> NFS4ERR_NOFILEHANDLE *on the subndirectory*. > >>>>>>> > >>>>>>> -GAWollman > >>>>>>