Re: Deprecating smbfs(5) and removing it before FreeBSD 14

2021-11-01 Thread Rick Macklem
Miroslav Lachman wrote:
[good stuff snipped]
> Apple sources can be found at
> https://opensource.apple.com/source/smb/ with all the history from SMBv1
> to SMBv3. The files have the original copyright header from 2001 by Boris Popov
> (the same as in FreeBSD) but otherwise it is very different code, due to
> different kernel interfaces and so on.
> With the Apple and Illumos sources it would be possible to upgrade smbfs in
> FreeBSD to v2 or v3, but a very skilled programmer is needed for this
> work, and for the past years nobody has been interested in doing it.

Although I agree that it would be a non-trivial exercise, a lot of the Apple
differences are in the "smoke and mirrors" category.
Around OSX 10.4, they changed their VFS/VOP to typedefs and accessor
functions. For example:
   "struct vnode *vp" became "vnode_t vp"
and "vp->v_type" became "vnode_type(vp)"

Ten years ago, the actual semantics were very close to what FreeBSD used.
If you look at sys/fs/nfs/nfskpiport.h in older sources (around FreeBSD 10),
you'll see a bunch of macros I used to allow the Apple port to also build/run
on FreeBSD (a couple, such as vnode_t are still left because I've never gotten
around to doing the edit to replace them).
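
If you want to see what that glue looked like, something like this should pull the
old header out of the git history (a sketch; the branch name and the use of "git show"
are my assumptions, any 10.x branch of a freebsd-src clone should have the file):

    # view the Apple-KPI compatibility macros from a FreeBSD 10-era tree
    git show origin/releng/10.0:sys/fs/nfs/nfskpiport.h | less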

The hard part will be dealing with the actual VFS/VOP semantics changes that
have occurred in the last 10 years.

Did they stick APSLs on the files? (If so, I think it could still be ok, since the APSL
is a lot like the CDDL. However, I'm not sure if the APSL has ever been blessed
by FreeBSD as of yet?)

Don't assume anything will happen, but I *might* take a look in the winter,
since outstanding NFS changes should be done by the end of 2021.

It does sound like there is some interest in this and that fuse doesn't solve
the problem (at least for everyone).

rick

Miroslav Lachman




Re: Deprecating smbfs(5) and removing it before FreeBSD 14

2022-01-09 Thread Rick Macklem
Well, I took a look at the Apple code and I'm afraid I
think porting it into FreeBSD is too big a job for me.

I was hoping the code would have a layer that could
be used as a "block box" for the VOP calls, but that
does not seem to be the case.
There is also a *lot* of code in it.

I am going to look at the OpenSolaris code, to see if
I think it will be an easier port.

rick


From: Miroslav Lachman <000.f...@quip.cz>
Sent: Monday, November 1, 2021 5:47 PM
To: Rick Macklem; freebsd-curr...@freebsd.org; freebsd-stable
Cc: Yuri
Subject: Re: Deprecating smbfs(5) and removing it before FreeBSD 14



On 01/11/2021 16:55, Rick Macklem wrote:
> Miroslav Lachman wrote:
> [good stuff snipped]
>> Apple sources can be found there
>> https://opensource.apple.com/source/smb/ with all the history from SMBv1
>> to SMBv3. The files have original copyright header from 2001 Boris Popov
>> (same as FreeBSD) but otherwise it is very different code due to
>> different kernel interfaces and so on.
>> With Apple and Illumos sources it is possible to have smbfs in FreeBSD
>> upgraded to v2 or v3 but very skilled programmer is needed for this
>> work. And for the past years there is none interested in this work.
>
> Although I agree that it would be a non-trivial exercise, a lot of the Apple
> differences are in the "smoke and mirrors" category.
> Around OSX 10.4, they changed their VFS/VOP to typedefs and accessor
> functions. For example:
> "struct vnode *vp" became "vnode_t vp"
> and "vp->v_type" became "vnode_type(vp)"
>
> Ten years ago, the actual semantics were very close to what FreeBSD used.
> If you look at sys/fs/nfs/nfskpiport.h in older sources (around FreeBSD 10),
> you'll see a bunch of macros I used to allow the Apple port to also build/run
> on FreeBSD (a couple, such as vnode_t are still left because I've never gotten
> around to doing the edit to replace them).

If I see it right, even the 10-year-old Apple version of smbfs has
support for SMBv2, so if that older version is closer to the FreeBSD kernel /
smbfs it could be a good starting point for merging changes into our smbfs to
get SMBv2 support on FreeBSD.

> The hard part will be dealing with the actual VFS/VOP semantics changes that
> have occurred in the last 10 years.
>
> Did they stick APSLs on the files? (If so, I think it could still be ok, 
> since the APSL
> is a lot like the CDDL. However, I'm not sure if the APSL has ever been 
> blessed
> by FreeBSD as of yet?)

The old versions of smbfs have the original copyright header and no other
license. Newer versions have some added files with a different header carrying
the APSL license. For example
https://opensource.apple.com/source/smb/smb-759.40.1/kernel/smbfs/smbfs_subr_2.h.auto.html

If the license is a problem then I think it could live with the APSL in the ports
tree as a loadable kernel module. Maybe that would be easier for
development too?

> Don't assume anything will happen, but I *might* take a look in the winter,
> since outstanding NFS changes should be done by the end of 2021.

I really appreciate your endless work on NFS on FreeBSD. Without your
work NFS would be lagging behind industry standards, similar to what
we see with smbfs.
And if you have some spare time to take a look at smbfs and maybe
solve the SMBv2 / SMBv3 problem, you will be my hero. I have been waiting for this
for many years and I know I am not the only one who needs working SMB / CIFS on
FreeBSD.

> It does sound like there is some interest in this and that fuse doesn't solve
> the problem (at least for everyone).

Yes, there is interest. It was discussed a few times in the past on the
mailing lists and on forums.freebsd.org, but without anybody willing to
touch the code.
The FUSE alternatives have many problems with performance, stability and
configuration.
https://forums.freebsd.org/threads/getting-smbnetfs-to-work.78413/

Kind regards
Miroslav Lachman



Re: Deprecating smbfs(5) and removing it before FreeBSD 14

2022-01-19 Thread Rick Macklem
I have downloaded the final version of the opensolaris
smbfs and it looks much more reasonable to port to
FreeBSD.

I will be starting to work on this (and maybe Mark Saad will be
able to help).

I have no idea when I'll have code that can be tested by others.

rick


From: Miroslav Lachman <000.f...@quip.cz>
Sent: Monday, January 10, 2022 10:27 AM
To: Rick Macklem; freebsd-curr...@freebsd.org; freebsd-stable
Cc: Yuri
Subject: Re: Deprecating smbfs(5) and removing it before FreeBSD 14



Hello Rick,
thank you for the update and your time on smbfs. I hope the OpenSolaris
version will be portable (or maybe some older version from Apple?).
FreeBSD without the possibility to mount smbfs is not an option for some
projects.

Kind regards
Miroslav Lachman


On 09/01/2022 15:46, Rick Macklem wrote:
> Well, I took a look at the Apple code and I'm afraid I
> think porting it into FreeBSD is too big a job for me.
>
> I was hoping the code would have a layer that could
> be used as a "block box" for the VOP calls, but that
> does not seem to be the case.
> There is also a *lot* of code in it.
>
> I am going to look at the OpenSolaris code, to see if
> I think it will be an easier port.
>
> rick
>
> 
> From: Miroslav Lachman <000.f...@quip.cz>
> Sent: Monday, November 1, 2021 5:47 PM
> To: Rick Macklem; freebsd-curr...@freebsd.org; freebsd-stable
> Cc: Yuri
> Subject: Re: Deprecating smbfs(5) and removing it before FreeBSD 14
>
>
>
> On 01/11/2021 16:55, Rick Macklem wrote:
>> Miroslav Lachman wrote:
>> [good stuff snipped]
>>> Apple sources can be found there
>>> https://opensource.apple.com/source/smb/ with all the history from SMBv1
>>> to SMBv3. The files have original copyright header from 2001 Boris Popov
>>> (same as FreeBSD) but otherwise it is very different code due to
>>> different kernel interfaces and so on.
>>> With Apple and Illumos sources it is possible to have smbfs in FreeBSD
>>> upgraded to v2 or v3 but very skilled programmer is needed for this
>>> work. And for the past years there is none interested in this work.
>>
>> Although I agree that it would be a non-trivial exercise, a lot of the Apple
>> differences are in the "smoke and mirrors" category.
>> Around OSX 10.4, they changed their VFS/VOP to typedefs and accessor
>> functions. For example:
>>  "struct vnode *vp" became "vnode_t vp"
>> and "vp->v_type" became "vnode_type(vp)"
>>
>> Ten years ago, the actual semantics were very close to what FreeBSD used.
>> If you look at sys/fs/nfs/nfskpiport.h in older sources (around FreeBSD 10),
>> you'll see a bunch of macros I used to allow the Apple port to also build/run
>> on FreeBSD (a couple, such as vnode_t are still left because I've never 
>> gotten
>> around to doing the edit to replace them).
>
> If I see it right even the 10 years old Apple version of smbfs has
> support for SMBv2 so if this old version is closer to FreeBSD kernel /
> smbfs it can be a good starting point to merge changes to our smbfs to
> have SMBv2 support on FreeBSD.
>
>> The hard part will be dealing with the actual VFS/VOP semantics changes that
>> have occurred in the last 10 years.
>>
>> Did they stick APSLs on the files? (If so, I think it could still be ok, 
>> since the APSL
>> is a lot like the CDDL. However, I'm not sure if the APSL has ever been 
>> blessed
>> by FreeBSD as of yet?)
>
> The old versions of smbfs has original copyright header and no other
> license. Newer version has some added files with different header with
> APSL license. For example
> https://opensource.apple.com/source/smb/smb-759.40.1/kernel/smbfs/smbfs_subr_2.h.auto.html
>
> If license is a problem then I think it can live with APSL in the ports
> tree as a loadable kernel module. Maybe this will be the easier for
> development too?
>
>> Don't assume anything will happen, but I *might* take a look in the winter,
>> since outstanding NFS changes should be done by the end of 2021.
>
> I really appreciate your endless work on NFS on FreeBSD.

Re: Deprecating smbfs(5) and removing it before FreeBSD 14

2022-01-19 Thread Rick Macklem
Yuri  wrote:
> Rick Macklem wrote:
> > I have downloaded the final version of the opensolaris
> > smbfs and it looks much more reasonable to port to
> > FreeBSD.
>
> What do you mean by "final version of the opensolaris smbfs", the one
> from 2010?  Please note that illumos (actively maintained fork of
> opensolaris) has a much more up to date one.
I'm not surprised that illumos will have updates.
I don't think it will affect the exercise at this time, since the current work
is to figure out what pieces of the opensolaris code need to be pulled
into the current smbfs to make the newer version work.
Solaris uses a very different VFS/VOP locking model, so a direct port
of the opensolaris code would be more work than I will be attempting.

rick

> I will be starting to work on this (and maybe Mark Saad will be
> able to help).
>
> I have no idea when I'll have code that can be tested by others.
>
> rick
>
> 
> From: Miroslav Lachman <000.f...@quip.cz>
> Sent: Monday, January 10, 2022 10:27 AM
> To: Rick Macklem; freebsd-curr...@freebsd.org; freebsd-stable
> Cc: Yuri
> Subject: Re: Deprecating smbfs(5) and removing it before FreeBSD 14
>
>
>
> Hello Rick,
> thank you for the update and your time on smbfs. I hope OpenSolaris
> version will be portable. (or mayby some older version from Apple?)
> FreeBSD without possibility to mount smbfs is not an option for some
> projects.
>
> Kind regards
> Miroslav Lachman
>
>
> On 09/01/2022 15:46, Rick Macklem wrote:
>> Well, I took a look at the Apple code and I'm afraid I
>> think porting it into FreeBSD is too big a job for me.
>>
>> I was hoping the code would have a layer that could
>> be used as a "block box" for the VOP calls, but that
>> does not seem to be the case.
>> There is also a *lot* of code in it.
>>
>> I am going to look at the OpenSolaris code, to see if
>> I think it will be an easier port.
>>
>> rick
>>
>> 
>> From: Miroslav Lachman <000.f...@quip.cz>
>> Sent: Monday, November 1, 2021 5:47 PM
>> To: Rick Macklem; freebsd-curr...@freebsd.org; freebsd-stable
>> Cc: Yuri
>> Subject: Re: Deprecating smbfs(5) and removing it before FreeBSD 14
>>
>>
>>
>> On 01/11/2021 16:55, Rick Macklem wrote:
>>> Miroslav Lachman wrote:
>>> [good stuff snipped]
>>>> Apple sources can be found there
>>>> https://opensource.apple.com/source/smb/ with all the history from SMBv1
>>>> to SMBv3. The files have original copyright header from 2001 Boris Popov
>>>> (same as FreeBSD) but otherwise it is very different code due to
>>>> different kernel interfaces and so on.
>>>> With Apple and Illumos sources it is possible to have smbfs in FreeBSD
>>>> upgraded to v2 or v3 but very skilled programmer is needed for this
>>>> work. And for the past years there is none interested in this work.
>>>
>>> Although I agree that it would be a non-trivial exercise, a lot of the Apple
>>> differences are in the "smoke and mirrors" category.
>>> Around OSX 10.4, they changed their VFS/VOP to typedefs and accessor
>>> functions. For example:
>>>  "struct vnode *vp" became "vnode_t vp"
>>> and "vp->v_type" became "vnode_type(vp)"
>>>
>>> Ten years ago, the actual semantics were very close to what FreeBSD used.
>>> If you look at sys/fs/nfs/nfskpiport.h in older sources (around FreeBSD 10),
>>> you'll see a bunch of macros I used to allow the Apple port to also 
>>> build/run
>>> on FreeBSD (a couple, such as vnode_t are still left because I've never 
>>> gotten
>>> around to doing the edit to replace them).
>>
>> If I see it right even the 10 years old Apple version of smbfs has
>> support for SMBv2 so if this old version is closer to FreeBSD kernel /
>> smbfs it can be a good starting point to merge changes to our smbfs to
>> have SMBv2 support on FreeBSD.

Re: Deprecating smbfs(5) and removing it before FreeBSD 14

2022-01-22 Thread Rick Macklem
Mark Saad  wrote:
[stuff snipped]
> So I am looking at the Apple and Solaris code provided by Rick. I am not
> sure if the illumos code provides SMB2 support. They based the Solaris
> code on Apple SMB-217.x, which is from OSX 10.4 and which I am sure
> predates SMB2.
>
> https://github.com/apple-oss-distributions/smb/tree/smb-217.19
>
> If I am following this correctly, we need to look at Apple's smb client
> from OSX 10.9, which is where I start to see bits about SMB2
>
> https://github.com/apple-oss-distributions/smb/tree/smb-697.95.1/kernel/netsmb
>
> This is also where this stuff starts to look less and less like FreeBSD.
> Let me ask some of the illumos people I know to see if there is
> anything they can point to.
Yes. Please do so. I saw the "old" calls for things like open and the
new ntcreate version, so I assumed that was the newer SMB.
If it is not, there is no reason to port it.

The new Apple code is a monster. 10x the lines of C and a lot of
weird stuff that looks Apple specific.

It might actually be easier to write SMBv2 from the spec than port
the Apple stuff.
--> I'll try and look at whatever Microsoft publishes w.r.t. SMBv2/3.

Thanks for looking at this, rick



--
mark saad | nones...@longcount.org



Re: nfsd becomes slow when machine CPU usage is at or over 100% on STABLE/13

2022-03-09 Thread Rick Macklem
Yoshihiro Ota  wrote:
> Hi,
>
> I'm on stable/13 with latest code base.
> I started testing pre-13.1 branch.
>
> I noticed major performance degradation with NFS when all CPUs are fully
> utilized.
>
> This happens with stable/13 but not with releng/13.0 nor releng/12.3.
NFS performance is sensitive to RPC response time.
Since this only happens when the CPUs are busy, I'd suspect:
- Kernel thread scheduling changes
or
- Timing of receive socket upcalls (which wake up the nfsd kernel threads).

I suspect bisecting to the actual commit that causes this is the only way
to find it.
If you know of a working stable/13 that is more recent than 13.0, it would
help. If not, you could start at this commit (which did make socket upcall changes):
commit 55cc0a478506ee1c2db7b2f9aadb9855e5490af3
which was done on May 21, 2021.
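
The bisect mechanics would be roughly this (a sketch only; the tag/branch names are
assumptions, and you would substitute a newer known-good point if you have one):

    # bisect between the 13.0 release (known good) and stable/13 (known slow)
    git bisect start
    git bisect bad origin/stable/13
    git bisect good release/13.0.0
    # build/boot the kernel git checks out, rerun the NFS load test,
    # then mark it with "git bisect good" or "git bisect bad" and repeat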

Maybe others can suggest commits related to thread scheduling (which I
know nothing about).

If you don't have the time/resources to bisect, I doubt this will get resolved.

Good luck with it, rick

I had an NFS server with the above versions and rsynced an nfs mount to a ufs mount
on the NFS clients.
My NFS server has 4 cores.
When I had a load average of 3 with make buildworld -j3, the NFS server was fine.
After adding 1 more to the load, NFS server throughput came down to about 10% of
what it was before.
After dropping back to a load average of 3, performance recovered, and it dropped
again after getting over 4.
The disk was fully available for rsync; buildworld was done on another disk.


Someone told me his smbfs was also slow and he suspected a TCP/IP regression
rather than NFS, by the way.

Hiro





Re: nfsd becomes slow when machine CPU usage is at or over 100% on STABLE/13

2022-03-20 Thread Rick Macklem
mike tancsa  wrote:
> On 3/20/2022 7:43 AM, mike tancsa wrote:
>> On 3/18/2022 9:18 PM, Yoshihiro Ota wrote:
>>> I had built several versions between the releng/13.0 branch point and
>>> stable/13 (before releng/13.1 was created) and all of them had such
>>> performance degradation.
>>>
>>> I started suspecting stable debug options and thus built releng/13.1
>>> and tested.
>>> I don't see NFS slowdown unlike stable/13.
>>> releng/13.0 and releng/12.2 were also fine.
>>
>> Hi,
>>
>> I would think there is very little difference (if any) between
>> releng/13.1 and stable/13 right now.  Are you sure stable/13 suffers
>> from this issue you are seeing ?
The sources may be almost the same, but the build is not.
See /usr/src/sys/conf/std.nodebug.

I'm assuming his releng/13.1 build created a non-debug kernel.
Debug kernels do spit out "expect reduced performance" if I recall
correctly. It sounds like he found an example of this.

rick

>
>
These look to be the only files touched below.

0{cage}% git diff remotes/origin/releng/13.1..remotes/origin/stable/13 |
grep '^\-\-'
--- a/contrib/tzcode/stdtime/ctime.3
--- a/lib/libc/gen/time.3
--- a/lib/libcasper/services/cap_net/cap_net.c
--- a/lib/libpfctl/libpfctl.c
--- a/lib/libpfctl/libpfctl.h
--- a/libexec/rc/rc.d/dumpon
--- a/release/pkg_repos/release-dvd.conf
--- a/sbin/devd/devd.conf
--- a/sbin/ipf/common/ipf.h
--- a/sbin/ipf/libipf/printactivenat.c
--- a/sbin/ipf/libipf/printstate.c
--- a/sbin/pfctl/pfctl.c
--- a/sbin/pfctl/pfctl_optimize.c
--- a/share/man/man4/Makefile
--- a/share/man/man4/netmap.4
--- /dev/null
--- a/share/man/man4/vale.4
--- a/share/man/man9/crypto_buffer.9
--- a/stand/efi/libefi/efi_console.c
--- a/stand/i386/libi386/vidconsole.c
--- a/sys/arm64/include/pcpu.h
--- a/sys/cddl/contrib/opensolaris/uts/common/dtrace/fasttrap.c
--- a/sys/cddl/contrib/opensolaris/uts/intel/dtrace/fasttrap_isa.c
--- a/sys/conf/newvers.sh
--- a/sys/crypto/armv8/armv8_crypto.c
--- a/sys/crypto/armv8/armv8_crypto.h
--- a/sys/crypto/armv8/armv8_crypto_wrap.c
--- a/sys/dev/netmap/netmap.c
--- a/sys/dev/netmap/netmap_bdg.c
--- a/sys/dev/netmap/netmap_kern.h
--- a/sys/dev/netmap/netmap_vale.c
--- a/sys/i386/i386/machdep.c
--- a/sys/kern/kern_rmlock.c
--- a/sys/kern/sys_process.c
--- a/sys/kern/vfs_cache.c
--- a/sys/kern/vfs_subr.c
--- a/sys/modules/if_epair/Makefile
--- a/sys/modules/linuxkpi/Makefile
--- a/sys/net/if_epair.c
--- a/sys/opencrypto/cryptodev.h
--- a/sys/riscv/include/cpufunc.h
--- a/sys/riscv/include/pmap.h
--- a/sys/riscv/include/pte.h
--- a/sys/riscv/include/riscvreg.h
--- a/sys/riscv/include/vmparam.h
--- a/sys/riscv/riscv/elf_machdep.c
--- a/sys/riscv/riscv/locore.S
--- a/sys/riscv/riscv/pmap.c
--- a/sys/sys/param.h
--- a/sys/x86/x86/mp_x86.c
--- a/usr.bin/diff/pr.c
--- a/usr.bin/touch/touch.c
0{cage}%






Re: nfsd becomes slow when machine CPU usage is at or over 100% on STABLE/13

2022-03-27 Thread Rick Macklem
Yoshihiro Ota  wrote:
> I've been building the default kernel, that is GENERIC, for the releng and stable
> branches.
>
> I see GENERIC-NODEBUG on head but don't seem to find it on releng/
> nor stable/
> I grepped /usr/src/* for std.nodebug but don't seem to have a match...
> I wonder how nodebug kicks in for releng.
I do not know what magic is used so that the releng kernels build
without debugging.

However, if you see this line:
WARNING: WITNESS option enabled, expect reduced performance.
in the first screen when booting, you are running a kernel with debugging
built into it.
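
If the boot messages have already scrolled away, something like this should also
show it (a sketch; kern.conftxt assumes the kernel was built with
INCLUDE_CONFIG_FILE, as GENERIC is):

    # look for debug options in the compiled-in config and the boot messages
    sysctl -n kern.conftxt | grep -E 'WITNESS|INVARIANT'
    grep -i witness /var/run/dmesg.boot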

rick

I added "enable DDB" to i386 kernel but didn't touch amd64 kernel.
Both amd64 and i386 looks okay with releng/13.1.

Hiro

On Sun, 20 Mar 2022 20:45:30 +
Rick Macklem  wrote:

> mike tancsa  wrote:
> > On 3/20/2022 7:43 AM, mike tancsa wrote:
> >> On 3/18/2022 9:18 PM, Yoshihiro Ota wrote:
> >>> I had built several versions between the releng/13.0 branch point and
> >>> stable/13 (before releng/13.1 was created) and all of them had such
> >>> performance degradation.
> >>>
> >>> I started suspecting stable debug options and thus built releng/13.1
> >>> and tested.
> >>> I don't see NFS slowdown unlike stable/13.
> >>> releng/13.0 and releng/12.2 were also fine.
> >>
> >> Hi,
> >>
> >> I would think there is very little difference (if any) between
> >> releng/13.1 and stable/13 right now.  Are you sure stable/13 suffers
> >> from this issue you are seeing ?
> The sources may be almost the same, but the build is not.
> See /usr/src/sys/conf/std.nodebug.
>
> I'm assuming his releng/13.1 build created a non-debug kernel.
> Debug kernels do spit out "expect reduced performance" if i recall
> correctly. It sounds like he found an example of this.
>
> rick
>
> >
> >
> These look to be the only files touched below.
>
> 0{cage}% git diff remotes/origin/releng/13.1..remotes/origin/stable/13 |
> grep '^\-\-'
> --- a/contrib/tzcode/stdtime/ctime.3
> --- a/lib/libc/gen/time.3
> --- a/lib/libcasper/services/cap_net/cap_net.c
> --- a/lib/libpfctl/libpfctl.c
> --- a/lib/libpfctl/libpfctl.h
> --- a/libexec/rc/rc.d/dumpon
> --- a/release/pkg_repos/release-dvd.conf
> --- a/sbin/devd/devd.conf
> --- a/sbin/ipf/common/ipf.h
> --- a/sbin/ipf/libipf/printactivenat.c
> --- a/sbin/ipf/libipf/printstate.c
> --- a/sbin/pfctl/pfctl.c
> --- a/sbin/pfctl/pfctl_optimize.c
> --- a/share/man/man4/Makefile
> --- a/share/man/man4/netmap.4
> --- /dev/null
> --- a/share/man/man4/vale.4
> --- a/share/man/man9/crypto_buffer.9
> --- a/stand/efi/libefi/efi_console.c
> --- a/stand/i386/libi386/vidconsole.c
> --- a/sys/arm64/include/pcpu.h
> --- a/sys/cddl/contrib/opensolaris/uts/common/dtrace/fasttrap.c
> --- a/sys/cddl/contrib/opensolaris/uts/intel/dtrace/fasttrap_isa.c
> --- a/sys/conf/newvers.sh
> --- a/sys/crypto/armv8/armv8_crypto.c
> --- a/sys/crypto/armv8/armv8_crypto.h
> --- a/sys/crypto/armv8/armv8_crypto_wrap.c
> --- a/sys/dev/netmap/netmap.c
> --- a/sys/dev/netmap/netmap_bdg.c
> --- a/sys/dev/netmap/netmap_kern.h
> --- a/sys/dev/netmap/netmap_vale.c
> --- a/sys/i386/i386/machdep.c
> --- a/sys/kern/kern_rmlock.c
> --- a/sys/kern/sys_process.c
> --- a/sys/kern/vfs_cache.c
> --- a/sys/kern/vfs_subr.c
> --- a/sys/modules/if_epair/Makefile
> --- a/sys/modules/linuxkpi/Makefile
> --- a/sys/net/if_epair.c
> --- a/sys/opencrypto/cryptodev.h
> --- a/sys/riscv/include/cpufunc.h
> --- a/sys/riscv/include/pmap.h
> --- a/sys/riscv/include/pte.h
> --- a/sys/riscv/include/riscvreg.h
> --- a/sys/riscv/include/vmparam.h
> --- a/sys/riscv/riscv/elf_machdep.c
> --- a/sys/riscv/riscv/locore.S
> --- a/sys/riscv/riscv/pmap.c
> --- a/sys/sys/param.h
> --- a/sys/x86/x86/mp_x86.c
> --- a/usr.bin/diff/pr.c
> --- a/usr.bin/touch/touch.c
> 0{cage}%
>
>
>




Re: nfs client's OpenOwner count increases without bounds

2022-05-04 Thread Rick Macklem
Alan Somers  wrote:
> I have a FreeBSD 13 (tested on both 13.0-RELEASE and 13.1-RC5) desktop
> mounting /usr/home over NFS 4.2 from a 13.0-RELEASE server.  It
> worked fine until a few weeks ago.  Now, the desktop's performance
> slowly degrades.  It becomes less and less responsive until I restart
> X after 2-3 days.  /var/log/Xorg.0.log shows plenty of entries like
> "AT keyboard: client bug: event processing lagging behind by 112ms,
> your system is too slow".  "top -S" shows that the busiest process is
> nfscl.  A dtrace profile shows that nfscl is spending most of its time
> in nfscl_cleanup_common, in the loop over all nfsclowner objects.
> Running "nfsdumpstate" on the server shows thousands of OpenOwners for
> that client, and < 10 for any other NFS client.  The OpenOwners
> increases by about 3000 per day.  And yet, "fstat" shows only a couple
> hundred open files on the NFS file system.  Why are OpenOwners so
> high?  Killing most of my desktop processes doesn't seem to make a
> difference.  Restarting X does improve the perceived responsiveness,
> though it does not change the number of OpenOwners.
>
> How can I figure out which process(es) are responsible for the
> excessive OpenOwners?  
An OpenOwner represents a process on the client. The OpenOwner
name is an encoding of pid + process startup time.
However, I can't think of an easy way to get at the OpenOwner name.

Now, why aren't they going away, hmm..

I'm assuming the # of Opens is not large?
(Openowners cannot go away until all associated opens
 are closed.)

Commit 1cedb4ea1a79 in main changed the semantics of this
a little, to avoid a use-after-free bug. However, it is dated
Feb. 25, 2022 and is not in 13.0, so I don't think it could
be the culprit.

Essentially, the function called nfscl_cleanupkext() should call
nfscl_procdoesntexist(), which returns true after the process has
exited and when that is the case, calls nfscl_cleanup_common().
--> nfscl_cleanup_common() will either get rid of the openowner or,
  if there are still children with open file descriptors, mark it "defunct"
  so it can be free'd once the children close the file.

It could be that X is now somehow creating a long chain of processes
where the children inherit a file descriptor and that delays the cleanup
indefinitely?
Even then, everything should get cleaned up once you kill off X?
(It might take a couple of seconds after killing all the processes off.)

Another possibility is that the "nfscl" thread is wedged somehow.
It is the one that will call nfscl_cleanupkext() once/sec. If it never
gets called, the openowners will never go away.
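
A quick way to check that is something like the following (a sketch, assuming
dtrace's fbt provider can attach to that function):

    # count calls to the once-per-second cleanup function over 10 seconds
    dtrace -n 'fbt::nfscl_cleanupkext:entry { @calls = count(); } tick-10s { exit(0); }'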

Being old fashioned, I'd probably try to figure this out by adding
some printf()s to nfscl_cleanupkext() and nfscl_cleanup_common().

To avoid the problem, you can probably just use the "oneopenown"
mount option. With that option, only one openowner is used for
all opens. (Having separate openowners for each process was needed
for NFSv4.0, but not NFSv4.1/4.2.)
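
In /etc/fstab that would look something like this (a sketch; the server, path and
minorversion are placeholders for whatever your mount already uses):

    # NFSv4.2 mount of /usr/home using a single openowner for all opens
    server:/usr/home  /usr/home  nfs  rw,nfsv4,minorversion=2,oneopenown  0  0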

> Or is it just a red herring and I shouldn't
> worry?
Well, you can probably avoid the problem by using the "oneopenown"
mount option.

Thanks for reporting this, rick
ps: And, yes, large numbers of openowners will slow things down,
  since the code ends up doing linear scans of them all in a linked
  list in various places.

-Alan




Re: nfs client's OpenOwner count increases without bounds

2022-05-04 Thread Rick Macklem
Alan Somers  wrote:
> On Wed, May 4, 2022 at 5:23 PM Rick Macklem  wrote:
> >
> > Alan Somers  wrote:
> > > I have a FreeBSD 13 (tested on both 13.0-RELEASE and 13.1-RC5) desktop
> > > mounting /usr/home over NFS 4.2 from an 13.0-RELEASE server.  It
> > > worked fine until a few weeks ago.  Now, the desktop's performance
> > > slowly degrades.  It becomes less and less responsive until I restart
> > > X after 2-3 days.  /var/log/Xorg.0.log shows plenty of entries like
> > > "AT keyboard: client bug: event processing lagging behind by 112ms,
> > > your system is too slow".  "top -S" shows that the busiest process is
> > > nfscl.  A dtrace profile shows that nfscl is spending most of its time
> > > in nfscl_cleanup_common, in the loop over all nfsclowner objects.
> > > Running "nfsdumpstate" on the server shows thousands of OpenOwners for
> > > that client, and < 10 for any other NFS client.  The OpenOwners
> > > increases by about 3000 per day.  And yet, "fstat" shows only a couple
> > > hundred open files on the NFS file system.  Why are OpenOwners so
> > > high?  Killing most of my desktop processes doesn't seem to make a
> > > difference.  Restarting X does improve the perceived responsiveness,
> > > though it does not change the number of OpenOwners.
> > >
> > > How can I figure out which process(es) are responsible for the
> > > excessive OpenOwners?
> > An OpenOwner represents a process on the client. The OpenOwner
> > name is an encoding of pid + process startup time.
> > However, I can't think of an easy way to get at the OpenOwner name.
> >
> > Now, why aren't they going away, hmm..
> >
> > I'm assuming the # of Opens is not large?
> > (Openowners cannot go away until all associated opens
> >  are closed.)
> 
> Oh, I didn't mention that yes the number of Opens is large.  Right
> now, for example, I have 7950 OpenOwner and 8277 Open.
Well, the openowners cannot go away until the opens go away,
so the problem is that the opens are not getting closed.

Close happens when the v_usecount on the vnode goes to zero.
Something is retaining the v_usecount. One possibility is that most
of the opens are for the same file, but with different openowners.
If that is the case, the "oneopenown" mount option will deal with it.

Another possibility is that something is retaining a v_usecount
reference on a lot of the vnodes. (This used to happen when a nullfs
mount with caching enabled was on top of the nfs mount.)
I don't know what other things might do that?

> >
> > Commit 1cedb4ea1a79 in main changed the semantics of this
> > a little, to avoid a use-after-free bug. However, it is dated
> > Feb. 25, 2022 and is not in 13.0, so I don't think it could
> > be the culprit.
> >
> > Essentially, the function called nfscl_cleanupkext() should call
> > nfscl_procdoesntexist(), which returns true after the process has
> > exited and when that is the case, calls nfscl_cleanup_common().
> > --> nfscl_cleanup_common() will either get rid of the openowner or,
> >   if there are still children with open file descriptors, mark it 
> > "defunct"
> >   so it can be free'd once the children close the file.
> >
> > It could be that X is now somehow creating a long chain of processes
> > where the children inherit a file descriptor and that delays the cleanup
> > indefinitely?
> > Even then, everything should get cleaned up once you kill off X?
> > (It might take a couple of seconds after killing all the processes off.)
> >
> > Another possibility is that the "nfscl" thread is wedged somehow.
> > It is the one that will call nfscl_cleanupkext() once/sec. If it never
> > gets called, the openowners will never go away.
> >
> > Being old fashioned, I'd probably try to figure this out by adding
> > some printf()s to nfscl_cleanupkext() and nfscl_cleanup_common().
> 
> dtrace shows that nfscl_cleanupkext() is getting called at about 0.6 hz.
That sounds ok. Since there are a lot of opens/openowners, it probably
is getting behind.

> >
> > To avoid the problem, you can probably just use the "oneopenown"
> > mount option. With that option, only one openowner is used for
> > all opens. (Having separate openowners for each process was needed
> > for NFSv4.0, but not NFSv4.1/4.2.)
> >
> > > Or is it just a red herring and I shouldn't
> > > worry?
> > Well, you can probably avoid the problem by using the "oneopenown"
> > mount option.

Re: nfs client's OpenOwner count increases without bounds

2022-05-05 Thread Rick Macklem
Alan Somers  wrote:
> On Wed, May 4, 2022 at 6:56 PM Rick Macklem  wrote:
> >
> > Alan Somers  wrote:
> > > On Wed, May 4, 2022 at 5:23 PM Rick Macklem  wrote:
> > > >
> > > > Alan Somers  wrote:
> > > > > I have a FreeBSD 13 (tested on both 13.0-RELEASE and 13.1-RC5) desktop
> > > > > mounting /usr/home over NFS 4.2 from an 13.0-RELEASE server.  It
> > > > > worked fine until a few weeks ago.  Now, the desktop's performance
> > > > > slowly degrades.  It becomes less and less responsive until I restart
> > > > > X after 2-3 days.  /var/log/Xorg.0.log shows plenty of entries like
> > > > > "AT keyboard: client bug: event processing lagging behind by 112ms,
> > > > > your system is too slow".  "top -S" shows that the busiest process is
> > > > > nfscl.  A dtrace profile shows that nfscl is spending most of its time
> > > > > in nfscl_cleanup_common, in the loop over all nfsclowner objects.
> > > > > Running "nfsdumpstate" on the server shows thousands of OpenOwners for
> > > > > that client, and < 10 for any other NFS client.  The OpenOwners
> > > > > increases by about 3000 per day.  And yet, "fstat" shows only a couple
> > > > > hundred open files on the NFS file system.  Why are OpenOwners so
> > > > > high?  Killing most of my desktop processes doesn't seem to make a
> > > > > difference.  Restarting X does improve the perceived responsiveness,
> > > > > though it does not change the number of OpenOwners.
> > > > >
> > > > > How can I figure out which process(es) are responsible for the
> > > > > excessive OpenOwners?
> > > > An OpenOwner represents a process on the client. The OpenOwner
> > > > name is an encoding of pid + process startup time.
> > > > However, I can't think of an easy way to get at the OpenOwner name.
> > > >
> > > > Now, why aren't they going away, hmm..
> > > >
> > > > I'm assuming the # of Opens is not large?
> > > > (Openowners cannot go away until all associated opens
> > > >  are closed.)
> > >
> > > Oh, I didn't mention that yes the number of Opens is large.  Right
> > > now, for example, I have 7950 OpenOwner and 8277 Open.
> > Well, the openowners cannot go away until the opens go away,
> > so the problem is that the opens are not getting closed.
> >
> > Close happens when the v_usecount on the vnode goes to zero.
> > Something is retaining the v_usecount. One possibility is that most
> > of the opens are for the same file, but with different openowners.
> > If that is the case, the "oneopenown" mount option will deal with it.
> >
> > Another possibility is that something is retaining a v_usecount
> > reference on a lot of the vnodes. (This used to happen when a nullfs
> > mount with caching enabled was on top of the nfs mount.)
> > I don't know what other things might do that?
>
> Yeah, I remember the nullfs problem.  But I'm not using nullfs on this
> computer anymore.  Is there any debugging facility that can list
> vnodes?  All I know of is "fstat", and that doesn't show anywhere near
> the number of NFS Opens.
Don't ask me. My debugging technology consists of printf()s.

An NFSv4 Open is for an <openowner (process on the client), file> pair. It is probably the same file being opened by many different
processes. The "oneopenown" option makes the client use the same
openowner for all opens, so that there is one open per file.

> >
> > > >
> > > > Commit 1cedb4ea1a79 in main changed the semantics of this
> > > > a little, to avoid a use-after-free bug. However, it is dated
> > > > Feb. 25, 2022 and is not in 13.0, so I don't think it could
> > > > be the culprit.
> > > >
> > > > Essentially, the function called nfscl_cleanupkext() should call
> > > > nfscl_procdoesntexist(), which returns true after the process has
> > > > exited and when that is the case, calls nfscl_cleanup_common().
> > > > --> nfscl_cleanup_common() will either get rid of the openowner or,
> > > >   if there are still children with open file descriptors, mark it 
> > > > "defunct"
> > > >   so it can be free'd once the children close the file.
> > > >
> > > > It could be that X is now somehow creating a long chain of processes
> > > > where the children inherit a file descriptor and that delays the cleanup
> > > > indefinitely?

Re: nfs client's OpenOwner count increases without bounds

2022-05-05 Thread Rick Macklem
Alan Somers  wrote:
> On Thu, May 5, 2022 at 8:49 AM Rick Macklem  wrote:
> >
> > Alan Somers  wrote:
> > > On Wed, May 4, 2022 at 6:56 PM Rick Macklem  wrote:
> > > >
> > > > Alan Somers  wrote:
> > > > > On Wed, May 4, 2022 at 5:23 PM Rick Macklem  
> > > > > wrote:
> > > > > >
> > > > > > Alan Somers  wrote:
> > > > > > > I have a FreeBSD 13 (tested on both 13.0-RELEASE and 13.1-RC5) 
> > > > > > > desktop
> > > > > > > mounting /usr/home over NFS 4.2 from an 13.0-RELEASE server.  It
> > > > > > > worked fine until a few weeks ago.  Now, the desktop's performance
> > > > > > > slowly degrades.  It becomes less and less responsive until I 
> > > > > > > restart
> > > > > > > X after 2-3 days.  /var/log/Xorg.0.log shows plenty of entries 
> > > > > > > like
> > > > > > > "AT keyboard: client bug: event processing lagging behind by 
> > > > > > > 112ms,
> > > > > > > your system is too slow".  "top -S" shows that the busiest 
> > > > > > > process is
> > > > > > > nfscl.  A dtrace profile shows that nfscl is spending most of its 
> > > > > > > time
> > > > > > > in nfscl_cleanup_common, in the loop over all nfsclowner objects.
> > > > > > > Running "nfsdumpstate" on the server shows thousands of 
> > > > > > > OpenOwners for
> > > > > > > that client, and < 10 for any other NFS client.  The OpenOwners
> > > > > > > increases by about 3000 per day.  And yet, "fstat" shows only a 
> > > > > > > couple
> > > > > > > hundred open files on the NFS file system.  Why are OpenOwners so
> > > > > > > high?  Killing most of my desktop processes doesn't seem to make a
> > > > > > > difference.  Restarting X does improve the perceived 
> > > > > > > responsiveness,
> > > > > > > though it does not change the number of OpenOwners.
> > > > > > >
> > > > > > > How can I figure out which process(es) are responsible for the
> > > > > > > excessive OpenOwners?
> > > > > > An OpenOwner represents a process on the client. The OpenOwner
> > > > > > name is an encoding of pid + process startup time.
> > > > > > However, I can't think of an easy way to get at the OpenOwner name.
> > > > > >
> > > > > > Now, why aren't they going away, hmm..
> > > > > >
> > > > > > I'm assuming the # of Opens is not large?
> > > > > > (Openowners cannot go away until all associated opens
> > > > > >  are closed.)
> > > > >
> > > > > Oh, I didn't mention that yes the number of Opens is large.  Right
> > > > > now, for example, I have 7950 OpenOwner and 8277 Open.
> > > > Well, the openowners cannot go away until the opens go away,
> > > > so the problem is that the opens are not getting closed.
> > > >
> > > > Close happens when the v_usecount on the vnode goes to zero.
> > > > Something is retaining the v_usecount. One possibility is that most
> > > > of the opens are for the same file, but with different openowners.
> > > > If that is the case, the "oneopenown" mount option will deal with it.
> > > >
> > > > Another possibility is that something is retaining a v_usecount
> > > > reference on a lot of the vnodes. (This used to happen when a nullfs
> > > > mount with caching enabled was on top of the nfs mount.)
> > > > I don't know what other things might do that?
> > >
> > > Yeah, I remember the nullfs problem.  But I'm not using nullfs on this
> > > computer anymore.  Is there any debugging facility that can list
> > > vnodes?  All I know of is "fstat", and that doesn't show anywhere near
> > > the number of NFS Opens.
> > Don't ask me. My debugging technology consists of printf()s.
> >
> > An NFSv4 Open is for an <openowner (process on the client), file> pair. It is
> > probably the same file being opened by many different
> > processes. The "oneopenown" option makes the client use the same
> > openowner for all opens, so that there is one open per file.

Re: nfs stalls client: nfsrv_cache_session: no session

2022-07-16 Thread Rick Macklem
Peter  wrote:
> Hija,
>  I have a problem with NFSv4:
>
> The configuration:
>   Server Rel. 13.1-RC2
> nfs_server_enable="YES"
> nfs_server_flags="-u -t --minthreads 2 --maxthreads 20 -h ..."
Allowing it to go down to 2 threads is very low. I've never even
tried to run a server with less than 4 threads. Since kernel threads
don't generate much overhead, I'd suggest replacing the
minthreads/maxthreads with "-n 32" for a very small server.
(I didn't write the code that allows number of threads to vary and
 never use that either.)
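
In other words, something like this in rc.conf (keeping your other flags as they
are; the "-h ..." is whatever you already have):

    # fixed pool of 32 nfsd threads instead of minthreads/maxthreads
    nfs_server_flags="-u -t -n 32 -h ..."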

> mountd_enable="YES"
> mountd_flags="-S -p 803 -h ..."
> rpc_lockd_enable="YES"
> rpc_lockd_flags="-h ..."
> rpc_statd_enable="YES"
> rpc_statd_flags="-h ..."
> rpcbind_enable="YES"
> rpcbind_flags="-h ..."
> nfsv4_server_enable="YES"
> sysctl vfs.nfs.enable_uidtostring=1
> sysctl vfs.nfsd.enable_stringtouid=1
> 
>   Client bhyve Rel. 13.1-RELEASE on the same system
> nfs_client_enable="YES"
> nfs_access_cache="600"
> nfs_bufpackets="32"
> nfscbd_enable="YES"
> 
>   Mount-options: nfsv4,readahead=1,rw,async
I would expect the behaviour you are seeing for "intr" and/or "soft"
mounts, but since you are not using those, I don't know how you
broke the session? (10052 is NFSERR_BADSESSION)
You might want to do "nfsstat -m" on the client to see what options
were actually negotiated for the mount and then check that neither
"soft" nor "intr" are there.

I suspect that the recovery thread in the client (called "nfscl") is
somehow wedged and cannot do the recovery from the bad session,
as well.
A "ps axHl" on the client would be useful to see what the
processes/threads are up to on the client when it is hung.
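
I.e., something along these lines on the client (the grep is only to narrow the
output):

    # show the options actually negotiated for each NFS mount (look for soft/intr)
    nfsstat -m
    # see what the nfscl recovery thread (and everything else) is doing
    ps axHl | grep nfscl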

If increasing the number of nfsd threads in the server doesn't resolve
the problem, I'd guess it is some network weirdness caused by how
the bhyve instance is networked to its host. (I always use bridging
for bhyve instances and do NFS mounts, but I don't work those
mounts hard.)

Btw, "umount -N " on the client will normally get rid
of a hung mount, although it can take a couple of minutes to complete.

rick


Access to the share suddenly stalled. Server reports this in messages,
every second:
   nfsrv_cache_session: no session IPaddr=192.168...

Restarting nfsd and mountd didn't help, only now the client started to
also report in messages, every second:
   nfs server 192.168...:/var/sysup/mnt/tmp.6.56160: is alive again

Mounting the same share anew to a different place works fine.

The network babble is this, every second:
   NFS request xid 1678997001 212 getattr fh 0,6/2
   NFS reply xid 1678997001 reply ok 52 getattr ERROR: unk 10052

Forensics: I tried to build openoffice on that share, a couple of
   times. So there was a bit of traffic, and some things may have
   overflown.

There seems to be no way to recover, only crashing the client.






Re: nfs stalls client: nfsrv_cache_session: no session

2022-07-16 Thread Rick Macklem
Peter  wrote:
> Hija,
>   I have a problem with NFSv4:
> 
> The configuration:
>   Server Rel. 13.1-RC2
> nfs_server_enable="YES"
> nfs_server_flags="-u -t --minthreads 2 --maxthreads 20 -h ..."
> mountd_enable="YES"
> mountd_flags="-S -p 803 -h ..."
> rpc_lockd_enable="YES"
> rpc_lockd_flags="-h ..."
> rpc_statd_enable="YES"
> rpc_statd_flags="-h ..."
> rpcbind_enable="YES"
> rpcbind_flags="-h ..."
> nfsv4_server_enable="YES"
> sysctl vfs.nfs.enable_uidtostring=1
> sysctl vfs.nfsd.enable_stringtouid=1
> 
>   Client bhyve Rel. 13.1-RELEASE on the same system
> nfs_client_enable="YES"
> nfs_access_cache="600"
> nfs_bufpackets="32"
> nfscbd_enable="YES"
> 
>   Mount-options: nfsv4,readahead=1,rw,async
> 
> 
> Access to the share suddenly stalled. Server reports this in messages,
> every second:
>nfsrv_cache_session: no session IPaddr=192.168...
The attached little patch might help. It will soon be in stable/13, but is not
in releng/13.1.
It fixes the only way I am aware of that the client's "nfscl" thread
can get "stuck" on an old session and not do session recovery.
It might be worth applying it to the client.

This still doesn't explain how the session got broken in the first place.

rick

Restarting nfsd and mountd didn't help, only now the client started to
also report in messages, every second:
   nfs server 192.168...:/var/sysup/mnt/tmp.6.56160: is alive again

Mounting the same share anew to a different place works fine.

The network babble is this, every second:
   NFS request xid 1678997001 212 getattr fh 0,6/2
   NFS reply xid 1678997001 reply ok 52 getattr ERROR: unk 10052

Forensics: I tried to build openoffice on that share, a couple of
   times. So there was a bit of traffic, and some things may have
   overflown.

There seems to be no way to recover, only crashing the client.





defunct-releng13.1.patch
Description: defunct-releng13.1.patch


Re: NFS issue - newnfs_request: Wrong session srvslot=1 slot=0, freeing free slot!!

2022-08-25 Thread Rick Macklem
Ganbold Tsagaankhuu  wrote:
> Hi,
> 
> We are having trouble with NFS running on STABLE:
> 
> Aug 26 02:21:42 iron2 kernel: newnfs_request: Wrong session srvslot=1 slot=0
> Aug 26 02:21:42 iron2 kernel: freeing free slot!!
> Aug 26 02:21:43 iron2 kernel: newnfs_request: Wrong session srvslot=1 slot=0
> Aug 26 02:21:43 iron2 kernel: freeing free slot!!
> Aug 26 02:21:54 iron2 kernel: newnfs_request: Wrong session srvslot=1 slot=0
> Aug 26 02:21:54 iron2 kernel: freeing free slot!!
> Aug 26 02:21:58 iron2 kernel: newnfs_request: Wrong session srvslot=1 slot=2
> Aug 26 02:21:58 iron2 kernel: retseq diff 0x1
> Aug 26 02:21:58 iron2 kernel: freeing free slot!!
> Aug 26 02:21:59 iron2 kernel: newnfs_request: Wrong session srvslot=1 slot=2
> Aug 26 02:21:59 iron2 kernel: retseq diff 0x1
> Aug 26 02:21:59 iron2 kernel: freeing free slot!!
> Aug 26 02:22:12 iron2 kernel: newnfs_request: Wrong session srvslot=0 slot=2
> Aug 26 02:22:12 iron2 kernel: retseq diff 0x1
> Aug 26 02:22:12 iron2 kernel: freeing free slot!!
> Aug 26 02:22:14 iron2 kernel: newnfs_request: Wrong session srvslot=1 slot=0
> Aug 26 02:22:14 iron2 kernel: freeing free slot!!
> Aug 26 02:22:15 iron2 kernel: newnfs_request: Bad session slot=1
> Aug 26 02:22:15 iron2 kernel: freeing free slot!!
> Aug 26 02:22:30 iron2 kernel: newnfs_request: Wrong session srvslot=1 slot=2
> Aug 26 02:22:30 iron2 kernel: retseq diff 0x1
> Aug 26 02:22:30 iron2 kernel: freeing free slot!!
> Aug 26 02:22:31 iron2 kernel: newnfs_request: Bad session slot=1
> Aug 26 02:22:31 iron2 kernel: freeing free slot!!
> Aug 26 02:22:46 iron2 kernel: newnfs_request: Wrong session srvslot=1 slot=0
> Aug 26 02:22:46 iron2 kernel: freeing free slot!!
> 
> We are running FreeBSD 13.1-STABLE #3 stable/13-n252198-c1434fd2dea: Fri Aug 
> 26 01:51:53 UTC 2022 and mount options are:
> 
> rw,nfsv4,minorversion=1,bg,soft,timeo=20,retrans=5,retrycnt=5
> ro,nfsv4,minorversion=1,bg,soft,timeo=20,retrans=5,retrycnt=5
> 
> Is there any fix for this issue?
- Don't use "soft" mounts. See the Bugs section of "man mount_nfs".
  There are patches in stable/13 dated July 10, 2022. (I have no idea
  how to tell if n252198 would have them) that help, but use of "soft"
  mounts will never work correctly for NFSv4.
- The attached small patch (not committed yet, but should be in
  stable/13 in about 10days) fixes a couple of corner cases. If you
  are using a FreeBSD NFS server, I believe these corner cases only
  occur after the NFS server reboots.

rick
ps: If you test the attached patch, please let me know how it goes.
 
thanks a lot,

Ganbold



slotpos.patch
Description: slotpos.patch


Re: NFS issue - newnfs_request: Wrong session srvslot=1 slot=0, freeing free slot!!

2022-08-25 Thread Rick Macklem
Ganbold Tsagaankhuu  wrote:
> Rick,
> 
> On Fri, Aug 26, 2022 at 11:18 AM Rick Macklem <rmack...@uoguelph.ca> wrote:
Ganbold Tsagaankhuu <ganb...@gmail.com> wrote:
> > Hi,
> >
> > We are having trouble with NFS running on STABLE:
> >
> > Aug 26 02:21:42 iron2 kernel: newnfs_request: Wrong session srvslot=1 slot=0
[stuff snipped]
> > Aug 26 02:22:46 iron2 kernel: newnfs_request: Wrong session srvslot=1 slot=0
> > Aug 26 02:22:46 iron2 kernel: freeing free slot!!
> >
> > We are running FreeBSD 13.1-STABLE #3 stable/13-n252198-c1434fd2dea: Fri 
> > Aug 26 01:51:53 UTC 2022 and mount options are:
> >
> > rw,nfsv4,minorversion=1,bg,soft,timeo=20,retrans=5,retrycnt=5
> > ro,nfsv4,minorversion=1,bg,soft,timeo=20,retrans=5,retrycnt=5
> >
> > Is there any fix for this issue?
> - Don't use "soft" mounts. See the Bugs section of "man mount_nfs".
>   There are patches in stable/13 dated July 10, 2022. (I have no idea
>   how to tell if n252198 would have them) that help, but use of "soft"
>   mounts will never work correctly for NFSv4.
> - The attached small patch (not committed yet, but should be in
>   stable/13 in about 10days) fixes a couple of corner cases. If you
>   are using a FreeBSD NFS server, I believe these corner cases only
>   occur after the NFS server reboots.
> 
> rick
> ps: If you test the attached patch, please let me know how it goes.
> 
> Is this patch for NFS server? We are using Netapp NFS server on server side.
No, it is for the client. However, as far as I know, a Netapp Filer never
generates an NFSERR_BADSESSION error (except maybe if you were using
a post-July 10, 2022 client), so the patch probably doesn't affect you.
(Get rid of "soft" and you should be happy.)

rick

thanks,

Ganbold



thanks a lot,

Ganbold




Re: NFS issue - newnfs_request: Wrong session srvslot=1 slot=0, freeing free slot!!

2022-08-26 Thread Rick Macklem
Ganbold Tsagaankhuu  wrote:
> > Rick,
> >
> > On Fri, Aug 26, 2022 at 11:18 AM Rick Macklem <rmack...@uoguelph.ca> wrote:
Ganbold Tsagaankhuu <ganb...@gmail.com> wrote:
> > > Hi,
> > >
> > > We are having trouble with NFS running on STABLE:
> > >
> > > Aug 26 02:21:42 iron2 kernel: newnfs_request: Wrong session srvslot=1 
> > > slot=0
[stuff snipped]
> > > Aug 26 02:22:46 iron2 kernel: newnfs_request: Wrong session srvslot=1 
> > > slot=0
> > > Aug 26 02:22:46 iron2 kernel: freeing free slot!!
> > >
> > > We are running FreeBSD 13.1-STABLE #3 stable/13-n252198-c1434fd2dea: Fri 
> > > Aug 26 01:51:53 UTC 2022 and mount options are:
> > >
> > > rw,nfsv4,minorversion=1,bg,soft,timeo=20,retrans=5,retrycnt=5
> > > ro,nfsv4,minorversion=1,bg,soft,timeo=20,retrans=5,retrycnt=5
> > >
> > > Is there any fix for this issue?
Oh, and one more thing. If you have multiple clients mounting the
NFSv4 server, make sure they all have unique hostids.
Check /etc/hostid and "sysctl kern.hostuuid". If two clients have the
same kern.hostuuid, there will be lots of trouble.
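
A quick check on each client (the two values must differ from machine to machine):

    # the uuid the FreeBSD NFSv4 client uses when building its client ID
    sysctl kern.hostuuid
    cat /etc/hostid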

rick

> - Don't use "soft" mounts. See the Bugs section of "man mount_nfs".
>   There are patches in stable/13 dated July 10, 2022. (I have no idea
>   how to tell if n252198 would have them) that help, but use of "soft"
>   mounts will never work correctly for NFSv4.
> - The attached small patch (not committed yet, but should be in
>   stable/13 in about 10days) fixes a couple of corner cases. If you
>   are using a FreeBSD NFS server, I believe these corner cases only
>   occur after the NFS server reboots.
>
> rick
> ps: If you test the attached patch, please let me know how it goes.
>
> Is this patch for NFS server? We are using Netapp NFS server on server side.
No, it is for the client. However, as far as I know, a Netapp Filer never
generates an NFSERR_BADSESSION error (except maybe if you were using
a post-July 10, 2022 client), so the patch probably doesn't affect you.
(Get rid of "soft" and you should be happy.)

rick

thanks,

Ganbold



thanks a lot,

Ganbold





Re: double used hostuuids - Re: NFS issue - newnfs_request: Wrong session srvslot=1 slot=0, freeing free slot!!

2022-08-27 Thread Rick Macklem
Ronald Klop  wrote:
>On 8/27/22 00:17, Rick Macklem wrote:
>> Ganbold Tsagaankhuu  wrote:
>>>> Rick,
>>>>
>>>> On Fri, Aug 26, 2022 at 11:18 AM Rick Macklem <rmack...@uoguelph.ca> wrote:
>> Ganbold Tsagaankhuu <ganb...@gmail.com> wrote:
>>>>> Hi,
>>>>>
>>>>> We are having trouble with NFS running on STABLE:
>>>>>
>>>>> Aug 26 02:21:42 iron2 kernel: newnfs_request: Wrong session srvslot=1 
>>>>> slot=0
>> [stuff snipped]
>>>>> Aug 26 02:22:46 iron2 kernel: newnfs_request: Wrong session srvslot=1 
>>>>> slot=0
>>>>> Aug 26 02:22:46 iron2 kernel: freeing free slot!!
>>>>>
>>>>> We are running FreeBSD 13.1-STABLE #3 stable/13-n252198-c1434fd2dea: Fri 
>>>>> Aug 26 01:51:53 UTC 2022 and mount options are:
>>>>>
>>>>> rw,nfsv4,minorversion=1,bg,soft,timeo=20,retrans=5,retrycnt=5
>>>>> ro,nfsv4,minorversion=1,bg,soft,timeo=20,retrans=5,retrycnt=5
>>>>>
>>>>> Is there any fix for this issue?
>> Oh, and one more thing. If you have multiple clients mounting the
>> NFSv4 server, make sure they all have unique hostids.
>> Check /etc/hostid and "sysctl kern.hostuuid". If two clients have the
>> same kern.hostuuid, there will be lots of trouble.
>>
>> rick
>
>
>Just a thought. Is it possible/easy to warn about double used hostuuids from
>different client IP addresses?
>Although that will not help this person using Netapp as a server.
I don't think so. Same hostuuid implies same system, so how does a
server know they are two different systems?
- A client could have multiple IP host addresses, so different client
  host IP addresses for a TCP connection do not imply different systems.

I can, however, modify the console message the server generates when
it sees a session has been replaced to include "check clients have
unique hostuuids", which might help.

I also plan on adding a sentence to "man mount_nfs" about this,
since I just had an email discussion with someone else where the
problem turned out to be "same hostuuids for multiple clients"
and the loss of sessions on the FreeBSD server was the hint that
clued me in.

At least I now know this configuration issue exists.

rick

Regards,
Ronald.



Re: double used hostuuids - Re: NFS issue - newnfs_request: Wrong session srvslot=1 slot=0, freeing free slot!!

2022-08-28 Thread Rick Macklem
Ronald Klop wrote:
> From: Pete French
> Date: 28 August 2022 10:16
> To: stable@freebsd.org
> Subject: Re: double used hostuuids - Re: NFS issue - newnfs_request: Wrong
> session srvslot=1 slot=0, freeing free slot!!
> 
> On 27/08/2022 16:18, Rick Macklem wrote:
> > Ronald Klop  wrote:
> >> On 8/27/22 00:17, Rick Macklem wrote:
> >>> Ganbold Tsagaankhuu  wrote:
> >>>>> Rick,
> >>>>>
> >>>>> On Fri, Aug 26, 2022 at 11:18 AM Rick Macklem <rmack...@uoguelph.ca> wrote:
> >>> Ganbold Tsagaankhuu <ganb...@gmail.com> wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> We are having trouble with NFS running on STABLE:
> >>>>>>
> >>>>>> Aug 26 02:21:42 iron2 kernel: newnfs_request: Wrong session srvslot=1 
> >>>>>> slot=0
> >>> [stuff snipped]
> >>>>>> Aug 26 02:22:46 iron2 kernel: newnfs_request: Wrong session srvslot=1 
> >>>>>> slot=0
> >>>>>> Aug 26 02:22:46 iron2 kernel: freeing free slot!!
> >>>>>>
> >>>>>> We are running FreeBSD 13.1-STABLE #3 stable/13-n252198-c1434fd2dea: 
> >>>>>> Fri Aug 26 01:51:53 UTC 2022 and mount options are:
> >>>>>>
> >>>>>> rw,nfsv4,minorversion=1,bg,soft,timeo=20,retrans=5,retrycnt=5
> >>>>>> ro,nfsv4,minorversion=1,bg,soft,timeo=20,retrans=5,retrycnt=5
> >>>>>>
> >>>>>> Is there any fix for this issue?
> >>> Oh, and one more thing. If you have multiple clients mounting the
> >>> NFSv4 server, make sure they all have unique hostids.
> >>> Check /etc/hostid and "sysctl kern.hostuuid". If two clients have the
> >>> same kern.hostuuid, there will be lots of trouble.
> >>>
> >>> rick
> >>
> >> Just a thought. Is it possible/easy to warn about double used hostuuids 
> >> from different client IP addresses?
> >> Although that will not help this person using Netapp as a server.
> > I don't think so. Same hostuuid implies same system, so how does a
> > server know they are two different systems?
> > - A client could have multiple IP host addresses, so different client
> >host IP addresses for a TCP connection does not imply different systems.
> >
> > I can, however, modify the console message the server generates when
> > it sees a session has been replaced to include "check clients have
> > unique hostuuids", which might help.
> >
> > I also plan on adding a sentence to "man mount_nfs" about this,
> > since I just had an email discussion with someone else where the
> > problem turned out to be "same hostuuids for multiple clients"
> > and the loss of sessions on the FreeBSD server was the hint that
> > clued me in.
> >
> > At least I now know this configuration issue exists.
> >
> > rick
> >
> > Regards,
> > Ronald.
> >
> 
> It is well worth adding this I think. I didn't realise this about NFSv4, and I do 
> a lot with cloud machines, where I simply clone the discs, and thus ended 
> up with many machines with the same hostid. Took me a while to
> work out why my NFS was having issues...
I have already committed a change for the server console message to main and it 
will be MFC'd
in a couple of weeks.

I will do a man page update soon, as well.

> -pete.

> It might help this case if the nfs client combined hostid+ip as a client id. 
> Or include mac address. People
> tend to change the mac after a clone.
Well, in the past I have thought about this...
The problem is that, ideally, the string used by the NFSv4 client mount should 
be
invariant over time (including client reboot cycles).
Depending upon the situation, a machine's IP addresses can change over time 
(dynamically
assigned via dhcp, for example). They can also end up as addresses like 
192.168.1.n sitting
behind a nat gateway, where the IP could be duplicated on other subnets.
As for MAC, if it is taken from a hardware card, then that hardware card gets 
replaced,
the MAC changes.

I think /etc/hostid (or whatever is used to set "kern.hostuuid") seems the best 
bet
for something unique that remains invariant for the life of the system.
I think all cloners need to do is remove /etc/hostid from the master being
cloned and then each clone will generate their own /etc/hostid upon first boot.
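
Something like this is all I mean (just a sketch; adjust to however your
images get built):

    # on the master image, just before cloning it
    rm -f /etc/hostid
    # each clone then generates a fresh uuid on first boot (/etc/rc.d/hostid)
    # and to verify on any client afterwards:
    sysctl kern.hostuuid
    cat /etc/hostid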

rick

Regards,
Ronald



Re: nfs stalls client: nfsrv_cache_session: no session

2022-08-28 Thread Rick Macklem
Also, if you have multiple clients, make sure that they
all have unique /etc/hostid's. A duplicate machine with
the same /etc/hostid as another one will screw up NFSv4
really badly.

rick


From: owner-freebsd-sta...@freebsd.org  on 
behalf of Peter 
Sent: Saturday, July 16, 2022 8:06 AM
To: freebsd-sta...@freebsd.org
Subject: nfs stalls client: nfsrv_cache_session: no session

Hija,
  I have a problem with NFSv4:

The configuration:
  Server Rel. 13.1-RC2
nfs_server_enable="YES"
nfs_server_flags="-u -t --minthreads 2 --maxthreads 20 -h ..."
mountd_enable="YES"
mountd_flags="-S -p 803 -h ..."
rpc_lockd_enable="YES"
rpc_lockd_flags="-h ..."
rpc_statd_enable="YES"
rpc_statd_flags="-h ..."
rpcbind_enable="YES"
rpcbind_flags="-h ..."
nfsv4_server_enable="YES"
sysctl vfs.nfs.enable_uidtostring=1
sysctl vfs.nfsd.enable_stringtouid=1

  Client bhyve Rel. 13.1-RELEASE on the same system
nfs_client_enable="YES"
nfs_access_cache="600"
nfs_bufpackets="32"
nfscbd_enable="YES"

  Mount-options: nfsv4,readahead=1,rw,async


Access to the share suddenly stalled. Server reports this in messages,
every second:
   nfsrv_cache_session: no session IPaddr=192.168...

Restarting nfsd and mountd didn't help, only now the client started to
also report in messages, every second:
   nfs server 192.168...:/var/sysup/mnt/tmp.6.56160: is alive again

Mounting the same share anew to a different place works fine.

The network babble is this, every second:
   NFS request xid 1678997001 212 getattr fh 0,6/2
   NFS reply xid 1678997001 reply ok 52 getattr ERROR: unk 10052

Forensics: I tried to build openoffice on that share, a couple of
   times. So there was a bit of traffic, and some things may have
   overflown.

There seems to be no way to recover, only crashing the client.






Re: Kernel DHCP unpredictable/fails (PXE boot), userspace DHCP works just fine

2023-03-16 Thread Rick Macklem
On Thu, Mar 16, 2023 at 1:44 PM Attila Nagy  wrote:
>
> Hi,
>
> As this is super annoying, I'm willing to pay a $500 bounty for solving this 
> issue (whoever is first, however I don't anticipate a big competition :) 
> Having an invoice would be best, but I'm willing to accept individuals as 
> well).
> I can't give remote access, but can run debug builds with serial console. 
> stable/13 branch.
>
> I have a bunch of netbooted machines, one set in a cluster is older (HP DL80 
> G9, 2x8C, Intel I350 -igb- NICs), the other set is newer (HP XL225n G10, AMD 
> EPYC2x16C, BCM57412 -bnxt- NICs).
> All of these boot from the network, which is basically:
> - get IP and options with DHCP with the help of the NIC's PXE stack
> - get the loader and kernel, start it
> - do another round of DHCP from the kernel (bootp_subr.c)
> - mount the root via NFS and let everything work as usual
>
> The problem is that the newer machines take an indefinite time to boot. The 
> older ones (with igb NIC) work reliably, they always boot fast.
Haven't you at least partially answered the question yourself here?
In other words, it sounds like there is an issue with the NIC driver
for the newer chip. (If you can replace the NIC with one with
a different chip, I'd try that.)

A possible workaround would be to switch to using "options NFS_ROOT" instead of
"BOOTP_NFSROOT".  This way of doing diskless NFS depends on pexboot
loading the FreeBSD boot loader and then it sets enough environment
variables so that a kernel built with "options NFS_ROOT" and no
"options BOOTP_NFSROOT"
will boot.
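
Roughly (a sketch only; pxeboot normally fills the boot.* variables in
itself from its own DHCP exchange, and the exact names here are from
memory, so check the diskless docs before relying on them):

    In the kernel config:
        options NFS_ROOT
        # and drop: options BOOTP_NFSROOT

    What the kernel ends up consuming are loader environment variables
    along the lines of:
        boot.netif.name=bnxt0
        boot.netif.ip=...
        boot.netif.netmask=...
        boot.netif.gateway=...
        boot.nfsroot.server=...
        boot.nfsroot.path=...
        vfs.root.mountfrom="nfs:"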

Yes, both approaches should work, but if one doesn't, ... rick

> The process of getting an IP address via DHCP (bootpc_call from bootp_subr.c) 
> either succeeds normally (in a few seconds), or takes a lot of time.
> Common (measured) boot times range from tens of minutes to a few hours (1-6).
> Sometimes it just gets stuck and couldn't get past bootpc_call (getting the 
> DHCP lease).
>
> What I've already tried:
> - we have a redundant set of DHCP servers which offer static leases (so there 
> are two DHCPOFFERs), so I tried to turn off one of them, nothing has changed
> - tried to disable SMP, the effect is the same
> - tried to see whether it's a network issue. The NIC's PXE stack always gets 
> the lease quickly and booting FreeBSD from an ISO and issuing dhclient on the 
> same interface is also fast. After the machines have booted, there are no 
> network issues, they work reliably (since more than a year for 20+ machines, 
> so not just a few hours)
>
> This issue wasn't so bad previously (only a few mins to tens of minutes 
> delay), but recently it got pretty unbearable, even making some machines 
> unbootable for days...
>
> First I thought it might be a packet loss (or more exactly packet delivery 
> from the DHCP server to the receiving socket), either in the network or in 
> the NIC/kernel itself, so I placed a few random printfs into bootp_subr.c and 
> udp_usrreq.c.
>
> After spending some time trying to understand the problem it feels like a 
> race condition in
> bootpc_call, but I don't know the code well enough to effectively verify that.
>
> Here are the modified bootp_subr.c and udp_usrreq.c:
> https://gist.githubusercontent.com/bra-fsn/128ae9a3bbc0dbdbb2f6f4b3e2c5157a/raw/a8ade8af252f618c84a46da2452d557ebc5078ac/bootp_subr.c
> https://gist.github.com/bra-fsn/128ae9a3bbc0dbdbb2f6f4b3e2c5157a/raw/a8ade8af252f618c84a46da2452d557ebc5078ac/udp_usrreq.c
> (modified from stable/13 branch from a few weeks earlier)
>
> This is the output with the always working DL80 (igb) machine:
> https://gist.github.com/bra-fsn/128ae9a3bbc0dbdbb2f6f4b3e2c5157a/raw/a8ade8af252f618c84a46da2452d557ebc5078ac/DL80%2520igb%2520good.txt
>
> This is the console output from a working boot for the XL225n (bnxt) machine:
> https://gist.github.com/bra-fsn/128ae9a3bbc0dbdbb2f6f4b3e2c5157a/raw/a8ade8af252f618c84a46da2452d557ebc5078ac/XL225n%2520bnxt%2520good.txt
> as you can see, it's much slower than the DL80 (which also isn't that fast...)
>
> And this one is a longer output, without success to that point (2 minutes 
> without completing the DHCP flow):
> https://gist.github.com/bra-fsn/128ae9a3bbc0dbdbb2f6f4b3e2c5157a/raw/a8ade8af252f618c84a46da2452d557ebc5078ac/XL225n%2520bnxt%2520long.txt
>
> For the latter, here's an excerpt from the DHCP log:
> https://gist.githubusercontent.com/bra-fsn/128ae9a3bbc0dbdbb2f6f4b3e2c5157a/raw/a8ade8af252f618c84a46da2452d557ebc5078ac/dhcp_log.txt
>
> It seems the DHCP state always gets reset to IF_DHCP_UNRESOLVED even if 
> there are answers from the DHCP server.
>
> Here's another, longer console log, which succeeded after spending 236 
> seconds in the loop:
> https://gist.github.com/bra-fsn/128ae9a3bbc0dbdbb2f6f4b3e2c5157a/raw/a77f52f5e83c699b38a7c2d3acdc52d26ceeba71/XL225n%2520bnxt%2520long%2520good.txt
>
> Any ideas about this?
>



Sorry, I broke the build for stable/13

2023-05-19 Thread Rick Macklem
Oops, an MFC I did to-day broke the build for
stable/13. I have reverted the commits, so I
think things should be ok now.

My mistake was to assume that a change in
a .h file within a #ifdef _KERNEL would not
affect a userland build. (It turns out that code
in libprocstat defines _KERNEL.)

I'll work on a fix, rick



Re: Sorry to mail you directly with a NFS question...

2023-05-26 Thread Rick Macklem
I am reposting this to the mailing list, as permitted
by Terry Kennedy, since others might be able to
provide useful input and the discussion gets archived.

rick

On Fri, May 26, 2023 at 4:34 PM Terry Kennedy  wrote:
>
> On 2023-05-26 18:47, Rick Macklem wrote:
> > Btw, in general it is better to post to a freebsd mailing
> > list (I read freebsd-current, freebsd-stabke, freebsd-fs
> > plus assorted others). That way replies get archived so
> > that others can find them...
> > (If you agree, I will repost this to freebsd-stable@.)
>
>That's fine. I expect I'll get a couple of "that version is
> unsupported!" replies...
>
>I'll answer this here (feel free to repost) but I will
> continue further discussion on freebsd-stable@.
>
> > Almost all of these intermittent hangs are the result
> > of network fabric issues. In particular, NFSv3 has changed
> > very little in recent years.
>
>Both servers are on the same switch which has not logged
> any errors. Swapping drives to go back to FreeBSD 12.4 on
> the same hardware makes the problem go away.
>
> > On Fri, May 26, 2023 at 12:55 AM Terry Kennedy
> >  wrote:
> >>
> >>... but before I open a PR on something that's possibly a
> >> silly mistake of mine, I figured I'd ask you since you prob-
> >> ably know the answer off the top of your head.
> >>
> >>You probably don't remember me - way back in the day we
> >> worked together to solve some BSD/OS NFS issues. Back then
> >> I was te...@spcvxa.spc.edu.
> >>
> >>I upgraded a system from the latest 12-STABLE to the latest
> >> 13-STABLE and I'm seeing random hangs when doing I/O to a
> >> filesystem exported from 10.4 (yes, I know it is unsupported,
> >> that's why I'm updating 8-). This all worked fine when the
> >> client was running 12-STABLE.
> >>
> >>The filesystem is exported from 10.4 with:
> >>
> >> /usr/local/src  -maproot=root 192.168.1.166
> >>
> >>The filesystem is mounted from 13.2 with:
> >> gate:/usr/local/src /usr/local/src  nfs noinet6,rw,tcp,bg,intr
> >>
> >>Both the 10.4 server and the 13.2 client have a very vanilla
> >> NFS config in /etc/rc.conf:
> >>
> >> nfs_client_enable="YES"
> >> nfs_server_enable="YES"
> >>
> >>Processes hang in a D+ state on the client:
> >>
> >> (0:13) 165h:/sysprog/terry# ps -ax | grep D+
> >> 4222  0  D+ 0:00.01 edt root.cshrc
> >> 4183  1  D+ 0:00.03 _su (tcsh)
> >> 4229  2  D+ 0:00.00 ls -F
> >> 4258  3  D+ 0:00.00 umount -f /usr/local/src
> >>
> > For this to be useful, you need to do "ps axHl" and include
> > all processes/threads including nfsiod threads. The most
> > important info is the "wchan", which tells us what the process/thread
> > is sleeping on. (All D means is that is sleeping without PCATCH
> > and, as such, cannot be interrupted by a signal.)
> >
> > Having said the above, I suspect one or more threads are waiting
> > for RPC replies and the others for resources (like buffer or vnode
> > locks)
> > held by the ones waiting for an RPC reply.
>
>I'll trigger the fault and do that.
>
> >>Neither the 13.2 client nor the 10.4 server logs any errors.
> >>
> >>Once one process gets into this state, any further attempts
> >> at I/O to the filesystem also hang.
> >>
> >>A "umount -f /usr/local/src" just chews CPU in [nfs] state:
> > The command to unmount a hung NFS mount is:
> > # umount -N /usr/local/src
> > --> It can take a couple of minutes to complete, depending upon
> >  the situation. (Look at "man umount".)
>
>I figure that after 30 minutes it just isn't going to happen.
>
>The 10.4 server continues providing the export to other clients,
> so I'm pretty sure this is on the 13.2 side.
>
> >> (1:3) 165h:/sysprog/terry# umount -f /usr/local/src
> >> load: 0.02  cmd: umount 4258 [nfs] 5.29r 0.00u 0.00s 0% 2324k
> >> load: 0.01  cmd: umount 4258 [nfs] 574.56r 0.00u 0.00s 0% 2324k
> >> load: 0.01  cmd

Re: Sorry to mail you directly with a NFS question...

2023-05-29 Thread Rick Macklem
On Sun, May 28, 2023 at 5:27 PM Terry Kennedy  wrote:
>
> >   I have gathered the data requested and I'm attaching it to
> > this message instead of inlining it, so only those people who
> > want to look at it will see it.
>
>So much for that idea... I was expecting the former mailman
> behavior of not forwarding attachments to the list, instead
> turning them into hotlinks.
>
> Script started on Sun May 28 19:34:20 2023
> (0:2) 165h:/tmp# ps axHl
>
> UID   PID  PPID  C PRI NI VSZRSS MWCHAN   STAT TTTIME
> COMMAND
>0 0 0  4 -16  0   0   2224 swapin   DLs   -   139:23.18
> [kernel/swapper]
>0 0 0  0 -76  0   0   2224 -DLs   - 0:00.01
> [kernel/softirq_0]
>0 0 0  1 -76  0   0   2224 -DLs   - 0:00.00
> [kernel/softirq_1]
>0 0 0  2 -76  0   0   2224 -DLs   - 0:00.00
> [kernel/softirq_2]
>0 0 0  3 -76  0   0   2224 -DLs   - 0:00.00
> [kernel/softirq_3]
>0 0 0  4 -76  0   0   2224 -DLs   - 0:00.00
> [kernel/softirq_4]
>0 0 0  5 -76  0   0   2224 -DLs   - 0:00.00
> [kernel/softirq_5]
>0 0 0  6 -76  0   0   2224 -DLs   - 0:00.00
> [kernel/softirq_6]
>0 0 0  7 -76  0   0   2224 -DLs   - 0:00.00
> [kernel/softirq_7]
>0 0 0  8 -76  0   0   2224 -DLs   - 0:00.00
> [kernel/softirq_8]
>0 0 0  9 -76  0   0   2224 -DLs   - 0:00.00
> [kernel/softirq_9]
>0 0 0 10 -76  0   0   2224 -DLs   - 0:00.00
> [kernel/softirq_10]
>0 0 0 11 -76  0   0   2224 -DLs   - 0:00.00
> [kernel/softirq_11]
>0 0 0 12 -76  0   0   2224 -DLs   - 0:00.00
> [kernel/softirq_12]
>0 0 0 13 -76  0   0   2224 -DLs   - 0:00.00
> [kernel/softirq_13]
>0 0 0 14 -76  0   0   2224 -DLs   - 0:00.00
> [kernel/softirq_14]
>0 0 0 15 -76  0   0   2224 -DLs   - 0:00.00
> [kernel/softirq_15]
>0 0 0 16 -76  0   0   2224 -DLs   - 0:00.00
> [kernel/softirq_16]
>0 0 0 17 -76  0   0   2224 -DLs   - 0:00.00
> [kernel/softirq_17]
>0 0 0 18 -76  0   0   2224 -DLs   - 0:00.00
> [kernel/softirq_18]
>0 0 0 19 -76  0   0   2224 -DLs   - 0:00.00
> [kernel/softirq_19]
>0 0 0 20 -76  0   0   2224 -DLs   - 0:00.00
> [kernel/softirq_20]
>0 0 0 21 -76  0   0   2224 -DLs   - 0:00.00
> [kernel/softirq_21]
>0 0 0 22 -76  0   0   2224 -DLs   - 0:00.00
> [kernel/softirq_22]
>0 0 0 23 -76  0   0   2224 -DLs   - 0:00.00
> [kernel/softirq_23]
>0 0 0  0 -76  0   0   2224 -DLs   - 1:36.37
> [kernel/if_io_tqg_0]
>0 0 0  1 -76  0   0   2224 -DLs   - 0:00.00
> [kernel/if_io_tqg_1]
>0 0 0  2 -76  0   0   2224 -DLs   - 0:01.22
> [kernel/if_io_tqg_2]
>0 0 0  3 -76  0   0   2224 -DLs   - 0:00.00
> [kernel/if_io_tqg_3]
>0 0 0  4 -76  0   0   2224 -DLs   - 0:02.07
> [kernel/if_io_tqg_4]
>0 0 0  5 -76  0   0   2224 -DLs   - 0:00.00
> [kernel/if_io_tqg_5]
>0 0 0  6 -76  0   0   2224 -DLs   - 0:01.17
> [kernel/if_io_tqg_6]
>0 0 0  7 -76  0   0   2224 -DLs   - 0:00.00
> [kernel/if_io_tqg_7]
>0 0 0  8 -76  0   0   2224 -DLs   - 0:11.65
> [kernel/if_io_tqg_8]
>0 0 0  9 -76  0   0   2224 -DLs   - 0:00.00
> [kernel/if_io_tqg_9]
>0 0 0 10 -76  0   0   2224 -DLs   - 0:01.89
> [kernel/if_io_tqg_10]
>0 0 0 11 -76  0   0   2224 -DLs   - 0:00.00
> [kernel/if_io_tqg_11]
>0 0 0 12 -76  0   0   2224 -DLs   - 0:01.65
> [kernel/if_io_tqg_12]
>0 0 0 13 -76  0   0   2224 -DLs   - 0:00.00
> [kernel/if_io_tqg_13]
>0 0 0 14 -76  0   0   2224 -DLs   - 0:01.70
> [kernel/if_io_tqg_14]
>0 0 0 15 -76  0   0   2224 -DLs   - 0:00.00
> [kernel/if_io_tqg_15]
>0 0 0 16 -76  0   0   2224 -DLs   - 0:01.64
> [kernel/if_io_tqg_16]
>0 0 0 17 -76  0   0   2224 -DLs   - 0:00.00
> [kernel/if_io_tqg_17]
>0 0 0 18 -76  0   0   2224 -DLs   - 0:01.24
> [kernel/if_io_tqg_18]
>0 0 0 19 -76  0   0   2224 -DLs   - 0:00.00
> [kernel/if_io_tqg_19]
>0 0 0 20 -76  0   0   2224 -DLs   - 0:14.46
> [kernel/if_io_tqg_20]
>0 0 0 21 -76  0   0   2224 -DLs   - 0:00.

Re: Sorry to mail you directly with a NFS question...

2023-05-29 Thread Rick Macklem
Since the reply ended up at the end of the long email, I'll
post it here as well.

On Sun, May 28, 2023 at 5:55 PM Terry Kennedy  wrote:
>
>[This is the first time I'm trying to use the new FreeBSD
> list serer, and it is behaving really bizarrely - it stripped
> out the attachment in my first message, and when I sent the
> attachment in a subsequent email, it REPLACED my prior email
> which has vanished. I'm trying to reconstruct what I said.
> Fortunately I still have the hung terminal windows open so
> I have that data.]
>
>I can easily reproduce this bug by editing a file on the
> NFS filesystem, making a trivial change and doing "save and
> exit" - instant hang.
>
>I gathered the data Rick requested which is in my previous
> post.
>
I'm afraid that nothing here indicates what the problem is,
from what I can see at a quick glance.

The only thing I can think of is that your "save and exit"
might use byte range locking and rpc.lockd is flakey
at best.
If file locking does not need to be seen by other clients,
you can use the "nolockd" mount option to do the file
locking locally within the client.
If other clients do need to see the locks (files being
concurrently accessed from multiple clients), then
NFSv4 does a much better job of file locking.
(Since your server is quite old, I am not sure if switching
 to NFSv4 would be feasible for you.)
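
If you want to try nolockd, it is just the extra option on the fstab
line from your earlier message (a sketch):

    gate:/usr/local/src /usr/local/src  nfs noinet6,rw,tcp,bg,intr,nolockd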

Maybe others have some other ideas, rick

>In another terminal window on the 13.2 system (165h) with
> the hang, both filesystems show up, even after the hang:
>
> (0:19) 165h:/tmp# df -h
> Filesystem SizeUsed   Avail Capacity  Mounted on
> ...
> gate:/usr/local/src7.7G3.3G3.8G46%/usr/local/src
> gate:/sysprog   62G 22G 35G39%/sysprog
> ...
>
>In that other terminal window, I can create a file with
> 'touch' (and it is indeed created, looking at the directory
> from other clients running 12.4). But any attempt to list
> the directory results in a hang:
>
> (0:22) 165h:/tmp# touch /usr/local/src/envir/foo
> (0:23) 165h:/tmp# ls /usr/local/src/envir
> load: 0.00  cmd: ls 97107 [nfs] 25.89r 0.00u 0.00s 0% 2864k
> load: 0.00  cmd: ls 97107 [nfs] 52.44r 0.00u 0.00s 0% 2864k
> load: 0.00  cmd: ls 97107 [nfs] 175.41r 0.00u 0.00s 0% 2864k
>
>In yet another terminal window, a create + write (as opposed
> to just a "touch") hangs:
> (0:2) 165h:/sysprog/terry# echo "Testing 123" > /usr/local/src/envir/bar
> load: 0.00  cmd: tcsh 97128 [nfs] 61.17r 0.02u 0.00s 0% 4248k
> load: 0.01  cmd: tcsh 97128 [nfs] 2736.92r 0.02u 0.00s 0% 4248k
>
> From another 12.4 client that has the same filesystem mounted,
> things continue to work normally:
>
> (0:634) new-gate:~terry# echo "Testing 123" > /usr/local/src/envir/baz
> (0:635) new-gate:~terry# cat /usr/local/src/envir/baz
> Testing 123
>
>Based on this, I think it is a client-side problem on the
> 13.2 system.
>



Re: Did something change with ZFS and vnode caching?

2023-09-01 Thread Rick Macklem
On Thu, Aug 31, 2023 at 12:05 PM Garrett Wollman  wrote:
>
> <  said:
>
> > Any suggestions on what we should monitor or try to adjust?
I remember you mentioning that you tried increasing kern.maxvnodes
but I was wondering if you've tried bumping it way up (like 10X what it
currently is)?

You could try decreasing the max nfsd threads (the --maxthreads command
line option for nfsd). That would at least limit the # of vnodes used
by the nfsd.
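
i.e. something like this (a sketch; the numbers are only placeholders,
pick ones that suit your RAM and workload):

    sysctl kern.maxvnodes                # see what it is now
    sysctl kern.maxvnodes=10000000       # try bumping it roughly 10X

and, to cap the nfsd threads, something like this in /etc/rc.conf:

    nfs_server_flags="-u -t --minthreads 8 --maxthreads 64"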

rick

>
> To bring everyone up to speed: earlier this month we upgraded our NFS
> servers from 12.4 to 13.2 and found that our backup system was
> absolutely destroying NFS performance, which had not happened before.
>
> With some pointers from mjg@ and the thread relating to ZFS
> performance on current@ I built a stable/13 kernel
> (b5a5a06fc012d27c6937776bff8469ea465c3873) and installed it on one of
> our NFS servers for testing, then removed the band-aid on our backup
> system and allowed it to go as parallel as it wanted.
>
> Unfortunately, we do not control the scheduling of backup jobs, so
> it's difficult to tell whether the changes made any difference.  Each
> backup job does a parallel breadth-first traversal of a given
> filesystem, using as many as 150 threads per job (the backup client
> auto-scales itself), and we sometimes see as many as eight jobs
> running in parallel on one file server.  (There are 17, soon to be 18,
> file servers.)
>
> When the performance of NFS's backing store goes to hell, the NFS
> server is not able to put back-pressure on the clients hard enough to
> stop them from writing, and eventually the server runs out of 4k jumbo
> mbufs and crashes.  This at least is a known failure mode, going back
> a decade.  Before it gets to this point, the NFS server also
> auto-scales itself, so it's in competition with the backup client over
> who can create the most threads and ultimately allocate the most
> vnodes.
>
> Last night, while I was watching, the first dozen or so backups went
> fine, with no impact to NFS performance, until the backup server
> decided to schedule scans of two, and then three, parallel scans of
> filesystems containing about 35 million files each.  These tend to
> take an hour or four, depending on how much changed data is identified
> during the scan, but most of the time it's just sitting in a
> readdir()/fstatat() loop with a shared work queue for parallelism.
> (That's my interpretation based on its activity; we do not have source
> code.)
>
> Once these scans were underway, I observed the same symptoms as on
> releng/13.2, with lots of lock contention and the vnlru process
> running almost constantly (95% CPU, so most of a core on this
> 20-core/40-thread server).  From our monitoring, the server was
> recycling about 35k vnodes per second during this period.  I wasn't
> monitoring these statistics before so I don't have historical
> comparisons.  My working assumption, such as it is, is that the switch
> from OpenSolaris ZFS to OpenZFS in 13.x moved some bottlenecks around
> so that the backup client previously got tangled higher up in the ZFS
> code and now can put real pressure on the vnode allocator.
>
> During the hour that the three backup clients were running, I was able
> to run mjg@'s dtrace script and generate a flame graph, which is
> viewable at .
> This just shows what the backup clients themselves are doing, and not
> what's going on in the vnlru or nfsd processes.  You can ignore all
> the umtx stacks since that's just coordination between the threads in
> the backup client.
>
> On the "oncpu" side, the trace captures a lot of time spent spinning
> in lock_delay(), although I don't see where the alleged call site
> acquires any locks, so there must have been some inlining.  On the
> "offcpu" side, it's clear that there's still a lot of time spent
> sleeping on vnode_list_mtx in the vnode allocation pathway, both
> directly from vn_alloc_hard() and also from vnlru_free_impl() after
> the mutex is dropped and then needs to be reacquired.
>
> In ZFS, there's also a substantial number of waits (shown as
> sx_xlock_hard stack frames), in both the easy case (a free vnode was
> readily available) and the hard case where vn_alloc_hard() calls
> vnlru_free_impl() and eventually zfs_inactive() to reclaim a vnode.
> Looking into the implementation, I noted that ZFS uses a 64-entry hash
> lock for this, and I'm wondering if there's an issue with false
> sharing.  Can anyone with ZFS experience speak to that?  If I
> increased ZFS_OBJ_MTX_SZ to 128 or 256, would it be likely to hurt
> something else (other than memory usage)?  Do we even know that the
> low-order 6 bits of ZFS object IDs are actually uniformly distributed?
>
> -GAWollman
>
>



Re: NFS exports of ZFS snapshots broken

2023-11-17 Thread Rick Macklem
On Fri, Nov 17, 2023 at 7:10 PM Garrett Wollman  wrote:
>
> < said:
>
> > Looks like main still has this problem.  I exported a ZFS file system from
> > a 15-current system to the same 13.2 client, and it exhibits the problem
> > (EIO from NFSv3).  So the problem is most likely in 14.0 as well.
>
> Glad someone was able to reproduce.  I don't have any systems I can
> easily pave over to try to bisect.  I spent an hour looking at git
> and I really don't see any obvious culprits.
Just fyi for everyone, I've sent Mike a simple patch that I think disables the
jail checking stuff, and hopefully he can try it. I now see that all the
"run nfsd(8) in vnet jails" is not in releng/13.2, but is in stable/13.

If it does fix the problem, then I'll have to dig into the ZFS code to try and
figure out how to fix this, yuck!

Hopefully Mike can determine if this is the problem, rick

>
> -GAWollman
>
>



Re: RELENG_13 kernel breakage

2024-01-03 Thread Rick Macklem
Pointy hat goes on me.

This should be fixed now, rick

On Wed, Jan 3, 2024 at 5:56 AM Helge Oldach  wrote:
>
> Hi Mike,
>
> mike tancsa wrote on Wed, 03 Jan 2024 14:47:31 +0100 (CET):
> > Hi,
> >
> >  Both my i386 and amd64 kernels wont build this AM. It seems to be
> > due to
> >
> > https://cgit.freebsd.org/src/commit/?h=stable/13&id=e7044084cf813bfb66cbea8e9278895b26eda5d2
>
> Please see PR 276045
>
> Kind regards
> Helge
>



Re: mounting NFS share from the jail

2024-01-20 Thread Rick Macklem
On Sat, Jan 20, 2024 at 6:48 AM Marek Zarychta
 wrote:
>
> Dear List,
>
> there were some efforts to allow running nfsd(8) inside the jail, but is
> mounting an NFS share from the jail allowed?  Inside the jail
> "security.jail.mount_allowed" is set to 1, I also added "add path net
> unhide" to the ruleset in devfs.rules but when trying to mount the NFS
> share I get only the error:
>
> mount_nfs: nmount: /usr/src: Operation not permitted
>
> It's not a big deal, the shares can be mounted from the jail host, but I
> am surprised that one can run NFSD inside the jail while mounting NFS
> shares is still denied.
>
> Am I missing anything or is mounting NFS from inside the jail still
> unsupported?  The tests were done on the recent stable/14 from the vnet
> jail.  Any clues will be appreciated.
You are correct. Mounting from inside a jail is not supported.
After doing the vnet conversion for nfsd, I tried doing it for the NFS client.
There were a moderate # of global variables that needed to be vnet'd,
which I did.  The hard/messy part was having the threads (anything that
calls an NFS VFS/VOP call) set to the proper vnet.
It would have required a massive # of CURVNET_SET()/CURVNET_RESTORE()
macros and I decided that it was just too messy.

If it becomes a necessary feature, it is ugly but doable.

rick

>
> Cheers
>
> --
> Marek Zarychta
>



Re: mounting NFS share from the jail

2024-01-20 Thread Rick Macklem
On Sat, Jan 20, 2024 at 10:55 AM Charles Sprickman  wrote:
>
>
>
> > On Jan 20, 2024, at 10:09 AM, Rick Macklem  wrote:
> >
> > On Sat, Jan 20, 2024 at 6:48 AM Marek Zarychta
> >  wrote:
> >>
> >> Dear List,
> >>
> >> there were some efforts to allow running nfsd(8) inside the jail, but is
> >> mounting an NFS share from the jail allowed?  Inside the jail
> >> "security.jail.mount_allowed" is set to 1, I also added "add path net
> >> unhide" to the ruleset in devfs.rules but when trying to mount the NFS
> >> share I get only the error:
> >>
> >> mount_nfs: nmount: /usr/src: Operation not permitted
> >>
> >> It's not a big deal, the shares can be mounted from the jail host, but I
> >> am surprised that one can run NFSD inside the jail while mounting NFS
> >> shares is still denied.
> >>
> >> Am I missing anything or is mounting NFS from inside the jail still
> >> unsupported?  The tests were done on the recent stable/14 from the vnet
> >> jail.  Any clues will be appreciated.
> > You are correct. Mounting from inside a jail is not supported.
> > After doing the vnet conversion for nfsd, I tried doing it for the NFS 
> > client.
> > There were a moderate # of global variables that needed to be vnet'd,
> > which I did.  The hard/messy part was having the threads (anything that
> > calls an NFS VFS/VOP call) set to the proper vnet.
> > It would have required a massive # of CURVNET_SET()/CURVNET_RESTORE()
> > macros and I decided that it was just too messy.
>
> (slight hijack)
>
> I'm curious, I currently have a need for either have an nfs server or client 
> in a jail and have had no luck even with the userspace nfsd 
> (https://unfs3.github.io/ / https://www.freshports.org/net/unfs3/). Is there 
> any in-jail solution that works on FreeBSD? It's mainly for very light 
> log-parsing and I want it all inside a jail for portability between hosts. 
> Not even married to nfs if there's another in-jail option...

As above, NFS client mount no, nfsd yes.
See:
https://people.freebsd.org/~rmacklem/nfsd-vnet-prison-setup.txt

rick

>
> Charles
>
>
> > If it becomes a necessary feature, it is ugly but doable.
> >
> > rick
> >
> >>
> >> Cheers
> >>
> >> --
> >> Marek Zarychta
>
>



Re: 13-stable NFS server hang

2024-02-28 Thread Rick Macklem
On Tue, Feb 27, 2024 at 9:30 PM Garrett Wollman  wrote:
>
> Hi, all,
>
> We've had some complaints of NFS hanging at unpredictable intervals.
> Our NFS servers are running a 13-stable from last December, and
> tonight I sat in front of the monitor watching `nfsstat -dW`.  I was
> able to clearly see that there were periods when NFS activity would
> drop *instantly* from 30,000 ops/s to flat zero, which would last
> for about 25 seconds before resuming exactly as it was before.
>
> I wrote a little awk script to watch for this happening and run
> `procstat -k` on the nfsd process, and I saw that all but two of the
> service threads were idle.  The three nfsd threads that had non-idle
> kstacks were:
>
>   PIDTID COMMTDNAME  KSTACK
>   997 108481 nfsdnfsd: mastermi_switch 
> sleepq_timedwait _sleep nfsv4_lock nfsrvd_dorpc nfssvc_program 
> svc_run_internal svc_run nfsrvd_nfsd nfssvc_nfsd sys_nfssvc amd64_syscall 
> fast_syscall_common
My guess is that "master" is just a glitch here. The RPCs are all serviced
by kernel threads. The "server" process is just the process these threads
would normally be associated with. The "master" process only enters the
kernel to add a new TCP socket for servicing. (ie. I don't know why procstat
thought the kernel thread was associated with "master" instead of "server",
but it shouldn't matter.)

>   997 960918 nfsdnfsd: service   mi_switch 
> sleepq_timedwait _sleep nfsv4_lock nfsrv_setclient nfsrvd_exchangeid 
> nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit 
> fork_trampoline
This one is a fresh mount (note the exchangeid) and this one needs an exclusive
lock on the NFSv4 state. Put another way, it is trying to stop all
other nfsd thread
activity, so that it can add the new client.

>   997 962232 nfsdnfsd: service   mi_switch _cv_wait 
> txg_wait_synced_impl txg_wait_synced dmu_offset_next zfs_holey 
> zfs_freebsd_ioctl vn_generic_copy_file_range vop_stdcopy_file_range 
> VOP_COPY_FILE_RANGE vn_copy_file_range nfsrvd_copy_file_range nfsrvd_dorpc 
> nfssvc_program svc_run_internal svc_thread_start fork_exit fork_trampoline
Copy file range has some code in the generic function that limits the
copy time to
one second for a large copy.  I had hoped this was sufficient to avoid Copy
from "hogging" the server.  My guess is that the 1 second limit is not working.
This may be because the ZFS function does not return for a long time.
When I wrote the code, ZFS did not have its own VOP_COPY_FILE_RANGE().
(There is a special "kernel only" flag that needs to be set in the argument
for vn_generic_copy_file_range()  that ZFS might not be doing. I'll look later
to-day. However, it will need to be fixed for the case where it does not
call vn_generic_copy_file_range().)

When I first did the code, I limited the time it spent in VOP_COPY_FILE_RANGE()
by clipping the size of the Copy, but the problem is that it is hard
to estimate how
large a transfer can take without knowing any details about the storage.
I might have to go back to this approach.

>
> I'm suspicious of two things: first, the copy_file_range RPC; second,
> the "master" nfsd thread is actually servicing an RPC which requires
> obtaining a lock.  The "master" getting stuck while performing client
> RPCs is, I believe, the reason NFS service grinds to a halt when a
> client tries to write into a near-full filesystem, so this problem
> would be more evidence that the dispatching function should not be
> mixed with actual operations.  I don't know what the clients are
> doing, but is it possible that nfsrvd_copy_file_range is holding a
> lock that is needed by one or both of the other two threads?
Sort of. My guess is that ZFS's VOP_COPY_FILE_RANGE() is
taking a long time doing a large copy while holding the shared
lock on the nfsd.
Meanwhile the second thread is trying to acquire an exclusive
lock on the nfsd so that it can add the new client.  This means
that the other nfsd threads will slowly get blocked until they
all release the shared lock.

>
> Near-term I could change nfsrvd_copy_file_range to just
> unconditionally return NFSERR_NOTSUP and force the clients to fall
> back, but I figured I would ask if anyone else has seen this.
Yes, I think that is what you will need to do to avoid this.

Thanks for reporting it. I have some work to do,
I need to think of how ZFS's VOP_COPY_FILE_RANGE() can be
limited so that it does not "hog" the server.

rick

>
> -GAWollman
>
>



Re: 13-stable NFS server hang

2024-02-28 Thread Rick Macklem
On Tue, Feb 27, 2024 at 9:30 PM Garrett Wollman  wrote:
>
> Hi, all,
>
> We've had some complaints of NFS hanging at unpredictable intervals.
> Our NFS servers are running a 13-stable from last December, and
> tonight I sat in front of the monitor watching `nfsstat -dW`.  I was
> able to clearly see that there were periods when NFS activity would
> drop *instantly* from 30,000 ops/s to flat zero, which would last
> for about 25 seconds before resuming exactly as it was before.
>
> I wrote a little awk script to watch for this happening and run
> `procstat -k` on the nfsd process, and I saw that all but two of the
> service threads were idle.  The three nfsd threads that had non-idle
> kstacks were:
>
>   PIDTID COMMTDNAME  KSTACK
>   997 108481 nfsdnfsd: mastermi_switch 
> sleepq_timedwait _sleep nfsv4_lock nfsrvd_dorpc nfssvc_program 
> svc_run_internal svc_run nfsrvd_nfsd nfssvc_nfsd sys_nfssvc amd64_syscall 
> fast_syscall_common
>   997 960918 nfsdnfsd: service   mi_switch 
> sleepq_timedwait _sleep nfsv4_lock nfsrv_setclient nfsrvd_exchangeid 
> nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit 
> fork_trampoline
>   997 962232 nfsdnfsd: service   mi_switch _cv_wait 
> txg_wait_synced_impl txg_wait_synced dmu_offset_next zfs_holey 
> zfs_freebsd_ioctl vn_generic_copy_file_range vop_stdcopy_file_range 
> VOP_COPY_FILE_RANGE vn_copy_file_range nfsrvd_copy_file_range nfsrvd_dorpc 
> nfssvc_program svc_run_internal svc_thread_start fork_exit fork_trampoline
>
> I'm suspicious of two things: first, the copy_file_range RPC; second,
> the "master" nfsd thread is actually servicing an RPC which requires
> obtaining a lock.  The "master" getting stuck while performing client
> RPCs is, I believe, the reason NFS service grinds to a halt when a
> client tries to write into a near-full filesystem, so this problem
> would be more evidence that the dispatching function should not be
> mixed with actual operations.  I don't know what the clients are
> doing, but is it possible that nfsrvd_copy_file_range is holding a
> lock that is needed by one or both of the other two threads?
>
> Near-term I could change nfsrvd_copy_file_range to just
> unconditionally return NFSERR_NOTSUP and force the clients to fall
> back, but I figured I would ask if anyone else has seen this.
I have attached a little patch that should limit the server's Copy size
to vfs.nfsd.maxcopyrange (default of 10Mbytes).
Hopefully this makes sure that the Copy does not take too long.

You could try this instead of disabling Copy. It would be nice to know if
this is sufficient? (If not, I'll probably add a sysctl to disable Copy.)
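
With the patch applied, the clip can also be tuned at runtime without a
rebuild (a sketch):

    sysctl vfs.nfsd.maxcopyrange            # check the default (10Mbytes with this patch)
    sysctl vfs.nfsd.maxcopyrange=1048576    # clip Copy to 1Mbyte if 10Mbytes is still too long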

rick

>
> -GAWollman
>
>


copylen.patch
Description: Binary data


Re: 13-stable NFS server hang

2024-02-29 Thread Rick Macklem
On Wed, Feb 28, 2024 at 4:04 PM Rick Macklem  wrote:
>
> On Tue, Feb 27, 2024 at 9:30 PM Garrett Wollman  
> wrote:
> >
> > Hi, all,
> >
> > We've had some complaints of NFS hanging at unpredictable intervals.
> > Our NFS servers are running a 13-stable from last December, and
> > tonight I sat in front of the monitor watching `nfsstat -dW`.  I was
> > able to clearly see that there were periods when NFS activity would
> > drop *instantly* from 30,000 ops/s to flat zero, which would last
> > for about 25 seconds before resuming exactly as it was before.
> >
> > I wrote a little awk script to watch for this happening and run
> > `procstat -k` on the nfsd process, and I saw that all but two of the
> > service threads were idle.  The three nfsd threads that had non-idle
> > kstacks were:
> >
> >   PIDTID COMMTDNAME  KSTACK
> >   997 108481 nfsdnfsd: mastermi_switch 
> > sleepq_timedwait _sleep nfsv4_lock nfsrvd_dorpc nfssvc_program 
> > svc_run_internal svc_run nfsrvd_nfsd nfssvc_nfsd sys_nfssvc amd64_syscall 
> > fast_syscall_common
> >   997 960918 nfsdnfsd: service   mi_switch 
> > sleepq_timedwait _sleep nfsv4_lock nfsrv_setclient nfsrvd_exchangeid 
> > nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit 
> > fork_trampoline
> >   997 962232 nfsdnfsd: service   mi_switch _cv_wait 
> > txg_wait_synced_impl txg_wait_synced dmu_offset_next zfs_holey 
> > zfs_freebsd_ioctl vn_generic_copy_file_range vop_stdcopy_file_range 
> > VOP_COPY_FILE_RANGE vn_copy_file_range nfsrvd_copy_file_range nfsrvd_dorpc 
> > nfssvc_program svc_run_internal svc_thread_start fork_exit fork_trampoline
> >
> > I'm suspicious of two things: first, the copy_file_range RPC; second,
> > the "master" nfsd thread is actually servicing an RPC which requires
> > obtaining a lock.  The "master" getting stuck while performing client
> > RPCs is, I believe, the reason NFS service grinds to a halt when a
> > client tries to write into a near-full filesystem, so this problem
> > would be more evidence that the dispatching function should not be
> > mixed with actual operations.  I don't know what the clients are
> > doing, but is it possible that nfsrvd_copy_file_range is holding a
> > lock that is needed by one or both of the other two threads?
> >
> > Near-term I could change nfsrvd_copy_file_range to just
> > unconditionally return NFSERR_NOTSUP and force the clients to fall
> > back, but I figured I would ask if anyone else has seen this.
> I have attached a little patch that should limit the server's Copy size
> to vfs.nfsd.maxcopyrange (default of 10Mbytes).
> Hopefully this makes sure that the Copy does not take too long.
>
> You could try this instead of disabling Copy. It would be nice to know if
> this is sufficient? (If not, I'll probably add a sysctl to disable Copy.)
I did a quick test without/with this patch,where I copied a 1Gbyte file.

Without this patch, the Copy RPCs mostly replied in just under 1sec
(which is what the flag requests), but took over 4sec for one of the Copy
operations. This implies that one Read/Write of 1Mbyte on the server
took over 3 seconds.
I noticed the first Copy did over 600Mbytes, but the rest did about 100Mbytes
each and it was one of these 100Mbyte Copy operations that took over 4sec.

With the patch, there were a lot more Copy RPCs (as expected) of 10Mbytes
each and they took a consistent 0.25-0.3sec to reply. (This is a test of a local
mount on an old laptop, so nowhere near a server hardware config.)

So, the patch might be sufficient?

It would be nice to avoid disabling Copy, since it avoids reading the data
into the client and then writing it back to the server.

I will probably commit both patches (10Mbyte clip of Copy size and
disabling Copy) to main soon, since I cannot say if clipping the size
of the Copy will always be sufficient.

Please let us know how trying these patches goes, rick

>
> rick
>
> >
> > -GAWollman
> >
> >



Re: 13-stable NFS server hang

2024-03-01 Thread Rick Macklem
On Fri, Mar 1, 2024 at 12:00 AM Ronald Klop  wrote:
>
> Interesting read.
>
>  Would it be possible to separate locking for admin actions like a client 
> mounting an fs from traffic flowing for file operations?
Well, the NFS server does not really have any concept of a mount.
What I am referring to is the ClientID maintained for NFSv4 mounts,
which all the open/lock/session/layout state hangs off of.

For most cases, this state information can safely be accessed/modified
via a mutex, but there are three exceptions:
- creating a new ClientID (which is done by the ExchangeID operation)
  and typically happens when a NFS client does a mount.
- delegation Recall (which only happens when delegations are enabled)
  One of the reasons delegations are not enabled by default on the
FreeBSD server.
- the DestroyClientID which is typically done by a NFS client during dismount.
For these cases, it is just too difficult to do them without sleeping.
As such, there is a sleep lock which the nfsd threads normally acquire shared
when doing NFSv4 operations, but for the above cases the lock is aquired
exclusive.
- I had to give the exclusive lock priority over shared lock
acquisition (it is a
  custom locking mechanism with assorted weirdnesses) because without
  that someone reported that new mounts took up to 1/2hr to occur.
  (The exclusive locker waited for 30min before all the other nfsd threads
   were not busy.)
  Because of this priority, once a nfsd thread requests the exclusive lock,
  all other nfsd threads executing NFSv4 RPCs block after releasing their
  shared lock, until the exclusive locker releases the exclusive lock.

In summary, NFSv4 has certain advantages over NFSv3, but it comes
with a lot of state complexity. It just is not feasible to manipulate all that
state with only mutex locking.
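
To illustrate the pattern (a sketch only, using a stock lockmgr(9) lock
rather than the custom lock the nfsd actually uses, so none of these
names are the real ones):

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/lock.h>
    #include <sys/lockmgr.h>

    static struct lock example_nfsv4_lock;      /* hypothetical name */

    static void
    example_init(void)
    {
            lockinit(&example_nfsv4_lock, PVFS, "ex4lck", 0, 0);
    }

    /* Normal NFSv4 RPC processing: many nfsd threads hold it shared and
     * protect the actual state lists with mutexes. */
    static void
    example_do_rpc(void)
    {
            lockmgr(&example_nfsv4_lock, LK_SHARED, NULL);
            /* ... service the RPC ... */
            lockmgr(&example_nfsv4_lock, LK_RELEASE, NULL);
    }

    /* ExchangeID/DestroyClientID/delegation recall: these may sleep, so
     * they take the lock exclusively, blocking out all shared holders. */
    static void
    example_new_clientid(void)
    {
            lockmgr(&example_nfsv4_lock, LK_EXCLUSIVE, NULL);
            /* ... create or destroy the ClientID ... */
            lockmgr(&example_nfsv4_lock, LK_RELEASE, NULL);
    }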

rick

>
> Like ongoing file operations could have a read only view/copy of the mount 
> table. Only new operations will have to wait.
> But the mount never needs to wait for ongoing operations before locking the 
> structure.
>
> Just a thought in the morning
>
> Regards,
> Ronald.
>
> From: Rick Macklem 
> Date: 1 March 2024 00:31
> To: Garrett Wollman 
> CC: stable@freebsd.org, rmack...@freebsd.org
> Subject: Re: 13-stable NFS server hang
>
> On Wed, Feb 28, 2024 at 4:04PM Rick Macklem wrote:
> >
> > On Tue, Feb 27, 2024 at 9:30PM Garrett Wollman wrote:
> > >
> > > Hi, all,
> > >
> > > We've had some complaints of NFS hanging at unpredictable intervals.
> > > Our NFS servers are running a 13-stable from last December, and
> > > tonight I sat in front of the monitor watching `nfsstat -dW`.  I was
> > > able to clearly see that there were periods when NFS activity would
> > > drop *instantly* from 30,000 ops/s to flat zero, which would last
> > > for about 25 seconds before resuming exactly as it was before.
> > >
> > > I wrote a little awk script to watch for this happening and run
> > > `procstat -k` on the nfsd process, and I saw that all but two of the
> > > service threads were idle.  The three nfsd threads that had non-idle
> > > kstacks were:
> > >
> > >   PIDTID COMMTDNAME  KSTACK
> > >   997 108481 nfsdnfsd: mastermi_switch 
> > > sleepq_timedwait _sleep nfsv4_lock nfsrvd_dorpc nfssvc_program 
> > > svc_run_internal svc_run nfsrvd_nfsd nfssvc_nfsd sys_nfssvc amd64_syscall 
> > > fast_syscall_common
> > >   997 960918 nfsdnfsd: service   mi_switch 
> > > sleepq_timedwait _sleep nfsv4_lock nfsrv_setclient nfsrvd_exchangeid 
> > > nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit 
> > > fork_trampoline
> > >   997 962232 nfsdnfsd: service   mi_switch _cv_wait 
> > > txg_wait_synced_impl txg_wait_synced dmu_offset_next zfs_holey 
> > > zfs_freebsd_ioctl vn_generic_copy_file_range vop_stdcopy_file_range 
> > > VOP_COPY_FILE_RANGE vn_copy_file_range nfsrvd_copy_file_range 
> > > nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit 
> > > fork_trampoline
> > >
> > > I'm suspicious of two things: first, the copy_file_range RPC; second,
> > > the "master" nfsd thread is actually servicing an RPC which requires
> > > obtaining a lock.  The "master" getting stuck while performing client
> > > RPCs is, I believe, the reason NFS service grinds to a halt when a
> > > client tries to write into a near-full filesystem, so this problem
> > > would be more evidence that the dispatching function should not be
> > > mixed with actual operatio

Re: 13-stable NFS server hang

2024-03-02 Thread Rick Macklem
On Fri, Mar 1, 2024 at 10:51 PM Konstantin Belousov  wrote:
>
> On Fri, Mar 01, 2024 at 06:23:56AM -0800, Rick Macklem wrote:
> > On Fri, Mar 1, 2024 at 12:00 AM Ronald Klop  wrote:
> > >
> > > Interesting read.
> > >
> > >  Would it be possible to separate locking for admin actions like a client 
> > > mounting an fs from traffic flowing for file operations?
> > Well, the NFS server does not really have any concept of a mount.
> > What I am referring to is the ClientID maintained for NFSv4 mounts,
> > which all the open/lock/session/layout state hangs off of.
> >
> > For most cases, this state information can safely be accessed/modified
> > via a mutex, but there are three exceptions:
> > - creating a new ClientID (which is done by the ExchangeID operation)
> >   and typically happens when a NFS client does a mount.
> > - delegation Recall (which only happens when delegations are enabled)
> >   One of the reasons delegations are not enabled by default on the
> > FreeBSD server.
> > - the DestroyClientID which is typically done by a NFS client during 
> > dismount.
> > For these cases, it is just too difficult to do them without sleeping.
> > As such, there is a sleep lock which the nfsd threads normally acquire 
> > shared
> > when doing NFSv4 operations, but for the above cases the lock is aquired
> > exclusive.
> > - I had to give the exclusive lock priority over shared lock
> > acquisition (it is a
> >   custom locking mechanism with assorted weirdnesses) because without
> >   that someone reported that new mounts took up to 1/2hr to occur.
> >   (The exclusive locker waited for 30min before all the other nfsd threads
> >were not busy.)
> >   Because of this priority, once a nfsd thread requests the exclusive lock,
> >   all other nfsd threads executing NFSv4 RPCs block after releasing their
> >   shared lock, until the exclusive locker releases the exclusive lock.
> Normal lockmgr locks + TDP_DEADLKTREAT private thread flag provide the
> property of pref. exclusive waiters in presence of the shared waiters.
> I think this is what you described above.
It also has some weird properties, like if there are multiple requestors
for the exclusive lock, once one thread gets it (the threads are nfsd worker
threads and indistinct), the others that requested the exclusive lock are
unblocked without the lock being issued to them.
They then check if the exclusive lock is still needed (usually not, since
the other thread has dealt with the case where it was needed) and
then they can acquire a shared lock.
Without this, there were cases where several threads would acquire
the exclusive lock and then discover that the lock was not needed and
just release it again.

It also uses an assortment of weird flags/call args.

rick



Re: 13-stable NFS server hang

2024-03-02 Thread Rick Macklem
On Sat, Mar 2, 2024 at 6:13 AM Konstantin Belousov  wrote:
>
> On Sat, Mar 02, 2024 at 05:40:08AM -0800, Rick Macklem wrote:
> > On Fri, Mar 1, 2024 at 10:51 PM Konstantin Belousov  
> > wrote:
> > >
> > > On Fri, Mar 01, 2024 at 06:23:56AM -0800, Rick Macklem wrote:
> > > > On Fri, Mar 1, 2024 at 12:00 AM Ronald Klop  
> > > > wrote:
> > > > >
> > > > > Interesting read.
> > > > >
> > > > >  Would it be possible to separate locking for admin actions like a 
> > > > > client mounting an fs from traffic flowing for file operations?
> > > > Well, the NFS server does not really have any concept of a mount.
> > > > What I am referring to is the ClientID maintained for NFSv4 mounts,
> > > > which all the open/lock/session/layout state hangs off of.
> > > >
> > > > For most cases, this state information can safely be accessed/modified
> > > > via a mutex, but there are three exceptions:
> > > > - creating a new ClientID (which is done by the ExchangeID operation)
> > > >   and typically happens when a NFS client does a mount.
> > > > - delegation Recall (which only happens when delegations are enabled)
> > > >   One of the reasons delegations are not enabled by default on the
> > > > FreeBSD server.
> > > > - the DestroyClientID which is typically done by a NFS client during 
> > > > dismount.
> > > > For these cases, it is just too difficult to do them without sleeping.
> > > > As such, there is a sleep lock which the nfsd threads normally acquire 
> > > > shared
> > > > when doing NFSv4 operations, but for the above cases the lock is aquired
> > > > exclusive.
> > > > - I had to give the exclusive lock priority over shared lock
> > > > acquisition (it is a
> > > >   custom locking mechanism with assorted weirdnesses) because without
> > > >   that someone reported that new mounts took up to 1/2hr to occur.
> > > >   (The exclusive locker waited for 30min before all the other nfsd 
> > > > threads
> > > >were not busy.)
> > > >   Because of this priority, once a nfsd thread requests the exclusive 
> > > > lock,
> > > >   all other nfsd threads executing NFSv4 RPCs block after releasing 
> > > > their
> > > >   shared lock, until the exclusive locker releases the exclusive lock.
> > > Normal lockmgr locks + TDP_DEADLKTREAT private thread flag provide the
> > > property of pref. exclusive waiters in presence of the shared waiters.
> > > I think this is what you described above.
> > It also has some weird properties, like if there are multiple requestors
> > for the exclusive lock, once one thread gets it (the threads are nfsd worker
> > threads and indistinct), the others that requested the exclusive lock are
> > unblocked without the lock being issued to them.
> This sounds to me as LK_SLEEPFAIL feature of lockmgr.
> Do not underestimate the amount of weird features in it.
Yep, sounds like it. I should take a look to see if lockmgr will work
instead of the "rolled my own".

I should also take another look at new client creation, to see if there is a
way to do it that doesn't require the exclusive lock (a lot of that code is
20 years old now).

rick

>
> > They then check if the exclusive lock is still needed (usually not, since
> > the other thread has dealt with the case where it was needed) and
> > then they can acquire a shared lock.
> > Without this, there were cases where several threads would acquire
> > the exclusive lock and then discover that the lock was not needed and
> > just release it again.
> >
> > It also uses an assortment of weird flags/call args.
> >
> > rick
> >



Re: 13-stable NFS server hang

2024-03-03 Thread Rick Macklem
On Sat, Mar 2, 2024 at 9:25 PM Garrett Wollman  wrote:
>
> <
> > I believe this explains why vn_copy_file_range sometimes takes much
> > longer than a second: our servers often have lots of data waiting to
> > be written to disk, and if the file being copied was recently modified
> > (and so is dirty), this might take several seconds.  I've set
> > vfs.zfs.dmu_offset_next_sync=0 on the server that was hurting the most
> > and am watching to see if we have more freezes.
>
> In case anyone is wondering why this is an issue, it's the combination
> of two factors:
>
> 1) vn_generic_copy_file_range() attempts to preserve holes in the
> source file.
Just fyi, when I was first doing the copy_file_range(2) syscall, the discussion
seemed to think this was a reasonable thing to do.
It is now not so obvious for file systems doing compression, such as ZFS.

It happens that ZFS will no longer use vn_generic_copy_file_range() when
block cloning is enabled and I have no idea what block cloning does w.r.t.
preserving holes.

For non-compression file systems, comparing va_size with va_bytes should
serve as a reasonable hint w.r.t. the file being sparse. If the file
is not sparse,
vn_generic_copy_file_range() should not bother doing SEEK_DATA/SEEK_HOLE.
(I had intended to do such a patch, but I cannot now remember if I did do so.
I'll take a look.)
Note that this patch would not affect ZFS, but could improve UFS performance
where vn_generic_copy_file_range() is used to do the copying.
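
Roughly, the hint I have in mind looks like this (illustrative only, not
a committed patch; the names just follow vn_copy_file_range()'s style):

    /* Decide whether the SEEK_DATA/SEEK_HOLE loop is worth doing.
     * invp/incred are the source vnode and credentials; the vnode is
     * assumed locked, as VOP_GETATTR() requires. */
    static bool
    example_copy_may_be_sparse(struct vnode *invp, struct ucred *incred)
    {
            struct vattr va;

            if (VOP_GETATTR(invp, &va, incred) == 0 &&
                va.va_bytes >= va.va_size)
                    return (false); /* fully allocated, no holes to preserve */
            return (true);
    }

    /* when it returns false, just read/write the byte range directly */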

rick

>
> 2) ZFS does automatic hole-punching on write for filesystems where
> compression is enabled.  It happens in the same code path as
> compression, checksum generation, and redundant-write suppression, and
> thus does not happen until the dirty blocks are about to be committed
> to disk.  So if the file is dirty, ZFS doesn't "know" where
> the then-extant holes are until a sync has completed.
>
> While vn_generic_copy_file_range() has a flag to stop and return
> partial success after a second of copying, this flag does not affect
> sleeps internal to the filesystem, so zfs_holey() can sleep
> indefinitely and vn_generic_copy_file_range() can't do anything about
> it until the sync has already happened.
>
> -GAWollman
>



Re: 13-stable NFS server hang

2024-03-03 Thread Rick Macklem
On Sat, Mar 2, 2024 at 8:28 PM Garrett Wollman  wrote:
>
>
> I wrote previously:
> > PIDTID COMMTDNAME  KSTACK
> > 997 108481 nfsdnfsd: mastermi_switch 
> > sleepq_timedwait _sleep nfsv4_lock nfsrvd_dorpc nfssvc_program 
> > svc_run_internal svc_run nfsrvd_nfsd nfssvc_nfsd sys_nfssvc amd64_syscall 
> > fast_syscall_common
> > 997 960918 nfsdnfsd: service   mi_switch 
> > sleepq_timedwait _sleep nfsv4_lock nfsrv_setclient nfsrvd_exchangeid 
> > nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit 
> > fork_trampoline
> > 997 962232 nfsdnfsd: service   mi_switch _cv_wait 
> > txg_wait_synced_impl txg_wait_synced dmu_offset_next zfs_holey 
> > zfs_freebsd_ioctl vn_generic_copy_file_range vop_stdcopy_file_range 
> > VOP_COPY_FILE_RANGE vn_copy_file_range nfsrvd_copy_file_range nfsrvd_dorpc 
> > nfssvc_program svc_run_internal svc_thread_start fork_exit fork_trampoline
>
> I spent some time this evening looking at this last stack trace, and
> stumbled across the following comment in
> sys/contrib/openzfs/module/zfs/dmu.c:
>
> | /*
> |  * Enable/disable forcing txg sync when dirty checking for holes with 
> lseek().
> |  * By default this is enabled to ensure accurate hole reporting, it can 
> result
> |  * in a significant performance penalty for lseek(SEEK_HOLE) heavy 
> workloads.
> |  * Disabling this option will result in holes never being reported in dirty
> |  * files which is always safe.
> |  */
> | int zfs_dmu_offset_next_sync = 1;
>
> I believe this explains why vn_copy_file_range sometimes takes much
> longer than a second: our servers often have lots of data waiting to
> be written to disk, and if the file being copied was recently modified
> (and so is dirty), this might take several seconds.  I've set
> vfs.zfs.dmu_offset_next_sync=0 on the server that was hurting the most
> and am watching to see if we have more freezes.
>
> If this does the trick, then I can delay deploying a new kernel until
> April, after my upcoming vacation.
Interesting. Please let us know how it goes.

And enjoy your vacation, rick

>
> -GAWollman
>



Re: 13-stable NFS server hang

2024-03-03 Thread Rick Macklem
On Sun, Mar 3, 2024 at 1:17 PM Rick Macklem  wrote:
>
> On Sat, Mar 2, 2024 at 8:28 PM Garrett Wollman  wrote:
> >
> >
> > I wrote previously:
> > > PIDTID COMMTDNAME  KSTACK
> > > 997 108481 nfsdnfsd: mastermi_switch 
> > > sleepq_timedwait _sleep nfsv4_lock nfsrvd_dorpc nfssvc_program 
> > > svc_run_internal svc_run nfsrvd_nfsd nfssvc_nfsd sys_nfssvc amd64_syscall 
> > > fast_syscall_common
> > > 997 960918 nfsdnfsd: service   mi_switch 
> > > sleepq_timedwait _sleep nfsv4_lock nfsrv_setclient nfsrvd_exchangeid 
> > > nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit 
> > > fork_trampoline
> > > 997 962232 nfsdnfsd: service   mi_switch _cv_wait 
> > > txg_wait_synced_impl txg_wait_synced dmu_offset_next zfs_holey 
> > > zfs_freebsd_ioctl vn_generic_copy_file_range vop_stdcopy_file_range 
> > > VOP_COPY_FILE_RANGE vn_copy_file_range nfsrvd_copy_file_range 
> > > nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit 
> > > fork_trampoline
> >
> > I spent some time this evening looking at this last stack trace, and
> > stumbled across the following comment in
> > sys/contrib/openzfs/module/zfs/dmu.c:
> >
> > | /*
> > |  * Enable/disable forcing txg sync when dirty checking for holes with 
> > lseek().
> > |  * By default this is enabled to ensure accurate hole reporting, it can 
> > result
> > |  * in a significant performance penalty for lseek(SEEK_HOLE) heavy 
> > workloads.
> > |  * Disabling this option will result in holes never being reported in 
> > dirty
> > |  * files which is always safe.
> > |  */
> > | int zfs_dmu_offset_next_sync = 1;
> >
> > I believe this explains why vn_copy_file_range sometimes takes much
> > longer than a second: our servers often have lots of data waiting to
> > be written to disk, and if the file being copied was recently modified
> > (and so is dirty), this might take several seconds.  I've set
> > vfs.zfs.dmu_offset_next_sync=0 on the server that was hurting the most
> > and am watching to see if we have more freezes.
> >
> > If this does the trick, then I can delay deploying a new kernel until
> > April, after my upcoming vacation.
> Interesting. Please let us know how it goes.
Btw, I just tried this for my trivial test and it worked very well.
A 1Gbyte file was copied in two Copy RPCs of 1sec and slightly less than
1sec.

So, your vacation may be looking better, rick

>
> And enjoy your vacation, rick
>
> >
> > -GAWollman
> >



Re: 13-stable NFS server hang

2024-03-03 Thread Rick Macklem
On Sun, Mar 3, 2024 at 3:27 PM Rick Macklem  wrote:
>
> On Sun, Mar 3, 2024 at 1:17 PM Rick Macklem  wrote:
> >
> > On Sat, Mar 2, 2024 at 8:28 PM Garrett Wollman  
> > wrote:
> > >
> > >
> > > I wrote previously:
> > > > PIDTID COMMTDNAME  KSTACK
> > > > 997 108481 nfsdnfsd: mastermi_switch 
> > > > sleepq_timedwait _sleep nfsv4_lock nfsrvd_dorpc nfssvc_program 
> > > > svc_run_internal svc_run nfsrvd_nfsd nfssvc_nfsd sys_nfssvc 
> > > > amd64_syscall fast_syscall_common
> > > > 997 960918 nfsdnfsd: service   mi_switch 
> > > > sleepq_timedwait _sleep nfsv4_lock nfsrv_setclient nfsrvd_exchangeid 
> > > > nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit 
> > > > fork_trampoline
> > > > 997 962232 nfsdnfsd: service   mi_switch _cv_wait 
> > > > txg_wait_synced_impl txg_wait_synced dmu_offset_next zfs_holey 
> > > > zfs_freebsd_ioctl vn_generic_copy_file_range vop_stdcopy_file_range 
> > > > VOP_COPY_FILE_RANGE vn_copy_file_range nfsrvd_copy_file_range 
> > > > nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit 
> > > > fork_trampoline
> > >
> > > I spent some time this evening looking at this last stack trace, and
> > > stumbled across the following comment in
> > > sys/contrib/openzfs/module/zfs/dmu.c:
> > >
> > > | /*
> > > |  * Enable/disable forcing txg sync when dirty checking for holes with 
> > > lseek().
> > > |  * By default this is enabled to ensure accurate hole reporting, it can 
> > > result
> > > |  * in a significant performance penalty for lseek(SEEK_HOLE) heavy 
> > > workloads.
> > > |  * Disabling this option will result in holes never being reported in 
> > > dirty
> > > |  * files which is always safe.
> > > |  */
> > > | int zfs_dmu_offset_next_sync = 1;
> > >
> > > I believe this explains why vn_copy_file_range sometimes takes much
> > > longer than a second: our servers often have lots of data waiting to
> > > be written to disk, and if the file being copied was recently modified
> > > (and so is dirty), this might take several seconds.  I've set
> > > vfs.zfs.dmu_offset_next_sync=0 on the server that was hurting the most
> > > and am watching to see if we have more freezes.
> > >
> > > If this does the trick, then I can delay deploying a new kernel until
> > > April, after my upcoming vacation.
> > Interesting. Please let us know how it goes.
> Btw, I just tried this for my trivial test and it worked very well.
> A 1Gbyte file was cpied in two Copy RPCs of 1sec and slightly less than
> 1sec.
Oops, I spoke too soon.
The Copy RPCs worked fine (as above) but the Commit RPCs took
a long time, so it still looks like you may need the patches.

rick

>
> So, your vacation may be looking better, rick
>
> >
> > And enjoy your vacation, rick
> >
> > >
> > > -GAWollman
> > >



Re: 13-stable NFS server hang

2024-03-03 Thread Rick Macklem
On Sun, Mar 3, 2024 at 4:28 PM Rick Macklem  wrote:
>
> On Sun, Mar 3, 2024 at 3:27 PM Rick Macklem  wrote:
> >
> > On Sun, Mar 3, 2024 at 1:17 PM Rick Macklem  wrote:
> > >
> > > On Sat, Mar 2, 2024 at 8:28 PM Garrett Wollman  
> > > wrote:
> > > >
> > > >
> > > > I wrote previously:
> > > > > PIDTID COMMTDNAME  KSTACK
> > > > > 997 108481 nfsdnfsd: mastermi_switch 
> > > > > sleepq_timedwait _sleep nfsv4_lock nfsrvd_dorpc nfssvc_program 
> > > > > svc_run_internal svc_run nfsrvd_nfsd nfssvc_nfsd sys_nfssvc 
> > > > > amd64_syscall fast_syscall_common
> > > > > 997 960918 nfsdnfsd: service   mi_switch 
> > > > > sleepq_timedwait _sleep nfsv4_lock nfsrv_setclient nfsrvd_exchangeid 
> > > > > nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start 
> > > > > fork_exit fork_trampoline
> > > > > 997 962232 nfsdnfsd: service   mi_switch _cv_wait 
> > > > > txg_wait_synced_impl txg_wait_synced dmu_offset_next zfs_holey 
> > > > > zfs_freebsd_ioctl vn_generic_copy_file_range vop_stdcopy_file_range 
> > > > > VOP_COPY_FILE_RANGE vn_copy_file_range nfsrvd_copy_file_range 
> > > > > nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start 
> > > > > fork_exit fork_trampoline
> > > >
> > > > I spent some time this evening looking at this last stack trace, and
> > > > stumbled across the following comment in
> > > > sys/contrib/openzfs/module/zfs/dmu.c:
> > > >
> > > > | /*
> > > > |  * Enable/disable forcing txg sync when dirty checking for holes with 
> > > > lseek().
> > > > |  * By default this is enabled to ensure accurate hole reporting, it 
> > > > can result
> > > > |  * in a significant performance penalty for lseek(SEEK_HOLE) heavy 
> > > > workloads.
> > > > |  * Disabling this option will result in holes never being reported in 
> > > > dirty
> > > > |  * files which is always safe.
> > > > |  */
> > > > | int zfs_dmu_offset_next_sync = 1;
> > > >
> > > > I believe this explains why vn_copy_file_range sometimes takes much
> > > > longer than a second: our servers often have lots of data waiting to
> > > > be written to disk, and if the file being copied was recently modified
> > > > (and so is dirty), this might take several seconds.  I've set
> > > > vfs.zfs.dmu_offset_next_sync=0 on the server that was hurting the most
> > > > and am watching to see if we have more freezes.
> > > >
> > > > If this does the trick, then I can delay deploying a new kernel until
> > > > April, after my upcoming vacation.
> > > Interesting. Please let us know how it goes.
> > Btw, I just tried this for my trivial test and it worked very well.
> > A 1Gbyte file was cpied in two Copy RPCs of 1sec and slightly less than
> > 1sec.
> Oops, I spoke too soon.
> The Copy RPCs worked fine (as above) but the Commit RPCs took
> a long time, so it still looks like you may need the patches.
And I should mention that my test is done on a laptop without a ZIL,
so maybe a ZIL on a separate device might generate different results.

rick
>
> rick
>
> >
> > So, your vacation may be looking better, rick
> >
> > >
> > > And enjoy your vacation, rick
> > >
> > > >
> > > > -GAWollman
> > > >



Re: 13-stable NFS server hang

2024-03-05 Thread Rick Macklem
On Tue, Mar 5, 2024 at 2:13 AM Ronald Klop  wrote:
>
>
> Van: Rick Macklem 
> Datum: vrijdag, 1 maart 2024 15:23
> Aan: Ronald Klop 
> CC: Garrett Wollman , stable@freebsd.org, 
> rmack...@freebsd.org
> Onderwerp: Re: 13-stable NFS server hang
>
> On Fri, Mar 1, 2024 at 12:00AM Ronald Klop  wrote:
> >
> > Interesting read.
> >
> >  Would it be possible to separate locking for admin actions like a client 
> > mounting an fs from traffic flowing for file operations?
> Well, the NFS server does not really have any concept of a mount.
> What I am referring to is the ClientID maintained for NFSv4 mounts,
> which all the open/lock/session/layout state hangs off of.
>
> For most cases, this state information can safely be accessed/modified
> via a mutex, but there are three exceptions:
> - creating a new ClientID (which is done by the ExchangeID operation)
>   and typically happens when a NFS client does a mount.
> - delegation Recall (which only happens when delegations are enabled)
>   One of the reasons delegations are not enabled by default on the
> FreeBSD server.
> - the DestroyClientID which is typically done by a NFS client during dismount.
> For these cases, it is just too difficult to do them without sleeping.
> As such, there is a sleep lock which the nfsd threads normally acquire shared
> when doing NFSv4 operations, but for the above cases the lock is acquired
> exclusive.
> - I had to give the exclusive lock priority over shared lock
> acquisition (it is a
>   custom locking mechanism with assorted weirdnesses) because without
>   that someone reported that new mounts took up to 1/2hr to occur.
>   (The exclusive locker waited for 30min before all the other nfsd threads
>were not busy.)
>   Because of this priority, once a nfsd thread requests the exclusive lock,
>   all other nfsd threads executing NFSv4 RPCs block after releasing their
>   shared lock, until the exclusive locker releases the exclusive lock.
>
> In summary, NFSv4 has certain advantages over NFSv3, but it comes
> with a lot of state complexity. It just is not feasible to manipulate all that
> state with only mutex locking.
>
> rick
>
> >
> > Like ongoing file operations could have a read only view/copy of the mount 
> > table. Only new operations will have to wait.
> > But the mount never needs to wait for ongoing operations before locking the 
> > structure.
> >
> > Just a thought in the morning
> >
> > Regards,
> > Ronald.
> >
> > Van: Rick Macklem 
> > Datum: 1 maart 2024 00:31
> > Aan: Garrett Wollman 
> > CC: stable@freebsd.org, rmack...@freebsd.org
> > Onderwerp: Re: 13-stable NFS server hang
> >
> > On Wed, Feb 28, 2024 at 4:04PM Rick Macklem wrote:
> > >
> > > On Tue, Feb 27, 2024 at 9:30PM Garrett Wollman wrote:
> > > >
> > > > Hi, all,
> > > >
> > > > We've had some complaints of NFS hanging at unpredictable intervals.
> > > > Our NFS servers are running a 13-stable from last December, and
> > > > tonight I sat in front of the monitor watching `nfsstat -dW`.  I was
> > > > able to clearly see that there were periods when NFS activity would
> > > > drop *instantly* from 30,000 ops/s to flat zero, which would last
> > > > for about 25 seconds before resuming exactly as it was before.
> > > >
> > > > I wrote a little awk script to watch for this happening and run
> > > > `procstat -k` on the nfsd process, and I saw that all but two of the
> > > > service threads were idle.  The three nfsd threads that had non-idle
> > > > kstacks were:
> > > >
> > > >   PIDTID COMMTDNAME  KSTACK
> > > >   997 108481 nfsdnfsd: mastermi_switch 
> > > > sleepq_timedwait _sleep nfsv4_lock nfsrvd_dorpc nfssvc_program 
> > > > svc_run_internal svc_run nfsrvd_nfsd nfssvc_nfsd sys_nfssvc 
> > > > amd64_syscall fast_syscall_common
> > > >   997 960918 nfsdnfsd: service   mi_switch 
> > > > sleepq_timedwait _sleep nfsv4_lock nfsrv_setclient nfsrvd_exchangeid 
> > > > nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit 
> > > > fork_trampoline
> > > >   997 962232 nfsdnfsd: service   mi_switch _cv_wait 
> > > > txg_wait_synced_impl txg_wait_synced dmu_offset_next zfs_holey 
> > > > zfs_freebsd_ioctl vn_generic_copy_file_range vop_stdcopy_file_range 
> > > > VOP_COPY_FILE_RANGE vn_copy_file_range nfsrvd_copy_file_range 
>

Re: 13-stable NFS server hang

2024-03-05 Thread Rick Macklem
On Tue, Mar 5, 2024 at 6:34 AM Rick Macklem  wrote:
>
> On Tue, Mar 5, 2024 at 2:13 AM Ronald Klop  wrote:
> >
> >
> > Van: Rick Macklem 
> > Datum: vrijdag, 1 maart 2024 15:23
> > Aan: Ronald Klop 
> > CC: Garrett Wollman , stable@freebsd.org, 
> > rmack...@freebsd.org
> > Onderwerp: Re: 13-stable NFS server hang
> >
> > On Fri, Mar 1, 2024 at 12:00AM Ronald Klop  wrote:
> > >
> > > Interesting read.
> > >
> > >  Would it be possible to separate locking for admin actions like a client 
> > > mounting an fs from traffic flowing for file operations?
> > Well, the NFS server does not really have any concept of a mount.
> > What I am referring to is the ClientID maintained for NFSv4 mounts,
> > which all the open/lock/session/layout state hangs off of.
> >
> > For most cases, this state information can safely be accessed/modified
> > via a mutex, but there are three exceptions:
> > - creating a new ClientID (which is done by the ExchangeID operation)
> >   and typically happens when a NFS client does a mount.
> > - delegation Recall (which only happens when delegations are enabled)
> >   One of the reasons delegations are not enabled by default on the
> > FreeBSD server.
> > - the DestroyClientID which is typically done by a NFS client during 
> > dismount.
> > For these cases, it is just too difficult to do them without sleeping.
> > As such, there is a sleep lock which the nfsd threads normally acquire 
> > shared
> > when doing NFSv4 operations, but for the above cases the lock is aquired
> > exclusive.
> > - I had to give the exclusive lock priority over shared lock
> > acquisition (it is a
> >   custom locking mechanism with assorted weirdnesses) because without
> >   that someone reported that new mounts took up to 1/2hr to occur.
> >   (The exclusive locker waited for 30min before all the other nfsd threads
> >were not busy.)
> >   Because of this priority, once a nfsd thread requests the exclusive lock,
> >   all other nfsd threads executing NFSv4 RPCs block after releasing their
> >   shared lock, until the exclusive locker releases the exclusive lock.
> >
> > In summary, NFSv4 has certain advantages over NFSv3, but it comes
> > with a lot of state complexity. It just is not feasible to manipulate all 
> > that
> > state with only mutex locking.
> >
> > rick
> >
> > >
> > > Like ongoing file operations could have a read only view/copy of the 
> > > mount table. Only new operations will have to wait.
> > > But the mount never needs to wait for ongoing operations before locking 
> > > the structure.
> > >
> > > Just a thought in the morning
> > >
> > > Regards,
> > > Ronald.
> > >
> > > Van: Rick Macklem 
> > > Datum: 1 maart 2024 00:31
> > > Aan: Garrett Wollman 
> > > CC: stable@freebsd.org, rmack...@freebsd.org
> > > Onderwerp: Re: 13-stable NFS server hang
> > >
> > > On Wed, Feb 28, 2024 at 4:04PM Rick Macklem wrote:
> > > >
> > > > On Tue, Feb 27, 2024 at 9:30PM Garrett Wollman wrote:
> > > > >
> > > > > Hi, all,
> > > > >
> > > > > We've had some complaints of NFS hanging at unpredictable intervals.
> > > > > Our NFS servers are running a 13-stable from last December, and
> > > > > tonight I sat in front of the monitor watching `nfsstat -dW`.  I was
> > > > > able to clearly see that there were periods when NFS activity would
> > > > > drop *instantly* from 30,000 ops/s to flat zero, which would last
> > > > > for about 25 seconds before resuming exactly as it was before.
> > > > >
> > > > > I wrote a little awk script to watch for this happening and run
> > > > > `procstat -k` on the nfsd process, and I saw that all but two of the
> > > > > service threads were idle.  The three nfsd threads that had non-idle
> > > > > kstacks were:
> > > > >
> > > > >   PIDTID COMMTDNAME  KSTACK
> > > > >   997 108481 nfsdnfsd: mastermi_switch 
> > > > > sleepq_timedwait _sleep nfsv4_lock nfsrvd_dorpc nfssvc_program 
> > > > > svc_run_internal svc_run nfsrvd_nfsd nfssvc_nfsd sys_nfssvc 
> > > > > amd64_syscall fast_syscall_common
> > > > >   997 960918 nfsdnfsd: service   mi_switch 
> > > > > sle

Re: 13-stable NFS server hang

2024-03-08 Thread Rick Macklem
On Wed, Mar 6, 2024 at 3:46 AM Ronald Klop  wrote:
>
>
> Van: Rick Macklem 
> Datum: dinsdag, 5 maart 2024 15:43
> Aan: Ronald Klop 
> CC: rmack...@freebsd.org, Garrett Wollman , 
> stable@freebsd.org
> Onderwerp: Re: 13-stable NFS server hang
>
> On Tue, Mar 5, 2024 at 6:34AM Rick Macklem  wrote:
> >
> > On Tue, Mar 5, 2024 at 2:13AM Ronald Klop  wrote:
> > >
> > >
> > > Van: Rick Macklem 
> > > Datum: vrijdag, 1 maart 2024 15:23
> > > Aan: Ronald Klop 
> > > CC: Garrett Wollman , stable@freebsd.org, 
> > > rmack...@freebsd.org
> > > Onderwerp: Re: 13-stable NFS server hang
> > >
> > > On Fri, Mar 1, 2024 at 12:00AM Ronald Klop  wrote:
> > > >
> > > > Interesting read.
> > > >
> > > >  Would it be possible to separate locking for admin actions like a 
> > > > client mounting an fs from traffic flowing for file operations?
> > > Well, the NFS server does not really have any concept of a mount.
> > > What I am referring to is the ClientID maintained for NFSv4 mounts,
> > > which all the open/lock/session/layout state hangs off of.
> > >
> > > For most cases, this state information can safely be accessed/modified
> > > via a mutex, but there are three exceptions:
> > > - creating a new ClientID (which is done by the ExchangeID operation)
> > >   and typically happens when a NFS client does a mount.
> > > - delegation Recall (which only happens when delegations are enabled)
> > >   One of the reasons delegations are not enabled by default on the
> > > FreeBSD server.
> > > - the DestroyClientID which is typically done by a NFS client during 
> > > dismount.
> > > For these cases, it is just too difficult to do them without sleeping.
> > > As such, there is a sleep lock which the nfsd threads normally acquire 
> > > shared
> > > when doing NFSv4 operations, but for the above cases the lock is aquired
> > > exclusive.
> > > - I had to give the exclusive lock priority over shared lock
> > > acquisition (it is a
> > >   custom locking mechanism with assorted weirdnesses) because without
> > >   that someone reported that new mounts took up to 1/2hr to occur.
> > >   (The exclusive locker waited for 30min before all the other nfsd threads
> > >were not busy.)
> > >   Because of this priority, once a nfsd thread requests the exclusive 
> > > lock,
> > >   all other nfsd threads executing NFSv4 RPCs block after releasing their
> > >   shared lock, until the exclusive locker releases the exclusive lock.
> > >
> > > In summary, NFSv4 has certain advantages over NFSv3, but it comes
> > > with a lot of state complexity. It just is not feasible to manipulate all 
> > > that
> > > state with only mutex locking.
> > >
> > > rick
> > >
> > > >
> > > > Like ongoing file operations could have a read only view/copy of the 
> > > > mount table. Only new operations will have to wait.
> > > > But the mount never needs to wait for ongoing operations before locking 
> > > > the structure.
> > > >
> > > > Just a thought in the morning
> > > >
> > > > Regards,
> > > > Ronald.
> > > >
> > > > Van: Rick Macklem 
> > > > Datum: 1 maart 2024 00:31
> > > > Aan: Garrett Wollman 
> > > > CC: stable@freebsd.org, rmack...@freebsd.org
> > > > Onderwerp: Re: 13-stable NFS server hang
> > > >
> > > > On Wed, Feb 28, 2024 at 4:04PM Rick Macklem wrote:
> > > > >
> > > > > On Tue, Feb 27, 2024 at 9:30PM Garrett Wollman wrote:
> > > > > >
> > > > > > Hi, all,
> > > > > >
> > > > > > We've had some complaints of NFS hanging at unpredictable intervals.
> > > > > > Our NFS servers are running a 13-stable from last December, and
> > > > > > tonight I sat in front of the monitor watching `nfsstat -dW`.  I was
> > > > > > able to clearly see that there were periods when NFS activity would
> > > > > > drop *instantly* from 30,000 ops/s to flat zero, which would last
> > > > > > for about 25 seconds before resuming exactly as it was before.
> > > > > >
> > > > > > I wrote a little awk script to watch for this happening and run
> > > > > > `procstat -k` on the nfsd process, a

Re: 13-stable NFS server hang

2024-03-08 Thread Rick Macklem
On Thu, Mar 7, 2024 at 7:59 PM Garrett Wollman  wrote:
>
> <
> > I believe this explains why vn_copy_file_range sometimes takes much
> > longer than a second: our servers often have lots of data waiting to
> > be written to disk, and if the file being copied was recently modified
> > (and so is dirty), this might take several seconds.  I've set
> > vfs.zfs.dmu_offset_next_sync=0 on the server that was hurting the most
> > and am watching to see if we have more freezes.
>
> > If this does the trick, then I can delay deploying a new kernel until
> > April, after my upcoming vacation.
>
> Since zeroing dmu_offset_next_sync, I've seen about 8000 copy
> operations on the problematic server and no NFS work stoppages due to
> the copy.  I have observed a few others in a similar posture, where
> one client wants to ExchangeID and is waiting for other requests to
> drain, but nothing long enough to cause a service problem.[1]
>
> I think in general this choice to prefer "accurate" but very slow hole
> detection is a poor choice on the part of the OpenZFS developers, but
> so long as we can disable it, I don't think we need to change anything
> in the NFS server itself.
So the question is...
How can this be documented?
In the BUGS section of "man nfsd" maybe.
What do others think?
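
For anyone hitting this in the meantime, the workaround described above
amounts to something like the following on the ZFS-backed server (a sketch;
/etc/sysctl.conf is the standard place to persist it):

  sysctl vfs.zfs.dmu_offset_next_sync=0
  echo 'vfs.zfs.dmu_offset_next_sync=0' >> /etc/sysctl.conf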

>  It would be a good idea longer term to
> figure out a lock-free or synchronization-free way of handling these
> client session accept/teardown operations, because it is still a
> performance degradation, just not disruptive enough for users to
> notice.
Yes, as I've noted, it is on my todo list to take a look at it.

Good sleuthing, rick

>
> -GAWollman
>
> [1] Saw one with a slow nfsrv_readdirplus and another with a bunch of
> threads blocked on an upcall to nfsuserd.



Warning: do not enable NFSv4 delegations for a 13.3 NFS server

2024-05-07 Thread Rick Macklem
Hi,

I boned up and, when wireshark reported that an NFSv4
packet was incorrectly constructed, I changed the NFSv4
code to "fix" the problem.

I found out recently (at a IETF NFSv4 testing event) that
wireshark was buggy and the code was actually broken
by the "fix".

I have now corrected this in main, stable/14 and stable/13.
(Commit 54c3aa02e926 in main.)
However, FreeBSD 13.3 shipped with this broken.

The bug only affects the NFSv4 server if delegations are
enabled, which is not the default.
As such, so long as "vfs.nfsd.issue_delegations == 0",
you should be fine.

rick
ps: Since few enable delegations, I do not feel this
  needs an errata for FreeBSD 13.3.



Re: new error messages after upgrade 14.0-p5 to 14.1-p1 amd64

2024-06-27 Thread Rick Macklem
On Thu, Jun 27, 2024 at 7:52 AM void  wrote:
>
> After upgrading, I noticed messages like these appearing during
> rebooting, immediately after kernel: NFS access cache time=60
>
> kernel: rpc.umntall: 10.1.1.102: MOUNTPROG: RPC: Port mapper failure - RPC: 
> Timed out
>
> sometimes it appears twice, once it appeared five times. The system
> eventually completes booting and there are no more messages like this.
It indicates that mount_nfs is having trouble talking to rpcbind, but I do not
know why that would happen?

Do you have either of these lines in your /etc/rc.conf?
(Having either one of them should be sufficient to ensure rpcbind
is started before the mount is attempted.)

rpcbind_enable="YES"
nfs_client_enable="YES"

If you have at least one of these in your /etc/rc.conf, then all I can think
of is some sort of network/routing issue that interferes with rpcbind working.
You can also look at the output of:
# rpcinfo
once it is booted, to see that rpcbind seems correctly configured.
It should be attached to tcp, tcp6, udp, udp6 and /var/run/rpcbind.sock.
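
For example, a quick set of checks (a sketch; 10.1.1.102 is the server
from your fstab line):

  grep -E 'rpcbind_enable|nfs_client_enable' /etc/rc.conf
  rpcinfo | grep rpcbind        # should show tcp, udp, tcp6, udp6 and
                                # /var/run/rpcbind.sock entries
  rpcinfo -p 10.1.1.102         # can this host reach the server's rpcbind?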

rick

>
> /etc/fstab has the following
>
> 10.1.1.102:/usr/src /usr/src nfs rw,readahead=16,late 0 0
>
> Logging in, the nfs share is mounted normally.
> --
>



Re: new error messages after upgrade 14.0-p5 to 14.1-p1 amd64

2024-06-30 Thread Rick Macklem
On Fri, Jun 28, 2024 at 5:23 AM void  wrote:
>
> Hi Rick,
>
> On Thu, Jun 27, 2024 at 09:29:51AM -0700, Rick Macklem wrote:
>
> >rpcbind_enable="YES"
> >nfs_client_enable="YES"
> >
> >If you have at least one of these in your /etc/rc.conf, then all I can think
> >of is some sort of network/routing issue that interferes with rpcbind 
> >working.
>
> I have nfs_client_enable=YES only. Should I have both?
Should not be necessary. I played around a bit with a 14.1 vm and was not able
to reproduce your problem with/without rpcbind running on the system.

>
> >You can also look at the output of:
> ># rpcinfo
> >once it is booted, to see that rpcbind seems correctly configured.
> >It should be attached to tcp, tcp6, udp, udp6 and /var/run/rpcbind.sock.
>
> % doas rpcinfo
> rpcinfo: can't contact rpcbind: RPC: Port mapper failure - RPC: Success
If you are not running rpcbind, this is normal and should not affect
rpc.umntall.
(It works for me without rpcbind running.)

>
> It's odd that beforehand, there was no error notification.
>
> The bootup of the client will stall with the aforementioned error,
> i presume it's carrying on once the nfs server is contacted. So far, I've not
> had to ctl-C or ctl-D to get past it. Once booted up, the nfs shares are
> available and writable.
It doesn't affect the actual mount. An NFSv3 mount is "stateless", which means
the NFS server doesn't even know it exists and does not need to know.

Sun came up with junk that tries to indicate what is NFS mounted using the
Mount protocol (think mountd on the server). The rpc.umntall command is
executed by /etc/rc.d/nfsclient and all it does is tell the server to cleanup
this mount junk (only used by things like "showmount" and not NFS itself).
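
(If you are curious, you can see that record-keeping with something like
the following, where 10.1.1.102 is the server from your fstab:)

  showmount -a 10.1.1.102   # the mount records rpc.umntall asks the server to clean up
  showmount -e 10.1.1.102   # the export list itself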

Bottom line: it is just a little irritating and might slow down booting a bit.

Why is it happening? Don't really know, but it indicates that the client
cannot talk to the NFS server at the point in booting when /etc/rc.d/nfsclient
is executed.
You can:
# cd /etc; rcorder rc.conf rc.d/*
and the output shows you what order the scripts in /etc/rc.d get executed.
Since "nfsclient" is after NETWORKING, it should work?

>
> The nfs server is 14-stable and the exported dir is zfs with the sharenfs 
> property set.
> The client (14.1-p1) is a VM on a different machine but the same network.
>
> I'll try putting rpcbind_enable=YES in /etc/rc.conf
Probably won't make any difference.

rick

> --
>



Re: new error messages after upgrade 14.0-p5 to 14.1-p1 amd64

2024-07-02 Thread Rick Macklem
On Sat, Jun 29, 2024 at 3:18 AM void  wrote:
>
> On Fri, Jun 28, 2024 at 01:23:29PM +0100, void wrote:
>
> (snip)
>
> Just to add to this, the reported error does *not* occur on
> 15.0-CURRENT #5 n270917-5dbf886104b4 arm64
>
> and that system has *only* nfs_client_enable="YES"
>
> and it's a slower system
Could this possibly be the cause?

 https://lists.freebsd.org/archives/freebsd-current/2024-June/006075.html

rick

> --
>



Re: new error messages after upgrade 14.0-p5 to 14.1-p1 amd64

2024-07-11 Thread Rick Macklem
On Thu, Jul 11, 2024 at 4:47 AM void  wrote:
>
> Hi Rick,
>
>
> >>
> >> The nfs server is 14-stable and the exported dir is zfs with the sharenfs 
> >> property set.
> >> The client (14.1-p1) is a VM on a different machine but the same network.
> >>
> >> I'll try putting rpcbind_enable=YES in /etc/rc.conf
> >Probably won't make any difference.
> >
>
> Sorry for hte slow response - this vm is rarely rebooted apart from
> updates. And so it was rebooted to apply 14.1-p1 => 14.1-p2 via 
> freebsd-update:
>
> Updating /var/run/os-release done.
> Clearing /tmp.
> Starting syslogd.
> No core dumps found.
> Starting rpcbind.
> NFS access cache time=60
> rpc.umntall: 10.0.1.102: MOUNTPROG: RPC: Port mapper failure - RPC: Timed out
> rpc.umntall: 10.0.1.102: MOUNTPROG: RPC: Port mapper failure - RPC: Timed out
> Mounting late filesystems:.
> Security policy loaded: MAC/ntpd (mac_ntpd)
> Starting ntpd.
>
> It's really odd.
Did you look at the bug I pointed at in my previous email?
https://lists.freebsd.org/archives/freebsd-current/2024-June/006075.html

It basically says that 14.1 is buggy if you do not specify a netmask or width
for ifconfig for IP4 addresses. The bug is fixed in stable/14, but
exists in 14.1.
So, make sure that there is a netmask or width on the IP4 address for all your
ifconfig lines (probably in /etc/rc.conf).
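
For example, in /etc/rc.conf (the interface name and address below are
examples only):

  ifconfig_em0="inet 10.0.1.50 netmask 255.255.255.0"
  # or, equivalently, the width form:
  ifconfig_em0="inet 10.0.1.50/24"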

rick

> --
>



Re: Possible bug in zfs send or pipe implementation?

2024-07-13 Thread Rick Macklem
On Sat, Jul 13, 2024 at 7:02 PM Garrett Wollman  wrote:
>
> I'm migrating an old file server to new hardware using syncoid.  Every
> so often, the `zfs send` process gets stuck with the following
> kstacks:
>
>  7960 108449 zfs -   mi_switch 
> sleepq_catch_signals sleepq_wait_sig _sleep pipe_write zfs_file_write_impl 
> zfs_file_write dump_record dmu_dump_write do_dump dmu_send_impl dmu_send_obj 
> zfs_ioc_send zfsdev_ioctl_common zfsdev_ioctl devfs_ioctl vn_ioctl 
> devfs_ioctl_f
>  7960 126072 zfs send_traverse_threa mi_switch 
> sleepq_catch_signals sleepq_wait_sig _cv_wait_sig bqueue_enqueue_impl send_cb 
> traverse_visitbp traverse_visitbp traverse_visitbp traverse_dnode 
> traverse_visitbp traverse_visitbp traverse_visitbp traverse_visitbp 
> traverse_visitbp traverse_visitbp traverse_dnode traverse_visitbp
>  7960 126074 zfs send_merge_thread   mi_switch 
> sleepq_catch_signals sleepq_wait_sig _cv_wait_sig bqueue_enqueue_impl 
> send_merge_thread fork_exit fork_trampoline
>  7960 126075 zfs send_reader_thread  mi_switch 
> sleepq_catch_signals sleepq_wait_sig _cv_wait_sig bqueue_enqueue_impl 
> send_reader_thread fork_exit fork_trampoline
>
> Near as I can tell, the thread first thread is trying to write
> serialized data data to the output pipe and is blocked.  The other
> threads are stuck because the write process isn't making progress.
# ps axHl
should show you what wchan's the processes are waiting on and that might
give you a clue w.r.t. what is happening?

If it is easy to build a kernel from sources and boot that, you could try defining
PIPE_NODIRECT in sys/kern/sys_pipe.c and see if that avoids the hangs?

rick

>
> The process reading from the pipe (which is just a progress meter) is
> sitting in select() waiting for the pipe to become ready, so either
> zfs_file_write() is doing something wrong, or the pipe implementation
> has lost a selwakeup() somewhere.  (Or, possibly but unlikely, the
> progress meter has lost the read end of the pipe from its read
> fd_set.)  Unfortunately, neither fstat nor procstat print any useful
> information about the state of the pipe, so I can only try to deduce
> what's going on from the observable behavior.
>
> -GAWollman
>



Re: Possible bug in zfs send or pipe implementation?

2024-07-13 Thread Rick Macklem
On Sat, Jul 13, 2024 at 8:19 PM Garrett Wollman  wrote:
>
> < 
> said:
>
> > # ps axHl
> > should show you what wchan's the processes are waiting on and that might
> > give you a clue w.r.t. what is happening?
>
> zfs is waiting to write into the pipe and pv (the progress meter) is
> waiting in select.
Just to clarify it, are you saying zfs is sleeping on "pipewr"?
(There is also a msleep() for "pipbww" in pipe_write().)

rick

>
> > If is easy to build a kernel from sources and boot that, you could try 
> > defining
> > PIPE_NODIRECT in sys/kern/sys_pipe.c and see if that avoids the hangs?
>
> It's easy to build a kernel from sources, but not easy to reboot the
> server -- it's being retired shortly, and because of time constraints
> I need to get it drained before the next scheduled outage.
>
> -GAWollman
>



Re: Possible bug in zfs send or pipe implementation?

2024-07-13 Thread Rick Macklem
On Sat, Jul 13, 2024 at 8:50 PM Rick Macklem  wrote:
>
> On Sat, Jul 13, 2024 at 8:19 PM Garrett Wollman  
> wrote:
> >
> > < 
> > said:
> >
> > > # ps axHl
> > > should show you what wchan's the processes are waiting on and that might
> > > give you a clue w.r.t. what is happening?
> >
> > zfs is waiting to write into the pipe and pv (the progress meter) is
> > waiting in select.
> Just to clarify it, are you saying zfs is sleeping on "pipewr"?
If I am reading the code correctly, if it is sleeping on "pipewr", it is
out of space, and that is controlled via:
kern.ipc.maxpipekva
and you can see what it is using by looking at
kern.ipc.pipekva
(Unfortunately, I don't think you can change kern.ipc.maxpipekva on the fly.
It looks like it is a loader tunable, so you'd need to reboot to make
it larger.)

Anyhow, you can take a look at the sysctls. They might help?
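
For example (a sketch):

  sysctl kern.ipc.pipekva kern.ipc.maxpipekva
  # maxpipekva appears to be a loader tunable, so raising it would mean
  # adding a line like the following (value is only an example) to
  # /boot/loader.conf and rebooting:
  # kern.ipc.maxpipekva=4294967296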

There is quite a detailed comment in sys/kern/sys_pipe.c related to this.

rick

> (There is also a msleep() for "pipbww" in pipe_write().)
>
> rick
>
> >
> > > If is easy to build a kernel from sources and boot that, you could try 
> > > defining
> > > PIPE_NODIRECT in sys/kern/sys_pipe.c and see if that avoids the hangs?
> >
> > It's easy to build a kernel from sources, but not easy to reboot the
> > server -- it's being retired shortly, and because of time constraints
> > I need to get it drained before the next scheduled outage.
> >
> > -GAWollman
> >



Re: Possible bug in zfs send or pipe implementation?

2024-07-14 Thread Rick Macklem
On Sun, Jul 14, 2024 at 10:32 AM Garrett Wollman 
wrote:

> <
> said:
>
> > Just to clarify it, are you saying zfs is sleeping on "pipewr"?
> > (There is also a msleep() for "pipbww" in pipe_write().)
>
> It is sleeping on pipewr, yes.
>
> [wollman@nfs-prod-11 ~]$ sysctl kern.ipc.pipekva
> kern.ipc.pipekva: 536576
> [wollman@nfs-prod-11 ~]$ sysctl kern.ipc.maxpipekva
> kern.ipc.maxpipekva: 2144993280
>
> It's not out of KVA, it's just waiting for the `pv` process to wake up
> and read more data.  `pv` is single-threaded and blocked on "select".
>
> It doesn't always get stuck in the same place, which is why I'm
> suspecting a lost wakeup somewhere.
>
This snippet from sys/kern/sys_pipe.c looks a little suspicious to me...
        /*
         * Direct copy, bypassing a kernel buffer.
         */
        } else if ((size = rpipe->pipe_pages.cnt) != 0) {
                if (size > uio->uio_resid)
                        size = (u_int) uio->uio_resid;
                PIPE_UNLOCK(rpipe);
                error = uiomove_fromphys(rpipe->pipe_pages.ms,
                    rpipe->pipe_pages.pos, size, uio);
                PIPE_LOCK(rpipe);
                if (error)
                        break;
                nread += size;
                rpipe->pipe_pages.pos += size;
                rpipe->pipe_pages.cnt -= size;
                if (rpipe->pipe_pages.cnt == 0) {
                        rpipe->pipe_state &= ~PIPE_WANTW;
                        wakeup(rpipe);
                }
If it reads only uio_resid bytes, which is less than pipe_pages.cnt, no
wakeup() occurs.
I'd be tempted to try getting rid of the "if (rpipe->pipe_pages.cnt == 0)"
and do the wakeup() unconditionally, to see if it helps?

Because if the application ("pv" in this case) doesn't do another read() on
the
pipe before calling select(), no wakeup() is going to occur, because here's
what pipe_write() does...
                /*
                 * We have no more space and have something to offer,
                 * wake up select/poll.
                 */
                pipeselwakeup(wpipe);

                wpipe->pipe_state |= PIPE_WANTW;
                pipeunlock(wpipe);
                error = msleep(wpipe, PIPE_MTX(rpipe),
                    PRIBIO | PCATCH, "pipewr", 0);
                pipelock(wpipe, 0);
                if (error != 0)
                        break;
                continue;
Note that, once in msleep(), no call to pipeselwakeup() will occur until
it gets woken up.

I think the current code assumes that the reader ("pv" in this case) will
read all the data out of the pipe before calling select() again.
Does it do that?

rick
ps: I've added markj@ as a cc, since he seems to have been the last guy
involved
in sys_pipe.c.



> -GAWollman
>
>


NFS server credentials with cr_ngroups == 0

2024-10-14 Thread Rick Macklem
olce@ reported an issue where the credentials used for mapped
user exports (for the NFS server) could have cr_ngroups == 0.
At first I thought this was a mountd bug, but he pointed out the
exports(5) manpage, which says:

Note that user: should be used to distinguish a credential containing
no groups from a complete credential for that user.
The group names may be quoted, or use backslash escaping.

As such, this is not just an allowed case, but a documented one.
(This snippet from exports(5) goes all the way back to May 1994
when the man page was imported from 4.4BSD Lite.)

Note that these credentials are not POSIX syscall ones.
They are used specifically by the NFS server for file access.
The good news is that the current main sources appear to
always funnel down into groupmember() to check this.

The not so good news is that commit 7f92e57 (Jun 20, 2009)
broke groupmember() for the case where cr_ngroups == 0,
assuming there would always be at least one group (cr_groups[0]
or cr_gid, if you prefer).

So, what should we do about this?

#1 A simple patch can be applied to groupmember() and a couple
 of places in the NFS server code, so that cr_ngroups == 0 again
 works correctly for this case.
#2 Decide that cr_ngroups == 0 should no longer be supported and
 patch accordingly.
OR ???

Personally, I am thinking that #1 should be done right away and
MFC'd to stable/14 and stable/13 so that the currently documented
behaviour is supported for FreeBSD 13 and FreeBSD14.
(To do otherwise would seem to be a POLA violation to me.)

Then, the FreeBSD community needs to decide if #2 should be done
(or document that the cr_ngroups == 0 case needs to work correctly).

Please respond with your opinion w.r.t. how to handle this.

Note that if a file with these permissions:
-rw-r----- 1 root games 409 Dec 30 2023 foo
were exported to a client with the following exports(5) line:
/home -sec=sys -maproot=1001:

Then, "root" on an NFS mounted client tried to read the file,
the attempt should fail (assuming root in not a member of games).
However, if cr_groups[0] just happened to have 13 in it (it is random
junk when cr_ngroups == 0), the read would succeed.
--> This vulnerability can be avoided by never using the trailing ":"
  (empty group list) syntax, as in "-maproot=1001:" above, for -maproot
  or -mapall in /etc/exports.
Should so@ do some sort of announcement w.r.t. this?

Thanks in advance for your comments, rick
ps: Yes, I cross posted, since I wanted both developers and users
  to see this.



Re: NFSd not registering on 14.2.

2024-11-24 Thread Rick Macklem
On Thu, Nov 21, 2024 at 9:16 PM Zaphod Beeblebrox  wrote:
>
> lo0 has 127.0.0.1, ::1 (both first in their lists).  It also has a pile of 
> other IPs that are used by jails.  This has not changed
I just did a trivial setup with the most recent snapshot for 14.2 and
it worked ok.

So, I have no idea what your problem is, but I'd start by looking at
your network setup.

Maybe reposting to freebsd-net@ might help. Make sure you mention that not
registering to rpcbind is the problem in the subject line. (If you
just mention "registering nfsd"
most are liable to ignore the email, from what I've seen.)

Good luck with it, rick

>
> On Thu, Nov 21, 2024 at 6:35 PM Rick Macklem  wrote:
>>
>> On Thu, Nov 21, 2024 at 1:22 PM Zaphod Beeblebrox  wrote:
>> >
>> >
>> > I've tried a lot of different combinations of rc variables.  On 13.3 and 
>> > 14.1, nfsd in most (non-v4-only) configurations registers to rpcbind as 
>> > expected.  This is true of restarting nfsd and using nfsd -r.
>> >
>> > However on 14.2, I can't contrive any configuration that registers to 
>> > rpcbind.  Minimally, on one fairly quiet 14.1 server, I simply have
>> >
>> > nfs_server_enable="YES"
>> > mountd_enable="YES"
>> > mountd_flags="-h  -S"
>> >
>> > on another, I have more:
>> >
>> > mountd_enable="YES"
>> > nfs_client_enable="YES"
>> > nfs_server_enable="YES"
>> > nfsv4_server_enable="NO"
>> > #nfs_server_flags="-u -t -n 12" # Flags to nfsd (if enabled).
>> > nfsuserd_enable="YES"
>> > nfsuserd_flags="-domain daveg.ca"
>> > nfscbd_enable="YES"
>> > rpc_lockd_enable="YES"
>> > rpc_statd_enable="YES"
>> >
>> > readup for what the 14.2 server has --- but I've tried configurations 
>> > going from the former to the latter.  None of them register.
>> >
>> All I can suggest is checking lo0 to make sure it is using 127.0.0.1.
>> See what
>> # ifconfig -a
>> shows.
>>
>> If lo0 is not 127.0.0.1, that would explain it, since the rpcbind stuff uses
>> 127.0.0.1.
>>
>> Note that 127.0.0.1 gets added automatically when "-h" is used.
>>
>> Btw, I don't think I changed anything w.r.t. this between 14.1 and 14.2,
>> so it is likely some other network related change.
>>
>> rick



Re: 14.1 NFS / mountd : -alldirs not working as expected

2024-11-21 Thread Rick Macklem
On Wed, Nov 20, 2024 at 8:01 PM Michael Proto  wrote:
>
> Hello all,
>
> Running into an issue with a 14.1 server that I think is a bug, though
> it may be me not interpreting documentation correctly so I wanted to
> ask here.
-alldirs simply means that any directory within the server file system
can be mounted. So, yes, everything up to the root dir can be mounted.

Normally, the directory for such an exports line would be the root directory
of the file system, but I doubt mountd actually enforces that, since the export
line is for "all directories" in the file system.

>
> Using NFSv3, with FreeBSD 14.1 as the NFS server. Based on what I see
> in exports(5), if I want to export conditional mounts (IE filesystem
> paths that are intermittently mounted locally on server)
No idea what you mean by "intermittently mounted locally"?
(An export will be for whatever file system is mounted for the directory
at the time mountd is started or updates exports when a SIGHUP is
sent to it.)

Exporting a file system that is not always mounted on the server is
a very bad idea imho. It would be much better to add the exports(5)
line after the file system is mounted and remove it before the file
system is unmounted, if you need to export a file system not always
mounted.

rick

> I should use
> -alldirs and specify the mount-point as the export. Per the manpage,
> this export should only be accessible when the exported directory is
> actually the root of a mounted filesystem. Currently if mountd is
> HUPed while the export isn't a filesystem mount I get the warning
> about exporting the filesystem "below" the export (root-FS in this
> case) and I can actually mount the root-FS from the client, instead of
> getting an error as I would expect. Using the specific example for a
> sometimes-mounted /cdrom in exports(5) can demonstrate this behavior.
>
>   /etc/rc.conf :
> nfs_server_enable="YES"
> rpcbind_enable="YES"
> rpc_statd_enable="YES"
> rpc_lockd_enable="YES"
> mountd_enable="YES"
>
>   /etc/exports :
> /cdrom -alldirs,quiet,ro -network=10.0.0.0/24
>
> (at this time /cdrom exists as a directory but is not currently a
> filesystem mount point)
> on the server:
> root@zfstest1:~ # killall -HUP mountd
>
>   /var/log/messages:
> Nov 20 22:34:56 zfstest1 mountd[27724]: Warning: exporting /cdrom
> exports entire / file system
>
> root@zfstest1:~ # showmount -e
> Exports list on localhost:
> /cdrom 10.0.0.0
>
>
> on a client, I can now mount "/" from my server zfstest1:
>
> root@client1:~ # mount -r -t nfs zfstest1:/ /mnt
> root@client1:~ # mount | tail -n1
> zfstest1:/ on /mnt (nfs, read-only)
>
> The root-FS of zfstest1 is indeed visible in /mnt on client1
>
> From what I see in /usr/src/usr.sbin/mountd/mountd.c this isn't
> supposed to happen (I'm no C programmer but this did read something
> like I should receive an export error from mountd when I send a HUP):
> ...
> } else if (!strcmp(cpopt, "alldirs")) {
> opt_flags |= OP_ALLDIRS;
> ...
> if (opt_flags & OP_ALLDIRS) {
> if (errno == EINVAL)
> syslog(LOG_ERR,
> "-alldirs requested but %s is not a filesystem mountpoint",
> dirp);
> else
> syslog(LOG_ERR,
> "could not remount %s: 
> %m",
> dirp);
> ret = 1;
> goto error_exit;
> }
>
> I suspect this code path isn't being hit since I'm getting the mountd
> warning I referenced above instead of this error. This appears to be a
> possible recurrence of a very old bug that depicts similar behavior :
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=170413
> While it appears the "-sec" issue referenced in that bug is fixed in
> the listed PRs I didn't see anything on this -alldirs issue that's
> also mentioned there, maybe that's why I'm running into this now?
>
> I'd be totally unsurprised if my /etc/exports file isn't configured
> correctly, but I reduced my setup to just the example in the exports
> man page and I'm struggling to determine how to interpret that
> information differently. I also tried an export of /cdrom with only
> "-alldirs" as an option and I get the same behavior. Ideas?
>
>
> Thanks,
> Michael Proto
>



Re: 14.1 NFS / mountd : -alldirs not working as expected

2024-11-26 Thread Rick Macklem
On Mon, Nov 25, 2024 at 3:57 PM Rick Macklem  wrote:
>
> On Mon, Nov 25, 2024 at 2:55 PM Rick Macklem  wrote:
> >
> > On Wed, Nov 20, 2024 at 8:01 PM Michael Proto  wrote:
> > >
> > > Hello all,
> > >
> > > Running into an issue with a 14.1 server that I think is a bug, though
> > > it may be me not interpreting documentation correctly so I wanted to
> > > ask here.
> > >
> > > Using NFSv3, with FreeBSD 14.1 as the NFS server. Based on what I see
> > > in exports(5), if I want to export conditional mounts (IE filesystem
> > > paths that are intermittently mounted locally on server) I should use
> > > -alldirs and specify the mount-point as the export. Per the manpage,
> > > this export should only be accessible when the exported directory is
> > > actually the root of a mounted filesystem. Currently if mountd is
> > > HUPed while the export isn't a filesystem mount I get the warning
> > > about exporting the filesystem "below" the export (root-FS in this
> > > case) and I can actually mount the root-FS from the client, instead of
> > > getting an error as I would expect. Using the specific example for a
> > > sometimes-mounted /cdrom in exports(5) can demonstrate this behavior.
> Just fyi, I also plan on coming up with a patch for exports(5) to make the
> correct semantics of -alldirs clearer. It only explains that -alldirs
> is only supposed
> to work on mount points in the examples section. It took me a couple of
> passes through it before I spotted it and realized this is a bug.
I dug into the git repository and, believe it or not, it looks like this was
broken between releng1.0 and releng2.0 (there doesn't seem to be an
exact commit).

Basically, for releng1.0 the path provided by the exports line was passed
into mount(2), which would fail if the path was not a mount point.
This was how "not at a mount point" was detected for -alldirs.

For releng2.0, it passes f_mntonname to mount(2), which is the
mount point. This broke the check for "is a mount point".
To be honest, the while() loop calling nmount(2) is mostly
(if not entirely) useless, because its purpose was to climb the path
to the mount point and that should now never happen.

I do have a patch that detects "not a mount point" using a strcmp()
between f_mntonname and the path in the exports line.
That should be sufficient, since symbolic links should not be in
the path in exports(5).
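
As an aside, the equivalent check from userland looks something like this
(a sh sketch; /cdrom is the example path from this thread, and df's
"Mounted on" column is effectively f_mntonname):

  dir=/cdrom
  mnton=$(df -P "$dir" | awk 'NR == 2 { print $6 }')
  if [ "$mnton" = "$dir" ]; then
      echo "$dir is a file system mount point"
  else
      echo "$dir is not a mount point, so -alldirs should be refused"
  fi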

Michael, once you create a bugzilla bug report (bugs.freebsd.org),
I will attach the patch and work on getting it committed.

rick

>
> rick
>
> > >
> > >   /etc/rc.conf :
> > > nfs_server_enable="YES"
> > > rpcbind_enable="YES"
> > > rpc_statd_enable="YES"
> > > rpc_lockd_enable="YES"
> > > mountd_enable="YES"
> > >
> > >   /etc/exports :
> > > /cdrom -alldirs,quiet,ro -network=10.0.0.0/24
> > >
> > > (at this time /cdrom exists as a directory but is not currently a
> > > filesystem mount point)
> > > on the server:
> > > root@zfstest1:~ # killall -HUP mountd
> > >
> > >   /var/log/messages:
> > > Nov 20 22:34:56 zfstest1 mountd[27724]: Warning: exporting /cdrom
> > > exports entire / file system
> > I took a closer look and this is a bug. It appears that -alldirs is supposed
> > to fail when a non-mountpoint is exported.
> >
> > It appears to have been introduced to the system long ago, although I
> > haven't yet tracked down the commit.
> >
> > mountd.c assumes that nmount(8) will fail when the directory path
> > is not a mount point, however for MNT_UPDATE (which is what is
> > used to export file systems) this is not the case.
> >
> > Please create a bugzilla bug report for this and I will work on a patch.
> >
> > Btw, quiet is also broken in the sense that it will cause any nmount(8)
> > failure to fail. However, since nmount(8) does not fail for this case,
> > it hardly matters. I will come up with a patch for this too, since it is
> > easy to fix.
> >
> > Thanks for reporting this, rick
> >
> > >
> > > root@zfstest1:~ # showmount -e
> > > Exports list on localhost:
> > > /cdrom 10.0.0.0
> > >
> > >
> > > on a client, I can now mount "/" from my server zfstest1:
> > >
> > > root@client1:~ # mount -r -t nfs zfstest1:/ /mnt
> > > root@client1:~ # mount | tail -n1
> > > zfstest1:/ on /mnt (nfs, read-only)
> > >
> >

Re: NFSd not registering on 14.2.

2024-11-25 Thread Rick Macklem
On Sun, Nov 24, 2024 at 2:17 PM Rick Macklem  wrote:
>
> On Thu, Nov 21, 2024 at 9:16 PM Zaphod Beeblebrox  wrote:
> >
> > lo0 has 127.0.0.1, ::1 (both first in their lists).  It also has a pile of 
> > other IPs that are used by jails.  This has not changed
> I just did a trivial setup with the most recent snapshot for 14.2 and
> it worked ok.
A couple more things you might want to check...
See what the netmask is for lo0.
See if these are set and try changing their settings...
net.inet.ip.connect_inaddr_wild
net.inet6.ip6.connect_in6addr_wild

Beyond that, try it with/without
rpcbind_enable="YES"
in /etc/rc.conf.

rick

>
> So, I have no idea what your problem is, but I'd start by looking at
> your network setup.
>
> Maybe reposting to freebsd-net@ might help. Make sure you mention that not
> registering to rpcbind is the problem in the subject line. (If you
> just mention "registering nfsd"
> most are liable to ignore the email, from what I've seen.)
>
> Good luck with it, rick
>
> >
> > On Thu, Nov 21, 2024 at 6:35 PM Rick Macklem  wrote:
> >>
> >> On Thu, Nov 21, 2024 at 1:22 PM Zaphod Beeblebrox  
> >> wrote:
> >> >
> >> >
> >> > I've tried a lot of different combinations of rc variables.  On 13.3 and 
> >> > 14.1, nfsd in most (non-v4-only) configurations registers to rpcbind as 
> >> > expected.  This is true of restarting nfsd and using nfsd -r.
> >> >
> >> > However on 14.2, I can't contrive any configuration that registers to 
> >> > rpcbind.  Minimally, on one fairly quiet 14.1 server, I simply have
> >> >
> >> > nfs_server_enable="YES"
> >> > mountd_enable="YES"
> >> > mountd_flags="-h  -S"
> >> >
> >> > on another, I have more:
> >> >
> >> > mountd_enable="YES"
> >> > nfs_client_enable="YES"
> >> > nfs_server_enable="YES"
> >> > nfsv4_server_enable="NO"
> >> > #nfs_server_flags="-u -t -n 12" # Flags to nfsd (if enabled).
> >> > nfsuserd_enable="YES"
> >> > nfsuserd_flags="-domain daveg.ca"
> >> > nfscbd_enable="YES"
> >> > rpc_lockd_enable="YES"
> >> > rpc_statd_enable="YES"
> >> >
> >> > readup for what the 14.2 server has --- but I've tried configurations 
> >> > going from the former to the latter.  None of them register.
> >> >
> >> All I can suggest is checking lo0 to make sure it is using 127.0.0.1.
> >> See what
> >> # ifconfig -a
> >> shows.
> >>
> >> If lo0 is not 127.0.0.1, that would explain it, since the rpcbind stuff 
> >> uses
> >> 127.0.0.1.
> >>
> >> Note that 127.0.0.1 gets added automatically when "-h" is used.
> >>
> >> Btw, I don't think I changed anything w.r.t. this between 14.1 and 14.2,
> >> so it is likely some other network related change.
> >>
> >> rick



Re: 14.1 NFS / mountd : -alldirs not working as expected

2024-11-25 Thread Rick Macklem
On Mon, Nov 25, 2024 at 2:55 PM Rick Macklem  wrote:
>
> On Wed, Nov 20, 2024 at 8:01 PM Michael Proto  wrote:
> >
> > Hello all,
> >
> > Running into an issue with a 14.1 server that I think is a bug, though
> > it may be me not interpreting documentation correctly so I wanted to
> > ask here.
> >
> > Using NFSv3, with FreeBSD 14.1 as the NFS server. Based on what I see
> > in exports(5), if I want to export conditional mounts (IE filesystem
> > paths that are intermittently mounted locally on server) I should use
> > -alldirs and specify the mount-point as the export. Per the manpage,
> > this export should only be accessible when the exported directory is
> > actually the root of a mounted filesystem. Currently if mountd is
> > HUPed while the export isn't a filesystem mount I get the warning
> > about exporting the filesystem "below" the export (root-FS in this
> > case) and I can actually mount the root-FS from the client, instead of
> > getting an error as I would expect. Using the specific example for a
> > sometimes-mounted /cdrom in exports(5) can demonstrate this behavior.
Just fyi, I also plan on coming up with a patch for exports(5) to make the
correct semantics of -alldirs clearer. Right now, the fact that -alldirs is
only supposed to work on mount points is explained only in the examples
section. It took me a couple of passes through it before I spotted that and
realized this is a bug.

rick

> >
> >   /etc/rc.conf :
> > nfs_server_enable="YES"
> > rpcbind_enable="YES"
> > rpc_statd_enable="YES"
> > rpc_lockd_enable="YES"
> > mountd_enable="YES"
> >
> >   /etc/exports :
> > /cdrom -alldirs,quiet,ro -network=10.0.0.0/24
> >
> > (at this time /cdrom exists as a directory but is not currently a
> > filesystem mount point)
> > on the server:
> > root@zfstest1:~ # killall -HUP mountd
> >
> >   /var/log/messages:
> > Nov 20 22:34:56 zfstest1 mountd[27724]: Warning: exporting /cdrom
> > exports entire / file system
> I took a closer look and this is a bug. It appears that -alldirs is supposed
> to fail when a non-mountpoint is exported.
>
> It appears to have been introduced to the system long ago, although I
> haven't yet tracked down the commit.
>
> mountd.c assumes that nmount(8) will fail when the directory path
> is not a mount point, however for MNT_UPDATE (which is what is
> used to export file systems) this is not the case.
>
> Please create a bugzilla bug report for this and I will work on a patch.
>
> Btw, quiet is also broken in the sense that it will cause any nmount(8)
> failure to fail. However, since nmount(8) does not fail for this case,
> it hardly matters. I will come up with a patch for this too, since it is
> easy to fix.
>
> Thanks for reporting this, rick
>
> >
> > root@zfstest1:~ # showmount -e
> > Exports list on localhost:
> > /cdrom 10.0.0.0
> >
> >
> > on a client, I can now mount "/" from my server zfstest1:
> >
> > root@client1:~ # mount -r -t nfs zfstest1:/ /mnt
> > root@client1:~ # mount | tail -n1
> > zfstest1:/ on /mnt (nfs, read-only)
> >
> > The root-FS of zfstest1 is indeed visible in /mnt on client1
> >
> > From what I see in /usr/src/usr.sbin/mountd/mountd.c this isn't
> > supposed to happen (I'm no C programmer but this did read something
> > like I should receive an export error from mountd when I send a HUP):
> > ...
> > } else if (!strcmp(cpopt, "alldirs")) {
> >     opt_flags |= OP_ALLDIRS;
> > ...
> > if (opt_flags & OP_ALLDIRS) {
> >     if (errno == EINVAL)
> >         syslog(LOG_ERR,
> >             "-alldirs requested but %s is not a filesystem mountpoint",
> >             dirp);
> >     else
> >         syslog(LOG_ERR,
> >             "could not remount %s: %m",
> >             dirp);
> >     ret = 1;
> >     goto error_exit;
> > }
> >
> > I suspect this code path isn't being hit since I'm getting the mountd
> > warning I referenced above instead of this error. This appears to be a
> > possible recurrence of a very old bug that depicts similar behavior :
> > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=170413

Re: 14.1 NFS / mountd : -alldirs not working as expected

2024-11-25 Thread Rick Macklem
On Wed, Nov 20, 2024 at 8:01 PM Michael Proto  wrote:
>
> Hello all,
>
> Running into an issue with a 14.1 server that I think is a bug, though
> it may be me not interpreting documentation correctly so I wanted to
> ask here.
>
> Using NFSv3, with FreeBSD 14.1 as the NFS server. Based on what I see
> in exports(5), if I want to export conditional mounts (IE filesystem
> paths that are intermittently mounted locally on server) I should use
> -alldirs and specify the mount-point as the export. Per the manpage,
> this export should only be accessible when the exported directory is
> actually the root of a mounted filesystem. Currently if mountd is
> HUPed while the export isn't a filesystem mount I get the warning
> about exporting the filesystem "below" the export (root-FS in this
> case) and I can actually mount the root-FS from the client, instead of
> getting an error as I would expect. Using the specific example for a
> sometimes-mounted /cdrom in exports(5) can demonstrate this behavior.
>
>   /etc/rc.conf :
> nfs_server_enable="YES"
> rpcbind_enable="YES"
> rpc_statd_enable="YES"
> rpc_lockd_enable="YES"
> mountd_enable="YES"
>
>   /etc/exports :
> /cdrom -alldirs,quiet,ro -network=10.0.0.0/24
>
> (at this time /cdrom exists as a directory but is not currently a
> filesystem mount point)
> on the server:
> root@zfstest1:~ # killall -HUP mountd
>
>   /var/log/messages:
> Nov 20 22:34:56 zfstest1 mountd[27724]: Warning: exporting /cdrom
> exports entire / file system
I took a closer look and this is a bug. It appears that -alldirs is supposed
to fail when a non-mountpoint is exported.

It appears to have been introduced to the system long ago, although I
haven't yet tracked down the commit.

mountd.c assumes that nmount(2) will fail when the directory path
is not a mount point, however for MNT_UPDATE (which is what is
used to export file systems) this is not the case.

Please create a bugzilla bug report for this and I will work on a patch.

Btw, quiet is also broken in the sense that it will cause any nmount(2)
failure to fail. However, since nmount(2) does not fail for this case,
it hardly matters. I will come up with a patch for this too, since it is
easy to fix.
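
Just to sketch one direction the -alldirs fix could take (this is only a
sketch, not the actual patch, and is_mount_point() is a hypothetical helper
that does not exist in mountd.c): instead of relying on nmount(2) to fail,
mountd could check the directory itself, for example by comparing the path
against the f_mntonname returned by statfs(2):

#include <sys/param.h>
#include <sys/mount.h>
#include <string.h>

/*
 * Hypothetical helper, not from mountd.c: returns non-zero when dirp is
 * the root of a mounted file system.  Assumes dirp is an absolute,
 * symlink-free path with no trailing slash.
 */
static int
is_mount_point(const char *dirp)
{
    struct statfs sfs;

    if (statfs(dirp, &sfs) != 0)
        return (0);
    /* For a mount point, the mount-on name is the path itself. */
    return (strcmp(sfs.f_mntonname, dirp) == 0);
}

Whether a real patch would do it this way or just fix the nmount(2)/MNT_UPDATE
assumption is a separate question; the point is only that the check is cheap.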

Thanks for reporting this, rick

>
> root@zfstest1:~ # showmount -e
> Exports list on localhost:
> /cdrom 10.0.0.0
>
>
> on a client, I can now mount "/" from my server zfstest1:
>
> root@client1:~ # mount -r -t nfs zfstest1:/ /mnt
> root@client1:~ # mount | tail -n1
> zfstest1:/ on /mnt (nfs, read-only)
>
> The root-FS of zfstest1 is indeed visible in /mnt on client1
>
> From what I see in /usr/src/usr.sbin/mountd/mountd.c this isn't
> supposed to happen (I'm no C programmer but this did read something
> like I should receive an export error from mountd when I send a HUP):
> ...
> } else if (!strcmp(cpopt, "alldirs")) {
>     opt_flags |= OP_ALLDIRS;
> ...
> if (opt_flags & OP_ALLDIRS) {
>     if (errno == EINVAL)
>         syslog(LOG_ERR,
>             "-alldirs requested but %s is not a filesystem mountpoint",
>             dirp);
>     else
>         syslog(LOG_ERR,
>             "could not remount %s: %m",
>             dirp);
>     ret = 1;
>     goto error_exit;
> }
>
> I suspect this code path isn't being hit since I'm getting the mountd
> warning I referenced above instead of this error. This appears to be a
> possible recurrence of a very old bug that depicts similar behavior :
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=170413
> While it appears the "-sec" issue referenced in that bug is fixed in
> the listed PRs I didn't see anything on this -alldirs issue that's
> also mentioned there, maybe that's why I'm running into this now?
>
> I'd be totally unsurprised if my /etc/exports file isn't configured
> correctly, but I reduced my setup to just the example in the exports
> man page and I'm struggling to determine how to interpret that
> information differently. I also tried an export of /cdrom with only
> "-alldirs" as an option and I get the same behavior. Ideas?
>
>
> Thanks,
> Michael Proto
>



Re: NFSd not registering on 14.2.

2024-11-21 Thread Rick Macklem
On Thu, Nov 21, 2024 at 1:22 PM Zaphod Beeblebrox  wrote:
>
>
> I've tried a lot of different combinations of rc variables.  On 13.3 and 
> 14.1, nfsd in most (non-v4-only) configurations registers to rpcbind as 
> expected.  This is true of restarting nfsd and using nfsd -r.
>
> However on 14.2, I can't contrive any configuration that registers to 
> rpcbind.  Minimally, on one fairly quiet 14.1 server, I simply have
>
> nfs_server_enable="YES"
> mountd_enable="YES"
> mountd_flags="-h  -S"
>
> on another, I have more:
>
> mountd_enable="YES"
> nfs_client_enable="YES"
> nfs_server_enable="YES"
> nfsv4_server_enable="NO"
> #nfs_server_flags="-u -t -n 12" # Flags to nfsd (if enabled).
> nfsuserd_enable="YES"
> nfsuserd_flags="-domain daveg.ca"
> nfscbd_enable="YES"
> rpc_lockd_enable="YES"
> rpc_statd_enable="YES"
>
> readup for what the 14.2 server has --- but I've tried configurations going 
> from the former to the latter.  None of them register.
>
All I can suggest is checking lo0 to make sure it is using 127.0.0.1.
See what
# ifconfig -a
shows.

If lo0 is not 127.0.0.1, that would explain it, since the rpcbind stuff uses
127.0.0.1.

Note that 127.0.0.1 gets added automatically when "-h" is used.
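
For completeness, a quick way to check both pieces from the server itself
(nothing here is specific to your setup; the output will obviously differ):

# ifconfig lo0
# rpcinfo -p 127.0.0.1
# sockstat -4l | egrep 'rpcbind|mountd|nfsd'

If registration worked, rpcinfo should list program 100003 (nfs) along with
mountd and rpcbind.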

Btw, I don't think I changed anything w.r.t. this between 14.1 and 14.2,
so it is likely some other network related change.

rick



Re: 14.1 NFS / mountd : -alldirs not working as expected

2024-11-21 Thread Rick Macklem
On Thu, Nov 21, 2024 at 1:56 PM Michael Proto  wrote:
>
> On Thu, Nov 21, 2024 at 7:11 AM Rick Macklem  wrote:
> >
> > On Wed, Nov 20, 2024 at 8:01 PM Michael Proto  wrote:
> > >
> > > Hello all,
> > >
> > > Running into an issue with a 14.1 server that I think is a bug, though
> > > it may be me not interpreting documentation correctly so I wanted to
> > > ask here.
> > -alldirs simply means that any directory within the server file system
> > can be mounted. So, yes, everything up to the root dir can be mounted.
> >
> > Normally, the directory for such an exports line would be the root directory
> > of the file system, but I doubt mountd actually enforces that, since the
> > export line is for "all directories" in the file system.
> >
> > >
> > > Using NFSv3, with FreeBSD 14.1 as the NFS server. Based on what I see
> > > in exports(5), if I want to export conditional mounts (IE filesystem
> > > paths that are intermittently mounted locally on server)
> > No idea what you mean by "intermittently mounted locally"?
> > (An export will be for whatever file system is mounted for the directory
> > at the time mountd is started or updates exports when a SIGHUP is
> > sent to it.)
> >
> > Exporting a file system that is not always mounted on the server is
> > a very bad idea imho. It would be much better to add the exports(5)
> > line after the file system is mounted and remove it before the file
> > system is unmounted, if you need to export a file system not always
> > mounted.
> >
>
> Agreed, for the rare circumstances where I use this the playbook has
> always been to update /etc/exports before and after any (un)mounting,
> just interested if mountd would programmatically enforce it for the
> hopefully-rare time such steps are overlooked. Seeing that error in
> the mountd.c code gave me hope mountd itself could assist there,
> regardless I have other ways of achieving the same result.
I suppose a new exports option that says "only do the export if the
directory path is the root of a file system" might be useful.
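
Purely as a sketch of what such an exports(5) line might look like (the
"mountpoint" option name is made up here; no such option exists today):

/cdrom -alldirs,mountpoint,ro -network=10.0.0.0/24

where the line would only take effect while /cdrom is actually the root of
a mounted file system.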

I'll stick it on my todo list, rick

>
> Appreciate the response.
>
>
> -Michael Proto



Re: cpdup fails silently on automounted NFS src dir

2025-04-06 Thread Rick Macklem
On Sun, Apr 6, 2025 at 11:53 AM G. Paul Ziemba  wrote:
>
> Summary: interaction between autounmountd and cpdup's mount-point-traversal
> detection truncates tree copies early without error.
>
> I'm running 14-stable and am seeing this both on:
>
> - 14.0-STABLE built from sources of 27 Mar 2024 and also on
> - 14.2-STABLE built from sources of 3 Apr 2025.
>
> There doesn't seem to be anything specific to 14-stable so I'll bet
> this issue also manifests on earlier versions of FreeBSD.
>
> I think I understand what's happening (details below), but I'm
> not sure about the right way to fix it.
>
> Scenario
>
> A large file tree (in my case, the FreeBSD source tree) is published
> on an NFS server.
>
> A FreeBSD NFS client automounts a volume containing this
> large file tree.
>
> cpdup attempts to copy the file tree to another location (in my
> case, that happens to be another NFS filesystem, but I don't think
> it matters).
>
> cpdup completes without error, however, the destination directory
> is incomplete, with many empty directories.
>
> Analysis
>
> cpdup examines the device ID (st_dev) returned by stat(2) as it
> traverses the source and destination trees copying directories
> and files. When it finds an st_dev value different from the initial
> value at the top of the respective tree, it concludes that it has
> crossed a mount point and prunes the copy at that point.
>
> I instrumented cpdup with some additional logging to examine its
> notion of the src and dst st_dev values and found that, in my
> test case, in the middle of its tree copy, cpdup started getting
> unexpected new values of st_dev for the src tree and skipping
> all directories after that.
>
> --- src/cpdup.c.orig  2025-04-04 15:04:44.623646000 -0700
> +++ src/cpdup.c       2025-04-05 15:10:52.779426000 -0700
> @@ -947,10 +947,15 @@
>       * When copying a directory, stop if the source crosses a mount
>       * point.
>       */
> -    if (sdevNo != (dev_t)-1 && stat1->st_dev != sdevNo)
> +    if (VerboseOpt >= 2)
> +        logstd("sdevNo: %ld, stat1->st_dev: %ld\n", sdevNo, stat1->st_dev);
> +    if (sdevNo != (dev_t)-1 && stat1->st_dev != sdevNo) {
> +        if (VerboseOpt >= 2)
> +            logstd("setting skipdir due to sdevNo != stat1->st_dev\n");
>          skipdir = 1;
> -    else
> +    } else {
>          sdevNo = stat1->st_dev;
> +    }
>
> I eventually looked at the automounter and added some logging via
> devd.conf:
>
> notify 10 {
> match "system"  "VFS";
> match "subsystem"   "FS";
> action "logger VFS FS msg=$*";
> };
>
> And saw the following in /var/log/messages:
>
> Apr  6 10:39:31 f14s-240327-portbuilder me[58694]: VFS FS msg=!system=VFS 
> subsystem=FS type=MOUNT mount-point="/s/public" 
> mount-dev="hairball:/v2/Source/public" mount-type="nfs" 
> fsid=0x94ff003a3a00 owner=0 flags="automounted;"
> Apr  6 10:49:54 f14s-240327-portbuilder me[58761]: VFS FS msg=!system=VFS 
> subsystem=FS type=UNMOUNT mount-point="/s/public" 
> mount-dev="hairball:/v2/Source/public" mount-type="nfs" 
> fsid=0x94ff003a3a00 owner=0 flags="automounted;"
> Apr  6 10:49:54 f14s-240327-portbuilder me[58770]: VFS FS msg=!system=VFS 
> subsystem=FS type=MOUNT mount-point="/s/public" 
> mount-dev="hairball:/v2/Source/public" mount-type="nfs" 
> fsid=0x95ff003a3a00 owner=0 flags="automounted;"
>
> (By the way, st_dev reported by my new cpdup log messages was a
> rearranged version of "fsid" in the devd messages)
>
> Note that after ten minutes, the NFS filesystem is unmounted and then
> immediately remounted.
>
> The source code of /usr/sbin/autounmountd indicates that it
> attempts to unmount automounted filesystems ten minutes after
> they have been mounted (modulo some sleep-related jitter).
>
> The immediately following mount (presumably triggered by the
> next filesystem access by cpdup) results in a new value of fsid,
> thus changing what cpdup sees as st_dev, causing it to treat
> all following directory descents as mount-point crossings.
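>
> (One quick way to watch this happen, using the path from the log above:
> FreeBSD stat(1) prints st_dev with the %d format, so something like
>
> while :; do stat -f '%N %d' /s/public; sleep 60; done
>
> shows the device number jump right after the idle unmount/remount.)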
>
> Possible Mitigations
>
> 1. It might be possible to prevent unmounting by causing cpdup
>to chdir to the top of the source directory. However, it seems
>to perform similar st_dev checks on the destination directory
>and therefore a similar issue would arise with the dst tree.
>
> 2. Reusing the old fsid in the new mount? I'm guessing there
>were good reasons for assigning a new fsid, so it's probably
>a bad idea.
>
> 3. cpdup could call stat() on the top of the tree each time
>it made a comparison. There might still be a race and the
>comparison might fail if the automatic unmount occurred
>between the two stat() calls.
>
>Although THAT could be worked around by retrying the