WITH_META_MODE: base clang excessive compiles
Hello,

Just to check that I'm using the correct setting for WITH_META_MODE, since almost every time I update the main tree, clang gets recompiled. Is that normal?

--
$ kldstat | grep filemon
111 0x849f2000 3250 filemon.ko

/etc/src-env.conf:
WITH_META_MODE=yes
---

(maybe useful to show this as well)
/etc/src.conf:
WITH_MALLOC_PRODUCTION=yes
WITHOUT_LLVM_ASSERTIONS=yes

/etc/make.conf:
KERNCONF=GENERIC-NODEBUG
DEVELOPER=yes
DEVELOPER_MODE=yes
PORTSDIR=/home/nunotex/Work/freebsd/ports/main
DISTDIR=/Arq/DISTFILES

Thanks,
--
Nuno Teixeira
FreeBSD UNIX: Web: https://FreeBSD.org
Re: UFS bad inode, mangled entry on Alder Lake-N(100)
On Mon, 27 Jan 2025 19:28:28 +0100
Yamagi wrote:

> Hi,
>
> sounds like the Alder Lake PCID bug in its N100 flavor. On the small
> cores the INVLPG instruction is broken, failing to flush all
> (global?) TLB entries, leading to cache corruption. FreeBSD has a
> workaround for that:
> https://cgit.freebsd.org/src/commit/?id=cde70e312c3fde5b37a29be1dacb7fde9a45b94a
>
> However, that workaround never fully solved the problem on the N100
> series. My own N100 board was never stable with PCID enabled, and
> there are several other reports of the same problems. For example:
> https://lists.freebsd.org/archives/freebsd-current/2023-August/004116.html
>
> Since Linux went with disabling PCID altogether on all Alder Lake
> and Raptor Lake CPUs, I did the same by setting
> vm.pmap.pcid_enabled=0 in loader.conf. Since I did that, the system
> has been running fine.
>
> The Linux commit disabling PCID is here:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ae8373a5add4ea39f032563cf12a02946d1e3546
>
> A microcode update might also help. I didn't test the updates
> released by Intel since early last year, so I don't know for sure.

It looks like the right thing; in my case, adding vm.pmap.pcid_enabled=0
to /boot/loader.conf helps. I consider this easier than installing the
port for the microcode update... That being said, could someone add some
more pros/cons for those two approaches?

Additionally, I am using an M.2 SATA drive at the moment. While an NVMe
drive worked to some extent, if fsck was necessary for some reason, it
was unpleasant: a 'waiting for nvme reset' event occurred, which led to
the NVMe drive detaching, and the only way to fix it was to unscrew the
drive, put it in a USB-NVMe converter, run fsck via USB, then mount it
back into the box... not acceptable.

Regards,
Milan
Re: WITH_META_MODE: base clang excessive compiles
Hello!

So nice, I can see it now:

Skipping meta for ...: ... some_file.meta: 23: file 'other' is newer than the target...

This is great!

Thank you so much,

Simon J. Gerraty wrote (Tuesday, 28/01/2025 at 22:51):
> Quite possibly. You can add -dM to your make command line and meta mode
> will explain why it thinks a target is out of date.
> If you see a target built without comment from meta mode, then the
> normal oodate rules said it was out-of-date.

--
Nuno Teixeira
FreeBSD UNIX: Web: https://FreeBSD.org
Re: WITH_META_MODE: base clang excessive compiles
Nuno Teixeira wrote:
> Just to check that I'm using the correct setting for WITH_META_MODE
> since almost everytime I update main tree, I got clang compiled.
>
> Is that normal?

Quite possibly. You can add -dM to your make command line and meta mode
will explain why it thinks a target is out of date.
If you see a target built without comment from meta mode, then the
normal oodate rules said it was out-of-date.
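A rough model of the check that -dM explains: in meta mode, bmake records in each target's .meta file the files its commands read (via filemon(4) when it is loaded), and the target is rebuilt when one of those files is newer than the target, which is what messages like the one Nuno quotes report. The C sketch below models only that final timestamp comparison. It is not bmake's code: the struct, the function names and the convention of using mtime 0 for a vanished file are assumptions made for illustration, and real meta mode also compares the recorded command line and handles many special cases.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/* One file that a (hypothetical) parsed .meta record says the build read. */
struct meta_dep {
	const char *path;
	time_t mtime;	/* sketch convention: 0 means the file no longer exists */
};

/*
 * Return the index of the first dependency that makes the target out of
 * date, or -1 if the target is up to date. A dependency counts when its
 * mtime is strictly newer than the target's, or when it has vanished.
 */
static int
meta_first_newer(time_t target_mtime, const struct meta_dep *deps,
    size_t ndeps)
{
	assert(deps != NULL || ndeps == 0);
	for (size_t i = 0; i < ndeps; i++) {
		if (deps[i].mtime == 0 || deps[i].mtime > target_mtime)
			return ((int)i);
	}
	return (-1);
}

/* Fill in mtimes with stat(2); files that cannot be stat'ed get 0. */
static void
meta_stat_deps(struct meta_dep *deps, size_t ndeps)
{
	struct stat sb;

	for (size_t i = 0; i < ndeps; i++)
		deps[i].mtime = (stat(deps[i].path, &sb) == 0) ? sb.st_mtime : 0;
}

/* Print a one-line reason, loosely in the style of the -dM output. */
static void
meta_explain(const char *target, time_t target_mtime,
    const struct meta_dep *deps, size_t ndeps)
{
	int i = meta_first_newer(target_mtime, deps, ndeps);

	if (i < 0)
		printf("%s: up to date\n", target);
	else if (deps[i].mtime == 0)
		printf("%s: file '%s' is missing\n", target, deps[i].path);
	else
		printf("%s: file '%s' is newer than the target\n", target,
		    deps[i].path);
}
```

For example, with a target timestamp of 160 and dependencies stamped 100, 200 and 150, meta_first_newer() returns 1, pointing at the second file; a single touched header shared by many clang sources is enough to make all of them look out of date.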
Re: "don't know how to make /usr/main-src/sys/contrib/dev/iwm/iwm-3160-17.fw.uu. Stop"
On Mon, 27 Jan 2025, Mark Millard wrote:

> On Jan 26, 2025, at 20:51, Adrian Chadd wrote:
>
>> Hi!
>
> Hello.
>
>> So, there's no longer a build target for the firmware uuencoded files ->
>> kernel module.
>
> Yea. But there are the sys/conf/files dependency lines in
> main that still list .fw.uu files. That includes a reference
> related to the error I get in my context unless I avoid
> "device iwmfw" in the kernel configuration:
>
> /. . ./sys/conf/files: dependency
> "$S/contrib/dev/iwm/iwm-3160-17.fw.uu" \
>
> It makes things look like the .fw.uu removal activity is still
> incomplete.

Yes, it is. This commit missed them. Manu, will you remove them?

commit af0a81b6470aba4af4a24ae9804053722846ded4
Author: Emmanuel Vadot
AuthorDate: Thu Dec 12 17:13:58 2024 +0100
Commit: Emmanuel Vadot
CommitDate: Mon Dec 16 10:44:47 2024 +0100

    iwm: Stop shipping firmware as kernel module

    Since we can load raw firmware start shipping them as is.
    This also remove the uuencode format that don't add any value and garbage
    collect old firmwares version.
    For pkgbase users they are now in the FreeBSD-firmware-iwm package.

>> Being able to build iwm in the kernel rather than a module is broken.
>>
>> Now, the real issue(s) are that iwm needs firmware to initialise, and the
>> firmware needs to exist, and thus it needs access to the rootfs for
>> firmware_get() to find the now binary files in /boot/firmware instead of
>> the kernel module old way, and that whole pipeline is broken if it's
>> loaded at boot time or included in the kernel directly. There isn't a nice
>> way to defer the firmware load attempt until /after/ rootfs is up.

The answer is not to load it from loader anymore, really, but to let devd
do its job.

That is, monolithic kernels + firmware are still the problem, but I am not
generally thinking this is the problem here.

But yes, there would also be ways for the firmware load to be deferred, but
that's also a different story.

> Yep.

Firmware can still be loaded from loader. The following commit should have
made that more easily possible without changes.

commit a0f06dfb0d188966bee7265ec7d9f20093186bb6
Author: Emmanuel Vadot
AuthorDate: Mon Jan 6 08:34:02 2025 +0100
Commit: Emmanuel Vadot
CommitDate: Mon Jan 6 08:34:02 2025 +0100

    loader: Add a list of firmware name mapping

    Since we started to ship raw firmware for iwm(4), users who loads
    the driver from loader are having problems as loader don't know that
    the firmwares are now raw files and not kernel modules anymore.
    Start a list of default entry for iwm(4) firmwares name mapping so it will
    still works when loaded from loader.

    Differential Revision: https://reviews.freebsd.org/D48211
    Reviewed by: bz, imp, kevans

/bz

--
Bjoern A. Zeeb r15:7
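The loader-side idea in the second commit can be pictured as a lookup table from a legacy firmware module name to a raw file under /boot/firmware. The C sketch below only illustrates that idea; it is not the loader's implementation, which the thread does not show. The table contents, the function names and the fallback of passing unknown names through as kernel modules are all assumptions. The single entry pairs the historical iwm3160fw module name with the iwm-3160-17.fw name implied by the .fw.uu file discussed above, and should not be read as the actual mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Legacy module name -> raw firmware file name (illustrative only). */
struct fw_map_entry {
	const char *module;
	const char *file;
};

static const struct fw_map_entry fw_map[] = {
	/* Assumed pairing for illustration; not copied from the loader. */
	{ "iwm3160fw", "iwm-3160-17.fw" },
};

/* Return the raw firmware file for a legacy module name, or NULL. */
static const char *
fw_map_lookup(const char *module)
{
	for (size_t i = 0; i < sizeof(fw_map) / sizeof(fw_map[0]); i++) {
		if (strcmp(fw_map[i].module, module) == 0)
			return (fw_map[i].file);
	}
	return (NULL);
}

/*
 * Resolve what to load for a requested name: mapped names become a path
 * under /boot/firmware, anything else is passed through unchanged to be
 * loaded as a kernel module. Returns 0 on success, -1 if buf is too small.
 */
static int
fw_resolve(const char *name, char *buf, size_t len)
{
	const char *file;
	int n;

	assert(buf != NULL && len > 0);
	file = fw_map_lookup(name);
	if (file != NULL)
		n = snprintf(buf, len, "/boot/firmware/%s", file);
	else
		n = snprintf(buf, len, "%s", name);
	return ((n >= 0 && (size_t)n < len) ? 0 : -1);
}
```

The pass-through default keeps ordinary module names (such as the driver itself) working unchanged, so only the names listed in the table are redirected to raw files.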
Re: Difference in "netstat -rn" output in the last 2 months
Meanwhile, why is it 0.0.0.0 and not 0.0.0.0/0, 0/0 or 0? Why is it ::/0 and not ::, 0:0:0:0:0:0:0:0, 0:0:0:0:0:0:0:0/0, ::::::: or :::::::/0?

Realistically, we need automated interfaces too. Parsing everything out of a human UI is very error prone! Yet it's easy, which is why the changes are bad.

BTW, half of the tools (still?) crap out and do whatever silently when I specify things like 10/8. Hint: it doesn't expand into 10.0.0.0/8. That's non-POLA-ish too.
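The 10/8 surprise is easy to reproduce with the standard parsers. Under the classic inet_aton(3) rules, an address with fewer than four parts is not padded on the right: a single part is taken as the whole 32-bit value, so "10" means 0.0.0.10, and "10.1" means 10.0.0.1. inet_pton(3) instead rejects the short form outright. One plausible way for a tool to go wrong, assumed here rather than taken from the thread, is to split "10/8" at the slash and hand "10" to inet_aton(). The helper names aton_str() and pton_ok() are hypothetical.

```c
#define _DEFAULT_SOURCE		/* expose inet_aton() on glibc; harmless elsewhere */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse s with inet_aton(3) and write the result as a dotted quad. */
static void
aton_str(const char *s, char *buf, size_t len)
{
	struct in_addr a;

	assert(buf != NULL && len > 0);
	if (inet_aton(s, &a) == 0 ||
	    inet_ntop(AF_INET, &a, buf, (socklen_t)len) == NULL)
		snprintf(buf, len, "invalid");
}

/* Return 1 if inet_pton(3) accepts s as an IPv4 address, 0 otherwise. */
static int
pton_ok(const char *s)
{
	struct in_addr a;

	return (inet_pton(AF_INET, s, &a) == 1);
}
```

With these rules, aton_str("10", ...) yields "0.0.0.10", so such a tool ends up with 0.0.0.10/8 rather than 10.0.0.0/8, while pton_ok("10") returns 0.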
Re: Difference in "netstat -rn" output in the last 2 months
On Sun, Jan 26, 2025 at 04:58:57PM +0100, Alexander Leidinger wrote:
A> something has changed in the output of "netstat -rn" between
A> 2024-11-23-195545 and 2025-01-22-151306. The default route is not listed as
A> "default" anymore, but with "0.0.0.0" resp. "::/0". This breaks some tools
A> (e.g. iocage). Iocage uses python, I'm not sure if it uses netstat or some
A> other interface, so it may not be directly related to netstat itself but
A> could be related to some other stuff (netlink maybe?).
A>
A> Does this ring a bell for someone?

This is very likely changed by 9206c79961986c2114a9a2cfccf009ac010ad259.

Allan, maybe make an exception for "default" to keep POLA? Otherwise,
indeed, at the time of 15.0-RELEASE we will receive some negative feedback :)

Maybe a double -nn should introduce the new behavior?

--
Gleb Smirnoff
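To make the -nn idea concrete, here is a C sketch of a route-destination formatter that keeps printing "default" for the all-zeroes /0 route under a single -n and prints the literal prefix only at a second -n. It is not netstat's code: fmt_dest(), its `numeric` level argument and the "addr/plen" output style are assumptions for illustration, name resolution (no -n at all) is not modeled, and only IPv4 is handled.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical IPv4 destination formatter. numeric is the number of -n
 * flags: 1 keeps "default", 2 (the proposed -nn) prints the raw prefix.
 * Returns 0 on success, -1 on error or if buf is too small.
 */
static int
fmt_dest(struct in_addr dst, int plen, int numeric, char *buf, size_t len)
{
	char addr[INET_ADDRSTRLEN];
	int n;

	assert(plen >= 0 && plen <= 32 && buf != NULL && len > 0);
	if (dst.s_addr == htonl(INADDR_ANY) && plen == 0 && numeric < 2)
		n = snprintf(buf, len, "default");
	else if (inet_ntop(AF_INET, &dst, addr, sizeof(addr)) == NULL)
		return (-1);
	else if (plen == 32)
		n = snprintf(buf, len, "%s", addr);	/* host route */
	else
		n = snprintf(buf, len, "%s/%d", addr, plen);
	return ((n >= 0 && (size_t)n < len) ? 0 : -1);
}
```

Keying the special case on the flag count rather than on a new option letter keeps "netstat -rn" output stable for scripts and people, while still giving machine consumers an unambiguous literal form.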
Re: Difference in "netstat -rn" output in the last 2 months
On 2025-01-28 18:32, Maxim Sobolev wrote:

> I also think this should be reverted back to default. "-n" refers to IP to
> name functionality, "default" is clearly a special case. If someone wants
> it, some other option can be added to emit 0.0.0.0/0 (not sure why but
> ok).

This was discussed in the review referenced in the commit. The -nn
proposal was there too. Personally, I agree with the rationales in favor
of "-nn".

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: Difference in "netstat -rn" output in the last 2 months
On Tue, Jan 28, 2025 at 08:46:04AM -0800, Gleb Smirnoff wrote:
> This is very likely changed by 9206c79961986c2114a9a2cfccf009ac010ad259.
>
> Allan, may be make exclusion for the "default" to keep POLA? Otherwise,
> indeed at time of 15.0-RELEASE we will receive some negative feedback :)

I fully second this request. I was very much annoyed by this change, as
"netstat -rn" is in my finger memory and seeing linuxish 0.0.0.0 instead
of our usual "default" was totally unexpected and rather confusing.

./danfe
Re: Difference in "netstat -rn" output in the last 2 months
Discussed between 3 people, creating the problem for 3,000. Great!

-Max

On Tue, Jan 28, 2025, 7:05 PM Alexander Leidinger wrote:
> This was discussed in the review referenced in the commit. The -nn
> proposal was there too. Personally I agree with the rationales in favor of
> "-nn".
Re: UFS bad inode, mangled entry on Alder Lake-N(100)
On 2025-01-28 06:23, Milan Obuch wrote:
> That being said, could someone add some more pro/cons for those two
> approaches?

I chose the microcode update, but that was hard to do because I only have
one NVMe slot and the installer panicked while installing the package at
the final part of the install.
I had to install onto an SD card and then use another FreeBSD install to do
a pkg chroot install onto that temporary media. I then used it to boot with
the firmware update, chroot-installed the firmware and edited loader.conf on
the NVMe drive.

The microcode update fixed it for me. From what I read, I inferred that
keeping PCID enabled might have a performance advantage.

Ian
Re: January 2025 stabilization week
On Mon, Jan 27, 2025 at 01:01:16AM -0800, Gleb Smirnoff wrote:
T> This is an automated email to inform you that the January 2025 stabilization week
T> started with FreeBSD/main at main-n275044-c6767dc1f236, which was tagged as
T> main-stabweek-2025-Jan.

Two regressions were identified:

* Compilation failure on 32-bit platforms. Fix 5289625dfecb.
* Instant panic with SO_REUSEPORT_LB and nginx. Fix 06bf119f265c.

My personal experience with updating desktops & laptops went smoothly, and
several other people also reported success. The same holds true for my home
router.

Testing at Netflix did not discover any stability regressions except the
SO_REUSEPORT_LB one mentioned above. We observe significantly increased CPU
load on a few specific hardware samples with an INVARIANTS kernel. There is
no data yet without INVARIANTS. We will investigate this further later; it
doesn't warrant prolonging the code freeze.

The advisory code freeze on the main branch is thawed!

Thanks, everyone!

--
Gleb Smirnoff
Re: HEADS UP: NFS changes coming into CURRENT early February
On Tue, Jan 28, 2025 at 09:14:00PM -0800, Gleb Smirnoff wrote:
T> Second, with the patch the M_RPC leak count for me is 2. And I found that these
T> two items are basically is a clnt_vc that belongs to a closed connection:
T>
T> f80029614a80 tcp4 0 0 10.6.6.9.772 10.6.6.9.2049 CLOSED
T>
T> There is no connection peer connection, as the server received a timeout trying
T> to send. But rpc.tlsclntd doesn't try to send anything on the socket, it just
T> keeps it select(2) fd set and doesn't garbage collect.
T>
T> So it is a bigger resource leak than just two pieces of M_RPC. I don't think
T> this is related to my changes.

Here is what is going on:

- The TCP connection is torn down and tcp_close() calls soisdisconnected().
- soisdisconnected() calls clnt_vc_soupcall() to notify of the error condition.
- clnt_vc_soupcall() tries soreceive() and gets so->so_error.
- clnt_vc_soupcall() sets the client to the error state. It doesn't wake up
  anything, because there were no running RPC requests. It can't report back
  to clnt_rc that the connection is dead. It doesn't mark itself for
  clnt_vc_dotlsupcall() processing.

So we end up with:

(kgdb) p $tp->t_state
$25 = 0 /* TCPS_CLOSED */
(kgdb) p/x $tp->t_inpcb.inp_flags & 0x0400 /* INP_DROPPED */
$27 = 0x400
(kgdb) p/x $tp->t_inpcb.inp_socket->so_state
$28 = 0x2000 /* SS_ISDISCONNECTED */
(kgdb) p/x $tp->t_inpcb.inp_socket->so_count
$35 = 0x2
(kgdb) p/x $ct->ct_rcvstate
$29 = 0x41 /* RPCRCVSTATE_UPCALLTHREAD | RPCRCVSTATE_NORMAL */
(kgdb) p $ct->ct_error
$30 = {re_status = RPC_CANTRECV, ru = {RE_errno = 13, RE_why = RPCSEC_GSS_CREDPROBLEM, RE_vers = {low = 13, high = 0}, RE_lb = {s1 = 13, s2 = 0}}}
(kgdb) p $ct->ct_pending
$31 = {tqh_first = 0x0, tqh_last = 0xf80002838ea8}

Note: in my case so->so_error was EACCES, because I used an ipfw(4) rule to
tear down the connection; for a normal TCP timeout it should be ETIMEDOUT, or
ECONNRESET if the remote has reset. That's why $ct->ct_error.ru.RE_errno == 13.
So we need some mechanism for clnt_vc_soupcall() to report to the upper
clnt_rc layer that it is dead and ready to be garbage collected via
CLNT_CLOSE() and then CLNT_RELEASE(). Once clnt_vc_destroy() is called, the
daemon will be notified that it can close the TLS socket, bringing so_count
to 1, and then the final sorele() will bring it to 0 and free the socket.

--
Gleb Smirnoff
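One possible shape for the missing mechanism, as a user-space toy model: the disconnect upcall marks the transport dead, and the upper layer later reaps it, which drops the daemon's and then the kernel's socket reference (so_count 2 to 1 to 0, matching the kgdb output above). Every type and function here is hypothetical (toy_*); none of them is the kernel RPC API, and the daemon's close(2) is collapsed into the reaping function for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy socket; so_count starts at 2 (kernel RPC + TLS daemon), as in kgdb. */
struct toy_socket {
	int so_count;
	bool freed;
};

/* Toy stand-in for a clnt_vc transport. */
struct toy_vc {
	struct toy_socket *so;
	bool dead;	/* hypothetical "please garbage collect me" flag */
};

static void
toy_sorele(struct toy_socket *so)
{
	assert(so->so_count > 0);
	if (--so->so_count == 0)
		so->freed = true;	/* the kernel would free the socket here */
}

/* Disconnect upcall: besides recording the error, mark the transport dead. */
static void
toy_vc_disconnect_upcall(struct toy_vc *ct)
{
	ct->dead = true;
}

/*
 * Upper layer (clnt_rc's role): reap a dead transport. Stands in for
 * CLNT_CLOSE() + CLNT_RELEASE() -> clnt_vc_destroy(), after which the
 * daemon closes its descriptor and the last reference is dropped.
 */
static bool
toy_rc_reap(struct toy_vc *ct)
{
	if (!ct->dead || ct->so == NULL)
		return (false);
	toy_sorele(ct->so);	/* daemon's close(2): so_count 2 -> 1 */
	toy_sorele(ct->so);	/* final sorele(): 1 -> 0, socket freed */
	ct->so = NULL;
	return (true);
}
```

The point of the flag is that the upcall context only records state; the actual teardown happens later in the layer that owns the client, which is where CLNT_CLOSE()/CLNT_RELEASE() can safely be called.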
Re: "don't know how to make /usr/main-src/sys/contrib/dev/iwm/iwm-3160-17.fw.uu. Stop"
Hello,

On Wed, 29 Jan 2025 03:33:17 +0000 (UTC)
"Bjoern A. Zeeb" wrote:

> On Mon, 27 Jan 2025, Mark Millard wrote:
>
> > On Jan 26, 2025, at 20:51, Adrian Chadd wrote:
> >
> >> Hi!
> >
> > Hello.
> >
> >> So, there's no longer a build target for the firmware uuencoded files ->
> >> kernel module.
> >
> > Yea. But there are the sys/conf/files dependency lines in
> > main that still list .fw.uu files. That includes a reference
> > related to the error I get in my context unless I avoid
> > "device iwmfw" in the kernel configuration:
> >
> > /. . ./sys/conf/files: dependency
> > "$S/contrib/dev/iwm/iwm-3160-17.fw.uu" \
> >
> > It makes things look like the .fw.uu removal activity is still
> > incomplete.
>
> Yes it is, This commit missed them. Manu, will you remove them?

There are actually two problems with my commits. The first one is that you
cannot compile a kernel with iwm firmware in it; rea@ sent me some patches a
few weeks ago that solve that, so this will be fixed soon-ish.
The second is people loading it from loader who update their box from 13/14
to current: as make installkernel doesn't install the firmware anymore, it
will fail. Removing iwm_load from loader.conf for the upgrade phase is the
way to go, I think, as I don't see any other way.

> >> Being able to build iwm in the kernel rather than a module is broken.
> >>
> >> Now, the real issue(s) are that iwm needs firmware to initialise, and the
> >> firmware needs to exist, and thus it needs access to the rootfs for
> >> firmware_get() to find the now binary files in /boot/firmware instead of
> >> the kernel module old way, and that whole pipeline is broken if it's
> >> loaded at boot time or included in the kernel directly. There isn't a nice
> >> way to defer the firmware load attempt until /after/ rootfs is up.
>
> The answer is not to load it from loader anymore really but let devd do
> it's job.

Yes, agreed, and I've done that for quite some time, but it seems that we
advertised for a long time that loading wifi modules from loader was the way
to go, so a lot of people were bitten by this.

> That is monolithic kernels + firmware are still the problem but I am not
> generally thinking this is the problem here.
>
> But yes, there would also be ways for firmware laod to be defered but
> that's also a different story.

Agreed.

--
Emmanuel Vadot
Re: HEADS UP: NFS changes coming into CURRENT early February
On Mon, Jan 27, 2025 at 06:10:42PM -0800, Rick Macklem wrote:
R> I think I've found a memory leak, but it shouldn't be a show stopper.
R>
R> What I did on the NFS client side is:
R> # vmstat -m | fgrep -i rpc
R> # mount -t nfs -o nfsv4,tls nfsv4-server:/ /mnt
R> # ls -lR /mnt
R> --> Then I network partitioned it from the server a few times, until
R> the TCP connection closed.
R> (My client is in bhyve and the server on the system the bhyve
R> instance is running in. I just "ifconfig bridge0 down", waited for
R> the TCP connection to close "netstat -a" then "ifconfig bridge0 up".
R> Once done, I
R> # umount /mnt
R> # vmstat -m | fgrep -i rpc
R> and say a somewhat larger allocation count
R>
R> The allocation count only goes up if I do the network partitioning
R> and only on the NFS client side.
R>
R> Since the leak is slow and only happens when the TCP connection
R> breaks, I do not think it is a show stopper and one of us can track it down
R> someday.

I reproduced the recipe and found two problems, but don't have a final
solution for either.

First, when we create the backchannel in sys/fs/nfs/nfs_commonkrpc.c
newnfs_connect():

	if (nfs_numnfscbd > 0) {
		nfs_numnfscbd++;
		NFSD_UNLOCK();
		xprt = svc_vc_create_backchannel(
		    nfscbd_pool);
		CLNT_CONTROL(client, CLSET_BACKCHANNEL,
		    xprt);
		NFSD_LOCK();
		nfs_numnfscbd--;
		if (nfs_numnfscbd == 0)
			wakeup(&nfs_numnfscbd);
	}

svc_vc_create_backchannel() creates the xprt with refcount=1. Then we link
it to our client, and CLNT_CONTROL() makes refcount=2. Then this function
forgets the pointer (even though it owns a reference). Whenever the client
is destroyed, it will do SVC_RELEASE() on the backchannel, the refcount goes
from 2 to 1, and it is leaked.

I made a patch for that, and it does reduce the amount of M_RPC leaked after
your recipe, but once I got a panic where the backchannel xprt was actually
used after free. So it looks like there is a case that my patch doesn't
cover.
What drives me crazy is that I can't reproduce it a second time and can't
get any clue from the single core.

Patch attached.

Second, with the patch the M_RPC leak count for me is 2. And I found that
these two items are basically a clnt_vc that belongs to a closed connection:

f80029614a80 tcp4 0 0 10.6.6.9.772 10.6.6.9.2049 CLOSED

There is no peer connection, as the server received a timeout trying to
send. But rpc.tlsclntd doesn't try to send anything on the socket; it just
keeps it in its select(2) fd set and doesn't garbage collect it.

So it is a bigger resource leak than just two pieces of M_RPC. I don't think
this is related to my changes.

--
Gleb Smirnoff

commit 1ca54035f1ab46884675fdbba7364624e1217c9e
Author: Gleb Smirnoff
Date: Tue Jan 28 20:13:19 2025 -0800

    try

diff --git a/sys/fs/nfs/nfs_commonkrpc.c b/sys/fs/nfs/nfs_commonkrpc.c
index e5c658ce76d2..9f3583a09037 100644
--- a/sys/fs/nfs/nfs_commonkrpc.c
+++ b/sys/fs/nfs/nfs_commonkrpc.c
@@ -262,7 +262,6 @@ newnfs_connect(struct nfsmount *nmp, struct nfssockreq *nrp,
 	struct socket *so;
 	int one = 1, retries, error = 0;
 	struct thread *td = curthread;
-	SVCXPRT *xprt;
 	struct timeval timo;
 	uint64_t tval;
 
@@ -430,12 +429,15 @@ newnfs_connect(struct nfsmount *nmp, struct nfssockreq *nrp,
 				 */
 				NFSD_LOCK();
 				if (nfs_numnfscbd > 0) {
+					SVCXPRT *xprt;
+
 					nfs_numnfscbd++;
 					NFSD_UNLOCK();
 					xprt = svc_vc_create_backchannel(
 					    nfscbd_pool);
 					CLNT_CONTROL(client, CLSET_BACKCHANNEL,
 					    xprt);
+					SVC_RELEASE(xprt);
 					NFSD_LOCK();
 					nfs_numnfscbd--;
 					if (nfs_numnfscbd == 0)
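The reference-count reasoning behind the patch can be checked with a small toy model. The toy_* names are hypothetical stand-ins for svc_vc_create_backchannel(), CLNT_CONTROL(CLSET_BACKCHANNEL) and SVC_RELEASE(); the model assumes, as described in the message, that creation hands the caller one reference and that linking the transport to the client takes a second one. It covers only this simple path and says nothing about the use-after-free that was hit once with the patch applied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_xprt {
	int refs;
	bool freed;
};

struct toy_client {
	struct toy_xprt *backchannel;
};

/* Like svc_vc_create_backchannel() in the thread: caller gets refs == 1. */
static void
toy_xprt_create(struct toy_xprt *x)
{
	x->refs = 1;
	x->freed = false;
}

static void
toy_xprt_acquire(struct toy_xprt *x)
{
	x->refs++;
}

static void
toy_xprt_release(struct toy_xprt *x)
{
	assert(x->refs > 0);
	if (--x->refs == 0)
		x->freed = true;	/* real code would free the xprt here */
}

/* Like CLNT_CONTROL(CLSET_BACKCHANNEL): the client takes its own reference. */
static void
toy_clnt_set_backchannel(struct toy_client *cl, struct toy_xprt *x)
{
	toy_xprt_acquire(x);
	cl->backchannel = x;
}

/* Client teardown drops only the client's reference. */
static void
toy_clnt_destroy(struct toy_client *cl)
{
	if (cl->backchannel != NULL)
		toy_xprt_release(cl->backchannel);
	cl->backchannel = NULL;
}

/* The connect path; release_after_link models the one-line SVC_RELEASE() fix. */
static void
toy_connect(struct toy_client *cl, struct toy_xprt *x, bool release_after_link)
{
	toy_xprt_create(x);
	toy_clnt_set_backchannel(cl, x);
	if (release_after_link)
		toy_xprt_release(x);	/* drop the creator's reference */
}
```

Without the release, the count settles at 1 after client teardown and the transport is never freed (the leak); with it, teardown brings the count to 0.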