WITH_META_MODE: base clang excessive compiles

2025-01-28 Thread Nuno Teixeira
Hello,

Just to check that I'm using the correct setting for WITH_META_MODE, since
almost every time I update the main tree, clang gets recompiled.

Is that normal?

--
$ kldstat | grep filemon
111 0x849f2000 3250 filemon.ko

/etc/src-env.conf:
WITH_META_MODE=yes
---

(maybe useful to show this as well)
/etc/src.conf:
WITH_MALLOC_PRODUCTION=yes
WITHOUT_LLVM_ASSERTIONS=yes

/etc/make.conf:
KERNCONF=GENERIC-NODEBUG
DEVELOPER=yes
DEVELOPER_MODE=yes
PORTSDIR=/home/nunotex/Work/freebsd/ports/main
DISTDIR=/Arq/DISTFILES

Thanks,
-- 
Nuno Teixeira
FreeBSD UNIX: Web:  https://FreeBSD.org


Re: UFS bad inode, mangled entry on Alder Lake-N(100)

2025-01-28 Thread Milan Obuch
On Mon, 27 Jan 2025 19:28:28 +0100
Yamagi  wrote:

> Hi,
> 
> Sounds like the Alder Lake PCID bug in its N100 flavor. On the small
> cores the INVLPG instruction is broken, failing to flush all
> (global?) TLB entries, leading to cache corruption. FreeBSD has a
> workaround for that:
> https://cgit.freebsd.org/src/commit/?id=cde70e312c3fde5b37a29be1dacb7fde9a45b94a
> 
> However, that workaround never fully solved the problem on the N100
> series. My own N100 board was never stable with PCID enabled, and
> there are several other reports of the same problems. For example:
> https://lists.freebsd.org/archives/freebsd-current/2023-August/004116.html
> 
> Since Linux went with disabling PCID altogether on all Alder Lake
> and Raptor Lake CPUs, I did the same by setting
> vm.pmap.pcid_enabled=0 in loader.conf. Since I did that, the system has
> been running fine.
> 
> The Linux commit disabling PCID is here:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ae8373a5add4ea39f032563cf12a02946d1e3546
> 
> A microcode update might also help. I didn't test the updates
> released by Intel since early last year so I don't know for sure.
>

It looks like the right thing; in my case, adding vm.pmap.pcid_enabled=0
to /boot/loader.conf helps. I consider this easier than installing a port
for the microcode update...

That being said, could someone add some more pros/cons for these two
approaches?

Additionally, I am using an M.2 SATA drive at the moment. While an NVMe
drive worked to some extent, if fsck was necessary for some reason, it was
unpleasant: a 'waiting for nvme reset' event occurred, which led to the
NVMe drive detaching, and the only way to fix it was to unscrew the drive,
put it in a USB-NVMe converter, run fsck via USB, then mount it back
into the box... not acceptable.

Regards,
Milan



Re: WITH_META_MODE: base clang excessive compiles

2025-01-28 Thread Nuno Teixeira
Hello!

So nice, I can see now:

Skipping meta for ...: ...
some_file.meta: 23: file 'other' is newer than the target...

This is great!

Thank you so much,

On Tuesday, 28/01/2025 at 22:51, Simon J. Gerraty wrote:

> Nuno Teixeira  wrote:
> > Just to check that I'm using the correct setting for WITH_META_MODE
> > since almost every time I update main tree, I got clang compiled.
> >
> > Is that normal?
>
> Quite possibly.  You can add -dM to your make command line and meta mode
> will explain why it thinks a target is out of date.
> If you see a target built without comment from meta mode, then the
> normal oodate rules said it was out-of-date.


-- 
Nuno Teixeira
FreeBSD UNIX: Web:  https://FreeBSD.org


Re: WITH_META_MODE: base clang excessive compiles

2025-01-28 Thread Simon J. Gerraty
Nuno Teixeira  wrote:
> Just to check that I'm using the correct setting for WITH_META_MODE
> since almost every time I update main tree, I got clang compiled.
> 
> Is that normal?

Quite possibly.  You can add -dM to your make command line and meta mode
will explain why it thinks a target is out of date.
If you see a target built without comment from meta mode, then the
normal oodate rules said it was out-of-date.
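
To make "why it thinks a target is out of date" concrete, here is a sketch that models only the core of what -dM explains: compare the target's modification time with each file recorded in its .meta file and report the first newer one, in the style of the message quoted earlier in this thread. It is a simplified illustration, not bmake's code; real meta mode also checks other things, such as whether the recorded build command changed, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical record: one file referenced by a .meta file. */
struct meta_dep {
	const char	*path;	/* file recorded in the .meta file */
	int		 line;	/* line of the .meta file it came from */
	time_t		 mtime;	/* its current modification time */
};

/*
 * Return the index of the first dependency newer than the target, or -1
 * if none is.  With verbose set, print a -dM style explanation.
 */
static int
meta_first_newer(const char *metafile, time_t target_mtime,
    const struct meta_dep *deps, size_t ndeps, int verbose)
{
	assert(deps != NULL || ndeps == 0);
	for (size_t i = 0; i < ndeps; i++) {
		if (deps[i].mtime > target_mtime) {
			if (verbose)
				printf("%s: %d: file '%s' is newer than the target...\n",
				    metafile, deps[i].line, deps[i].path);
			return ((int)i);
		}
	}
	return (-1);
}
```

A target that this check finds up to date is only rebuilt if the normal oodate rules say so, which is the case Simon describes as "built without comment from meta mode".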




Re: "don't know how to make /usr/main-src/sys/contrib/dev/iwm/iwm-3160-17.fw.uu. Stop"

2025-01-28 Thread Bjoern A. Zeeb

On Mon, 27 Jan 2025, Mark Millard wrote:

> On Jan 26, 2025, at 20:51, Adrian Chadd  wrote:
>
>> Hi!
>
> Hello.
>
>> So, there's no longer a build target for the firmware uuencoded files ->
>> kernel module.
>
> Yea. But there are the sys/conf/files dependency lines in
> main that still list .fw.uu files. That includes a reference
> related to the error I get in my context unless I avoid
> "device iwmfw" in the kernel configuration:
>
> /. . ./sys/conf/files:   dependency  "$S/contrib/dev/iwm/iwm-3160-17.fw.uu" \
>
> It makes things look like the .fw.uu removal activity is still
> incomplete.

Yes it is. This commit missed them. Manu, will you remove them?

commit af0a81b6470aba4af4a24ae9804053722846ded4
Author: Emmanuel Vadot 
AuthorDate: Thu Dec 12 17:13:58 2024 +0100
Commit: Emmanuel Vadot 
CommitDate: Mon Dec 16 10:44:47 2024 +0100

    iwm: Stop shipping firmware as kernel module

    Since we can load raw firmware start shipping them as is.
    This also remove the uuencode format that don't add any value and garbage
    collect old firmwares version.
    For pkgbase users they are now in the FreeBSD-firmware-iwm package.

>> Being able to build iwm in the kernel rather than a module is broken.
>>
>> Now, the real issue(s) are that iwm needs firmware to initialise, and the
>> firmware needs to exist, and thus it needs access to the rootfs for
>> firmware_get() to find the now binary files in /boot/firmware instead of
>> the kernel module old way, and that whole pipeline is broken if it's
>> loaded at boot time or included in the kernel directly. There isn't a nice
>> way to defer the firmware load attempt until /after/ rootfs is up.

The answer is not to load it from loader anymore really but let devd do
its job.

That is, monolithic kernels + firmware are still the problem, but I am not
generally thinking this is the problem here.

But yes, there would also be ways for the firmware load to be deferred, but
that's also a different story.

> Yep.

Firmware can still be loaded from loader.  The following commit should have
made that more easily possible without changes.

commit a0f06dfb0d188966bee7265ec7d9f20093186bb6
Author: Emmanuel Vadot 
AuthorDate: Mon Jan 6 08:34:02 2025 +0100
Commit: Emmanuel Vadot 
CommitDate: Mon Jan 6 08:34:02 2025 +0100

    loader: Add a list of firmware name mapping

    Since we started to ship raw firmware for iwm(4), users who loads
    the driver from loader are having problems as loader don't know that
    the firmwares are now raw files and not kernel modules anymore.
    Start a list of default entry for iwm(4) firmwares name mapping so it will
    still works when loaded from loader.

    Differential Revision:  https://reviews.freebsd.org/D48211
    Reviewed by:    bz, imp, kevans

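The name-mapping idea in the second commit can be illustrated as a lookup table from the legacy module name that a loader.conf line refers to, to the raw file now installed under /boot/firmware. This is a userland sketch under stated assumptions, not the loader's implementation: the module name iwm3160fw is assumed from the old per-firmware module layout, only iwm-3160-17.fw appears in this thread, and fw_path_for_module() is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mapping entry: legacy module name -> raw firmware file. */
struct fw_map {
	const char	*module;
	const char	*file;
};

/* Illustrative table; only the 3160 file name comes from this thread. */
static const struct fw_map fw_maps[] = {
	{ "iwm3160fw", "iwm-3160-17.fw" },
};

/*
 * Build "/boot/firmware/<file>" for a legacy module name.
 * Returns 0 on success, -1 if the name is unknown or buf is too small.
 */
static int
fw_path_for_module(const char *module, char *buf, size_t len)
{
	assert(module != NULL && buf != NULL);
	for (size_t i = 0; i < sizeof(fw_maps) / sizeof(fw_maps[0]); i++) {
		if (strcmp(fw_maps[i].module, module) == 0) {
			int n = snprintf(buf, len, "/boot/firmware/%s",
			    fw_maps[i].file);
			return (n >= 0 && (size_t)n < len ? 0 : -1);
		}
	}
	return (-1);
}
```

With such a table, a request for the old module name can be redirected to the raw file. This only covers the naming; it does not address the rootfs-availability problem Adrian describes.
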

/bz

--
Bjoern A. Zeeb r15:7



Re: Difference in "netstat -rn" output in the last 2 months

2025-01-28 Thread Sulev-Madis Silber
while why is it 0.0.0.0 and not 0.0.0.0/0, 0/0 or 0?

why is it ::/0 not ::, 0:0:0:0:0:0:0:0, 0:0:0:0:0:0:0:0/0, ::::::: or
:::::::/0?

realistically we need automated interfaces too. parsing everything out of human 
ui is very error prone! yet easy, hence why the changes are bad

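On the point about automated interfaces: until tools can use a machine-readable source, a script that scrapes netstat -rn can at least accept every spelling of the default destination that appears in this thread. A minimal sketch; is_default_destination() is hypothetical, and the list covers only the spellings mentioned here.

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if a Destination token from "netstat -rn" denotes a default
 * route: the traditional "default", the newer numeric forms reported
 * in this thread ("0.0.0.0", "::/0"), or an explicit "0.0.0.0/0".
 */
static int
is_default_destination(const char *tok)
{
	static const char *const spellings[] = {
		"default", "0.0.0.0", "0.0.0.0/0", "::/0",
	};

	assert(tok != NULL);
	for (size_t i = 0; i < sizeof(spellings) / sizeof(spellings[0]); i++)
		if (strcmp(tok, spellings[i]) == 0)
			return (1);
	return (0);
}
```
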
btw, half of the tools (still?) crap out and do whatever silently when i
specify things like 10/8. hint: it doesn't expand into 10.0.0.0/8. that's
non-pola'ish too
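
The 10/8 surprise follows from classic BSD address parsing: inet_aton(3) treats a lone number as the whole 32-bit address, so "10" becomes 0.0.0.10 rather than 10.0.0.0. A prefix-aware parser has to left-align the octets it was given. The sketch contrasts the two; parse_prefix() and classic_host_order() are hypothetical helpers, not existing libc or FreeBSD functions.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse "a[.b[.c[.d]]]/len" with missing trailing octets taken as zero,
 * so "10/8" means 10.0.0.0/8.  The address is returned in host byte
 * order.  Returns 0 on success, -1 on malformed input.
 */
static int
parse_prefix(const char *s, uint32_t *addr, int *plen)
{
	uint32_t a = 0;
	int noctets = 0;
	unsigned long v, len;
	char *end;

	assert(s != NULL && addr != NULL && plen != NULL);
	for (;;) {
		v = strtoul(s, &end, 10);
		if (end == s || v > 255 || noctets == 4)
			return (-1);
		a = (a << 8) | (uint32_t)v;
		noctets++;
		if (*end != '.')
			break;
		s = end + 1;
	}
	if (*end != '/')
		return (-1);
	s = end + 1;
	len = strtoul(s, &end, 10);
	if (end == s || *end != '\0' || len > 32)
		return (-1);
	/* Left-align what was given: "10" means 10.x.x.x, not 0.0.0.10. */
	a <<= 8 * (4 - noctets);
	*addr = a;
	*plen = (int)len;
	return (0);
}

/* Classic parsing, for contrast: inet_aton("10") yields 0.0.0.10. */
static uint32_t
classic_host_order(const char *s)
{
	struct in_addr in;

	return (inet_aton(s, &in) == 1 ? ntohl(in.s_addr) : 0);
}
```

Rejecting input without a "/len" is a deliberate choice here, so a bare "10" cannot be silently reinterpreted either way.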



Re: Difference in "netstat -rn" output in the last 2 months

2025-01-28 Thread Gleb Smirnoff
On Sun, Jan 26, 2025 at 04:58:57PM +0100, Alexander Leidinger wrote:
A> something has changed in the output of "netstat -rn" between
A> 2024-11-23-195545 and 2025-01-22-151306. The default route is not listed as
A> "default" anymore, but with "0.0.0.0" resp. "::/0". This breaks some tools
A> (e.g. iocage). Iocage uses python, I'm not sure if it uses netstat or some
A> other interface, so it may not be directly related to netstat itself but
A> could be related to some other stuff (netlink maybe?).
A> 
A> Does this ring a bell for someone?

This is very likely changed by 9206c79961986c2114a9a2cfccf009ac010ad259.

Allan, maybe make an exception for "default" to keep POLA? Otherwise,
indeed, at the time of 15.0-RELEASE we will receive some negative feedback :)

Maybe a double -nn should introduce the new behavior?
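
The -nn idea can be sketched as a formatter that keeps the traditional "default" label unless -n is given twice. This is a hypothetical illustration of the proposal, not netstat's code. The function name and the addr/plen spelling used at -nn are assumptions; the thread reports the current output as "0.0.0.0" and "::/0".

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Format a route destination.  numeric counts how many times -n was
 * given.  A zero-length prefix keeps the "default" label below -nn and
 * is printed as addr/0 from -nn on.  Returns buf, "default", or NULL.
 */
static const char *
fmt_destination(int af, const void *addr, int plen, int numeric,
    char *buf, size_t len)
{
	char a[INET6_ADDRSTRLEN];

	assert(af == AF_INET || af == AF_INET6);
	if (plen == 0 && numeric < 2)
		return ("default");
	if (inet_ntop(af, addr, a, sizeof(a)) == NULL)
		return (NULL);
	snprintf(buf, len, "%s/%d", a, plen);
	return (buf);
}
```

For example, a zero-length IPv4 prefix prints as "default" with one -n and as "0.0.0.0/0" with two. Host routes and the other netstat columns are left out to keep the sketch short.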

-- 
Gleb Smirnoff



Re: Difference in "netstat -rn" output in the last 2 months

2025-01-28 Thread Alexander Leidinger

On 2025-01-28 18:32, Maxim Sobolev wrote:

> I also think this should be reverted back to default. "-n" refers to IP
> to name functionality, "default" is clearly a special case. If someone
> wants it, some other option can be added to emit 0.0.0.0/0 (not
> sure why but ok).

This was discussed in the review referenced in the commit. The -nn
proposal was there too. Personally I agree with the rationales in favor
of "-nn".

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org  : PGP 0x8F31830F9F2772BF


Re: Difference in "netstat -rn" output in the last 2 months

2025-01-28 Thread Alexey Dokuchaev
On Tue, Jan 28, 2025 at 08:46:04AM -0800, Gleb Smirnoff wrote:
> On Sun, Jan 26, 2025 at 04:58:57PM +0100, Alexander Leidinger wrote:
> A> something has changed in the output of "netstat -rn" between
> A> 2024-11-23-195545 and 2025-01-22-151306. The default route is not listed as
> A> "default" anymore, but with "0.0.0.0" resp. "::/0". This breaks some tools
> A> (e.g. iocage). Iocage uses python, I'm not sure if it uses netstat or some
> A> other interface, so it may not be directly related to netstat itself but
> A> could be related to some other stuff (netlink maybe?).
> A> 
> A> Does this ring a bell for someone?
> 
> This is very likely changed by 9206c79961986c2114a9a2cfccf009ac010ad259.
> 
> Allan, may be make exclusion for the "default" to keep POLA? Otherwise,
> indeed at time of 15.0-RELEASE we will receive some negative feedback :)

I fully second this request.  I was very much annoyed by this change, as
"netstat -rn" is in my muscle memory, and seeing linuxish 0.0.0.0 instead
of our usual "default" was totally unexpected and rather confusing.

./danfe



Re: Difference in "netstat -rn" output in the last 2 months

2025-01-28 Thread Maxim Sobolev
Discussed between 3 people, creating the problem for 3,000.
Great!

-Max

On Tue, Jan 28, 2025, 7:05 PM Alexander Leidinger wrote:

> On 2025-01-28 18:32, Maxim Sobolev wrote:
>
> I also think this should be reverted back to default. "-n" refers to IP to
> name functionality, "default" is clearly a special case. If someone wants
> it, some other option can be added to emit 0.0.0.0/0 (not sure why but
> ok).
>
>
> This was discussed in the review referenced in the commit. The -nn
> proposal was there too. Personally I agree with the rationales in favor of
> "-nn".
>
> Bye,
> Alexander.


Re: UFS bad inode, mangled entry on Alder Lake-N(100)

2025-01-28 Thread Ian FREISLICH




On 2025-01-28 06:23, Milan Obuch wrote:

> It looks like the right thing; in my case, adding vm.pmap.pcid_enabled=0
> to /boot/loader.conf helps. I consider this easier than installing a port
> for the microcode update...
>
> That being said, could someone add some more pros/cons for these two
> approaches?
>
> Additionally, I am using an M.2 SATA drive at the moment. While an NVMe
> drive worked to some extent, if fsck was necessary for some reason, it was
> unpleasant: a 'waiting for nvme reset' event occurred, which led to the
> NVMe drive detaching, and the only way to fix it was to unscrew the drive,
> put it in a USB-NVMe converter, run fsck via USB, then mount it back
> into the box... not acceptable.


I chose the microcode update, but that was hard to do: I only have one NVMe
slot, and the installer panicked while trying to install the package at the
final stage of the install. I had to install onto an SD card, use another
FreeBSD install to do a pkg chroot install onto that temporary media, boot
from it with the firmware update, and from there chroot-install the
firmware and edit loader.conf on the NVMe drive.


The microcode update fixed it for me. I inferred from my reading that
enabling PCID might have a performance advantage.


Ian



Re: January 2025 stabilization week

2025-01-28 Thread Gleb Smirnoff
On Mon, Jan 27, 2025 at 01:01:16AM -0800, Gleb Smirnoff wrote:
T> This is an automated email to inform you that the January 2025 stabilization week
T> started with FreeBSD/main at main-n275044-c6767dc1f236, which was tagged as
T> main-stabweek-2025-Jan.

Two regressions were identified:

* Compilation failure on 32-bit platforms. Fix 5289625dfecb.
* Instant panic with SO_REUSEPORT_LB and nginx. Fix 06bf119f265c.

My personal experience with updating desktops & laptops went smoothly, and
several other people also reported their success.  The same holds true for my
home router.

Testing at Netflix did not discover any stability regressions except the
SO_REUSEPORT_LB one mentioned above.  We observe significantly increased CPU
load on a few specific hardware samples with an INVARIANTS kernel.  There is
no data yet without INVARIANTS.  We will investigate this more deeply later;
this doesn't warrant prolonging the code freeze.

The advisory code freeze on main branch is thawed!  Thanks everyone!

-- 
Gleb Smirnoff



Re: HEADS UP: NFS changes coming into CURRENT early February

2025-01-28 Thread Gleb Smirnoff
On Tue, Jan 28, 2025 at 09:14:00PM -0800, Gleb Smirnoff wrote:
T> Second, with the patch the M_RPC leak count for me is 2. And I found that these
T> two items are basically a clnt_vc that belongs to a closed connection:
T> 
T> f80029614a80 tcp4   0  0 10.6.6.9.772   10.6.6.9.2049  CLOSED
T> 
T> There is no peer connection, as the server received a timeout trying
T> to send. But rpc.tlsclntd doesn't try to send anything on the socket; it just
T> keeps it in its select(2) fd set and doesn't garbage collect it.
T> 
T> So it is a bigger resource leak than just two pieces of M_RPC. I don't think
T> this is related to my changes.

Here is what is going on here:

- The TCP connection is torn down and tcp_close() calls soisdisconnected().
- soisdisconnected() calls clnt_vc_soupcall() to notify of the error condition.
- clnt_vc_soupcall() tries soreceive() and gets so->so_error.
- clnt_vc_soupcall() sets the client to the error state. It doesn't wake up
  anything, because there were no running RPC requests. It can't report back
  to clnt_rc that the connection is dead. It doesn't mark itself
  for clnt_vc_dotlsupcall() processing.

So we end up with:

(kgdb) p $tp->t_state
$25 = 0 /* TCPS_CLOSED */
(kgdb) p/x $tp->t_inpcb.inp_flags & 0x0400  /* INP_DROPPED */
$27 = 0x400
(kgdb) p/x $tp->t_inpcb.inp_socket->so_state
$28 = 0x2000  /* SS_ISDISCONNECTED */
(kgdb) p/x $tp->t_inpcb.inp_socket->so_count
$35 = 0x2
(kgdb) p/x $ct->ct_rcvstate 
$29 = 0x41  /* RPCRCVSTATE_UPCALLTHREAD | RPCRCVSTATE_NORMAL */
(kgdb)  p $ct->ct_error
$30 = {re_status = RPC_CANTRECV, ru = {RE_errno = 13, RE_why = RPCSEC_GSS_CREDPROBLEM, RE_vers = {low = 13, high = 0}, RE_lb = {s1 = 13, s2 = 0}}}
(kgdb) p $ct->ct_pending
$31 = {tqh_first = 0x0, tqh_last = 0xf80002838ea8}

Note: in my case so->so_error was EACCES, because I used an ipfw(4) rule to
tear down the connection; for a normal TCP timeout it should be ETIMEDOUT, or
ECONNRESET if the remote has reset.  That's why $ct->ct_error.ru.RE_errno == 13.
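
kgdb shows RE_why = RPCSEC_GSS_CREDPROBLEM even though no GSS is involved because the members of ru share storage: the single stored value 13 is printed under every member's interpretation. The sketch below uses a simplified local copy of the structure, modelled on the kgdb output above rather than copied from the FreeBSD header, to show the effect.

```c
#include <assert.h>
#include <errno.h>	/* EACCES, the so_error seen in this case */
#include <stdint.h>
#include <string.h>

/* Stand-in for enum auth_stat; only the value kgdb printed is listed. */
enum auth_stat_sketch {
	RPCSEC_GSS_CREDPROBLEM_SKETCH = 13
};

/* Simplified stand-in for struct rpc_err: the members of ru overlap. */
struct rpc_err_sketch {
	int	re_status;
	union {
		int			RE_errno;
		enum auth_stat_sketch	RE_why;
		struct { uint32_t low, high; } RE_vers;
		struct { int32_t s1, s2; } RE_lb;
	} ru;
};

/* Store a socket error: only RE_errno is written, the rest stays zero. */
static void
record_errno(struct rpc_err_sketch *e, int status, int err)
{
	assert(err > 0);
	memset(e, 0, sizeof(*e));
	e->re_status = status;
	e->ru.RE_errno = err;
}
```

After record_errno(&e, status, EACCES), reading e.ru.RE_why also yields 13. Which member is meaningful depends on re_status; for a receive failure like this one, it is the errno.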

So we need some mechanism for clnt_vc_soupcall() to report to the upper
clnt_rc that the clnt_vc is dead and ready to be garbage collected via
CLNT_CLOSE() and then CLNT_RELEASE().

Once clnt_vc_destroy() is called, the daemon will be notified that it can
close the TLS socket, bringing so_count to 1, and then the final sorele()
will bring it to 0 and free the socket.
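
One possible shape for the missing mechanism, sketched under loud assumptions: every type and function below is hypothetical, modelling clnt_vc, clnt_rc and so_count with plain structs, and CLNT_CLOSE() and CLNT_RELEASE() are collapsed into one garbage-collection step. The upcall marks the lower client dead and notifies its owner; the owner's GC pass then drops the daemon's and the kernel's socket references in the order described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct rc_model;

/* Hypothetical socket; count stands in for so_count. */
struct sock_model {
	int	count;
	bool	freed;
};

/* Hypothetical lower (clnt_vc-like) client holding the socket. */
struct vc_model {
	struct sock_model	*so;
	int			 error;
	bool			 dead;
	struct rc_model		*owner;	/* upper layer to notify */
};

/* Hypothetical upper (clnt_rc-like) client that owns the vc. */
struct rc_model {
	struct vc_model	*vc;
	bool		 needs_gc;
};

static void
sock_release(struct sock_model *so)
{
	assert(so->count > 0);
	if (--so->count == 0)
		so->freed = true;
}

/*
 * Disconnect upcall: record the error, as clnt_vc_soupcall() already
 * does, and additionally tell the owner that this vc is dead.
 */
static void
vc_soupcall(struct vc_model *vc, int so_error)
{
	vc->error = so_error;
	vc->dead = true;
	if (vc->owner != NULL)
		vc->owner->needs_gc = true;
}

/*
 * Owner's GC pass.  Destroying the vc lets the TLS daemon close its
 * descriptor (2 -> 1); the final release frees the socket (1 -> 0).
 */
static void
rc_gc(struct rc_model *rc)
{
	if (!rc->needs_gc || rc->vc == NULL)
		return;
	sock_release(rc->vc->so);	/* daemon closes its fd */
	sock_release(rc->vc->so);	/* final sorele() */
	rc->vc = NULL;
	rc->needs_gc = false;
}
```

In this model nothing is collected until the upcall fires, which mirrors the leak: without the notification, the owner never learns that there is anything to collect.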

-- 
Gleb Smirnoff



Re: "don't know how to make /usr/main-src/sys/contrib/dev/iwm/iwm-3160-17.fw.uu. Stop"

2025-01-28 Thread Emmanuel Vadot


 Hello,

On Wed, 29 Jan 2025 03:33:17 +0000 (UTC)
"Bjoern A. Zeeb"  wrote:

> On Mon, 27 Jan 2025, Mark Millard wrote:
> 
> > On Jan 26, 2025, at 20:51, Adrian Chadd  wrote:
> >
> >
> >> Hi!
> >
> > Hello.
> >
> >> So, there's no longer a build target for the firmware uuencoded files -> 
> >> kernel module.
> >
> > Yea. But there are the sys/conf/files dependency lines in
> > main that still list .fw.uu files. That includes a reference
> > related to the error I get in my context unless I avoid
> > "device iwmfw" in the kernel configuration:
> >
> > /. . ./sys/conf/files:   dependency  "$S/contrib/dev/iwm/iwm-3160-17.fw.uu" \
> >
> > It makes things look like the .fw.uu removal activity is still
> > incomplete.
> 
> Yes it is. This commit missed them.  Manu, will you remove them?

 There are actually two problems with my commits. The first one is that
you cannot compile a kernel with iwm firmware in it; rea@ sent me some
patches a few weeks ago that solve that, so this will be fixed soon-ish.
 The second is people loading from loader who update their box from
13/14 to current: as make installkernel doesn't install the firmware
anymore, it will fail. Removing iwm_load from loader.conf for the
upgrade phase is the way to go, I think, as I don't see any other way.

> commit af0a81b6470aba4af4a24ae9804053722846ded4
> Author: Emmanuel Vadot 
> AuthorDate: Thu Dec 12 17:13:58 2024 +0100
> Commit: Emmanuel Vadot 
> CommitDate: Mon Dec 16 10:44:47 2024 +0100
> 
>  iwm: Stop shipping firmware as kernel module
> 
>  Since we can load raw firmware start shipping them as is.
>  This also remove the uuencode format that don't add any value and garbage
>  collect old firmwares version.
>  For pkgbase users they are now in the FreeBSD-firmware-iwm package.
> 
> 
> 
> >> Being able to build iwm in the kernel rather than a module is broken.
> >>
> >> Now, the real issue(s) are that iwm needs firmware to initialise, and the 
> >> firmware needs to exist, and thus it needs access to the rootfs for 
> >> firmware_get() to find the now binary files in /boot/firmware instead of 
> >> the kernel module old way, and that whole pipeline is broken if it's 
> >> loaded at boot time or included in the kernel directly. There isn't a nice 
> >> way to defer the firmware load attempt until /after/ rootfs is up.
> 
> The answer is not to load it from loader anymore really but let devd do
> its job.

 Yes, agreed, and I've done that for quite some time, but it seems that we
advertised for a long time that loading the wifi module from loader was the
way to go, so a lot of people were bitten by this.

> That is monolithic kernels + firmware are still the problem but I am not
> generally thinking this is the problem here.
> 
> But yes, there would also be ways for the firmware load to be deferred, but
> that's also a different story.

 Agreed.

> > Yep.
> 
> Firmware can still be loaded from loader.  The following commit should have
> made that more easily possible without changes.
> 
> commit a0f06dfb0d188966bee7265ec7d9f20093186bb6
> Author: Emmanuel Vadot 
> AuthorDate: Mon Jan 6 08:34:02 2025 +0100
> Commit: Emmanuel Vadot 
> CommitDate: Mon Jan 6 08:34:02 2025 +0100
> 
>  loader: Add a list of firmware name mapping
> 
>  Since we started to ship raw firmware for iwm(4), users who loads
>  the driver from loader are having problems as loader don't know that
>  the firmwares are now raw files and not kernel modules anymore.
>  Start a list of default entry for iwm(4) firmwares name mapping so it will
>  still works when loaded from loader.
> 
>  Differential Revision:  https://reviews.freebsd.org/D48211
>  Reviewed by:bz, imp, kevans
> 
> 
> /bz
> 
> -- 
> Bjoern A. Zeeb r15:7


-- 
Emmanuel Vadot  



Re: HEADS UP: NFS changes coming into CURRENT early February

2025-01-28 Thread Gleb Smirnoff
On Mon, Jan 27, 2025 at 06:10:42PM -0800, Rick Macklem wrote:
R> I think I've found a memory leak, but it shouldn't be a show stopper.
R> 
R> What I did on the NFS client side is:
R> # vmstat -m | fgrep -i rpc
R> # mount -t nfs -o nfsv4,tls nfsv4-server:/ /mnt
R> # ls -lR /mnt
R> --> Then I network partitioned it from the server a few times, until
R>     the TCP connection closed.
R>     (My client is in bhyve and the server is on the system the bhyve
R>     instance is running in. I just did "ifconfig bridge0 down", waited for
R>     the TCP connection to close ("netstat -a"), then "ifconfig bridge0 up".)
R> Once done, I did
R> # umount /mnt
R> # vmstat -m | fgrep -i rpc
R> and saw a somewhat larger allocation count.
R> 
R> The allocation count only goes up if I do the network partitioning
R> and only on the NFS client side.
R> 
R> Since the leak is slow and only happens when the TCP connection
R> breaks, I do not think it is a show stopper and one of us can track it down
R> someday.

I reproduced the recipe and found two problems, but don't have a final
solution for either.

First, when we create the backchannel in newnfs_connect() in
sys/fs/nfs/nfs_commonkrpc.c:

	if (nfs_numnfscbd > 0) {
		nfs_numnfscbd++;
		NFSD_UNLOCK();
		xprt = svc_vc_create_backchannel(
		    nfscbd_pool);
		CLNT_CONTROL(client, CLSET_BACKCHANNEL,
		    xprt);
		NFSD_LOCK();
		nfs_numnfscbd--;
		if (nfs_numnfscbd == 0)
			wakeup(&nfs_numnfscbd);
	}


svc_vc_create_backchannel() creates the xprt with refcount=1. Then we link it
to our client, and CLNT_CONTROL() makes refcount=2. Then this function forgets
the pointer (but it still owns a reference). When the client is destroyed, it
does SVC_RELEASE() on the backchannel, the refcount goes from 2 to 1, and the
xprt is leaked.
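
The reference-count arithmetic above can be checked with a small model. Plain counters stand in for the SVCXPRT refcount and the function names are hypothetical; only the sequence of operations (create at 1, CLNT_CONTROL() takes a reference, client destruction drops one, and the patch's extra SVC_RELEASE()) comes from the analysis.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted SVCXPRT. */
struct xprt_model {
	int	refs;
	bool	freed;
};

static void
xprt_acquire(struct xprt_model *x)
{
	assert(!x->freed);
	x->refs++;
}

static void
xprt_release(struct xprt_model *x)
{
	assert(x->refs > 0);
	if (--x->refs == 0)
		x->freed = true;
}

/*
 * Replay the backchannel lifecycle and return the final refcount.
 * drop_creator_ref models the SVC_RELEASE() added by the patch.
 */
static int
replay_backchannel(struct xprt_model *x, bool drop_creator_ref)
{
	x->refs = 1;			/* svc_vc_create_backchannel() */
	x->freed = false;
	xprt_acquire(x);		/* CLNT_CONTROL(CLSET_BACKCHANNEL) */
	if (drop_creator_ref)
		xprt_release(x);	/* patch: SVC_RELEASE(xprt) */
	xprt_release(x);		/* client destroyed */
	return (x->refs);
}
```

Without the extra release the model ends at 1 (leaked); with it, the count reaches 0. The model covers only the counting and says nothing about the use-after-free panic described in the next paragraph.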

I made a patch against that, and it does reduce the amount of M_RPC leaked
after your recipe, but once I got a panic where the backchannel xprt is
actually used after free.  So it looks like there is a case that my patch
doesn't cover.  What drives me crazy is that I can't reproduce it a second
time and can't get any clue from the single core. Patch attached.

Second, with the patch the M_RPC leak count for me is 2. And I found that these
two items are basically a clnt_vc that belongs to a closed connection:

f80029614a80 tcp4   0  0 10.6.6.9.772   10.6.6.9.2049  CLOSED

There is no peer connection, as the server received a timeout trying
to send. But rpc.tlsclntd doesn't try to send anything on the socket; it just
keeps it in its select(2) fd set and doesn't garbage collect it.

So it is a bigger resource leak than just two pieces of M_RPC. I don't think
this is related to my changes.

-- 
Gleb Smirnoff
commit 1ca54035f1ab46884675fdbba7364624e1217c9e
Author: Gleb Smirnoff 
Date:   Tue Jan 28 20:13:19 2025 -0800

try

diff --git a/sys/fs/nfs/nfs_commonkrpc.c b/sys/fs/nfs/nfs_commonkrpc.c
index e5c658ce76d2..9f3583a09037 100644
--- a/sys/fs/nfs/nfs_commonkrpc.c
+++ b/sys/fs/nfs/nfs_commonkrpc.c
@@ -262,7 +262,6 @@ newnfs_connect(struct nfsmount *nmp, struct nfssockreq *nrp,
 	struct socket *so;
 	int one = 1, retries, error = 0;
 	struct thread *td = curthread;
-	SVCXPRT *xprt;
 	struct timeval timo;
 	uint64_t tval;
 
@@ -430,12 +429,15 @@ newnfs_connect(struct nfsmount *nmp, struct nfssockreq *nrp,
  */
 NFSD_LOCK();
 if (nfs_numnfscbd > 0) {
+	SVCXPRT *xprt;
+
 	nfs_numnfscbd++;
 	NFSD_UNLOCK();
 	xprt = svc_vc_create_backchannel(
 	nfscbd_pool);
 	CLNT_CONTROL(client, CLSET_BACKCHANNEL,
 	xprt);
+	SVC_RELEASE(xprt);
 	NFSD_LOCK();
 	nfs_numnfscbd--;
 	if (nfs_numnfscbd == 0)