January 2025 stabilization week

2025-01-27 Thread Gleb Smirnoff
  Hi FreeBSD/main users & developers:

This is an automated email to inform you that the January 2025 stabilization
week started with FreeBSD/main at main-n275044-c6767dc1f236, which was tagged as
main-stabweek-2025-Jan.

Those who want to participate in the stabilization week are encouraged to
update to the above revision/tag and test their systems.

The tag main-stabweek-2025-Jan has been published at Gleb Smirnoff's GitHub
repo.  To connect this repo as an additional remote, run:

  git remote add glebius https://github.com/glebius/FreeBSD

Once the remote is configured, check out the tag with:

  git fetch glebius --tags
  git checkout main-stabweek-2025-Jan

If you want to use only the official FreeBSD repo, then update to
the revision:

  git pull
  git checkout c6767dc1f236
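
After either checkout it may be worth confirming that the tree really sits on
the stabilization-week commit. The following is only a minimal sketch: the
helper name at_revision is made up for illustration, and it does nothing more
than compare a full hash against a short prefix.

```sh
#!/bin/sh
# Hypothetical helper: succeed if the full hash $1 starts with the short hash $2.
at_revision() {
    case "$1" in
        "$2"*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Intended use inside a FreeBSD src checkout (not run here):
#   at_revision "$(git rev-parse HEAD)" c6767dc1f236 && echo "on stabweek commit"
```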

Developers are encouraged to avoid pushing new features to FreeBSD/main during
the stabilization week, but focus on bugfixes instead.  The stabilization week
runs up to Friday 18:00 UTC, but if there is consensus that any regressions
discovered by participants have been fixed, it will end early.

Once that happens, the advisory freeze of FreeBSD/main branch is thawed.

--
Gleb Smirnoff



Re: UFS bad inode, mangled entry on Alder Lake-N(100)

2025-01-27 Thread Yamagi

Hi,

This sounds like the Alder Lake PCID bug in its N100 flavor. On the small
cores the INVLPG instruction is broken: it fails to flush all (global?) TLB
entries, leading to cache corruption. FreeBSD has a workaround for that:
 https://cgit.freebsd.org/src/commit/?id=cde70e312c3fde5b37a29be1dacb7fde9a45b94a


However, that workaround never fully solved the problem on the N100
series. My own N100 board was never stable with PCID enabled, and there
are several other reports of the same problems. For example:
https://lists.freebsd.org/archives/freebsd-current/2023-August/004116.html


Since Linux went with disabling PCID altogether on all Alder Lake and
Raptor Lake CPUs, I did the same by setting vm.pmap.pcid_enabled=0 in
loader.conf. Since I did that, the system has been running fine.
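
For anyone who wants to apply the same setting, here is a minimal sketch of
adding the tunable without duplicating it. The function name set_pcid_off is
hypothetical; the tunable name and value are the ones mentioned above.

```sh
#!/bin/sh
# Hypothetical helper: append the PCID tunable to a loader.conf-style file
# unless a line for it is already present.
set_pcid_off() {
    conf="$1"
    if grep -q '^vm\.pmap\.pcid_enabled=' "$conf" 2>/dev/null; then
        echo "already set in $conf"
    else
        echo 'vm.pmap.pcid_enabled=0' >> "$conf"
        echo "added to $conf"
    fi
}

# Intended use as root (not run here):
#   set_pcid_off /boot/loader.conf
```

The tunable is read at boot, so a reboot is needed; afterwards
"sysctl vm.pmap.pcid_enabled" should report 0.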


The Linux commit disabling PCID is here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ae8373a5add4ea39f032563cf12a02946d1e3546


A microcode update might also help. I haven't tested the updates Intel has
released since early last year, so I don't know for sure.



Regards,
Yamagi


On 27.01.25 at 18:10, Ian FREISLICH wrote:
> I recently bought one of those mini-pc firewall devices (Topton 12th gen
> N100 with 4x I226-V, 2x X520) and couldn't get it to install pkg or
> buildkernel without getting a slew of these messages, inode number
> changing and a panic shortly thereafter.
>
> kernel: /: bad dir ino 4567815 at offset 0: mangled entry
>
> I tried the FreeBSD-15.0-CURRENT-amd64-20250124 snapshot and
> 14.2-RELEASE, both with and without journal, trim and softupdates in every
> permitted permutation without success. The system has an NVME, but I
> experience the same problem with the install on a microsd and different
> known good NVME drive. Each time I had to reinstall because the
> filesystem was so corrupted it wouldn't boot after a fsck.
>
> The system is now running fine with ZFS so I'm wondering if it's
> silently corrupting the ZFS or if there's a bug in UFS2 that's tickled
> by this CPU. I'll provide any debugging required.
>
> Ian
--
Homepage: https://www.yamagi.org
Github:   https://github.com/yamagi
GPG:  0xeb1472e71d502515





Re: UFS bad inode, mangled entry on Alder Lake-N(100)

2025-01-27 Thread Ian FREISLICH


  
  
All,

I can confirm that the microcode loaded early fixes the issue.

Ian

On 2025-01-27 13:12, Patrick M. Hausen wrote:

> Hi all,
>
>> On 27.01.2025 at 18:38, Milan Obuch wrote:
>>
>> On Mon, 27 Jan 2025 12:10:43 -0500
>> Ian FREISLICH wrote:
>>
>>> I recently bought one of those mini-pc firewall devices (Topton 12th
>>> gen N100 with 4x I226-V, 2x X520) and couldn't get it to install pkg
>>> or buildkernel without getting a slew of these messages, inode number
>>> changing and a panic shortly thereafter.
>>>
>>> kernel: /: bad dir ino 4567815 at offset 0: mangled entry
>>>
>>> [...]
>>
>> Just a "me too" message - I did test another device with the same CPU,
>> mine is SZBOX.
>> [...]
>
> In the OPNsense community we had frequent reports of UFS corruption
> with Alder Lake and Raptor Lake CPUs. Lots of embedded devices of
> varying manufacture and quality in use, apparently.
>
> The problems were fixed in all cases that I am aware of by applying the current
> Intel microcode update (sysutils/cpu-microcode). Make sure to activate early
> loading via /boot/loader.conf(.local).
>
> HTH, kind regards,
> Patrick


  




Re: January 2025 stabilization week

2025-01-27 Thread Gleb Smirnoff
On Mon, Jan 27, 2025 at 01:01:16AM -0800, Gleb Smirnoff wrote:
T> This is an automated email to inform you that the January 2025 stabilization
T> week started with FreeBSD/main at main-n275044-c6767dc1f236, which was tagged as
T> main-stabweek-2025-Jan.

Quick status update:

1) No problems were found in desktop and laptop use.
2) We discovered a regression (a panic with INVARIANTS) in network applications
   that use the SO_REUSEPORT_LB socket option.  We are working on the problem.

-- 
Gleb Smirnoff



Re: Difference in "netstat -rn" output in the last 2 months

2025-01-27 Thread Marek Zarychta

On 27.01.2025 at 21:07, Michael Gmelin wrote:

> On Sun, 26 Jan 2025 16:58:57 +0100
> Alexander Leidinger wrote:
>
>> Hi,
>>
>> something has changed in the output of "netstat -rn" between
>> 2024-11-23-195545 and 2025-01-22-151306. The default route is not
>> listed as "default" anymore, but with "0.0.0.0" resp. "::/0". This
>> breaks some tools (e.g. iocage). Iocage uses python, I'm not sure if
>> it uses netstat or some other interface, so it may not be directly
>> related to netstat itself but could be related to some other stuff
>> (netlink maybe?).
>>
>> Does this ring a bell for someone?
>
> If there had been "iocage" in the subject, I would've looked into it
> earlier :)
>
> I'll produce a PR on the repo based on the issue you opened and also
> apply it to the port.
>
> Cheers

I was also hit by this change in a couple of ways. When 15.0 is released,
it's probably worth adding information about the change to the release
notes.



--
Marek Zarychta




Re: UFS bad inode, mangled entry on Alder Lake-N(100)

2025-01-27 Thread Milan Obuch
On Mon, 27 Jan 2025 12:10:43 -0500
Ian FREISLICH  wrote:

> I recently bought one of those mini-pc firewall devices (Topton 12th
> gen N100 with 4x I226-V, 2x X520) and couldn't get it to install pkg
> or buildkernel without getting a slew of these messages, inode number
> changing and a panic shortly thereafter.
> 
> kernel: /: bad dir ino 4567815 at offset 0: mangled entry
> 
> I tried the FreeBSD-15.0-CURRENT-amd64-20250124 snapshot and
> 14.2-RELEASE, both with and without journal, trim and softupdates in
> every permitted permutation without success. The system has an NVME,
> but I experience the same problem with the install on a microsd and
> different known good NVME drive. Each time I had to reinstall because
> the filesystem was so corrupted it wouldn't boot after a fsck.
> 
> The system is now running fine with ZFS so I'm wondering if it's
> silently corrupting the ZFS or if there's a bug in UFS2 that's
> tickled by this CPU. I'll provide any debugging required.

Just a "me too" message - I did test another device with the same CPU,
mine is SZBOX. Only with 14.2-RELEASE, but I tested with both NVMe and
M.2 SATA devices, both directly in the mini PC and externally via USB-NVMe
and USB-M.2 SATA converters. System installation went flawlessly; however,
just building the ports-mgmt/pkg port was enough to start generating mangled
entry messages as you wrote.

To me it looks like there is some bug in the UFS code when used with this
CPU. I have no idea how such a bug could fail to manifest itself on another
platform - I was convinced the UFS code is hardware independent, so this is
really strange. I have not tested with ZFS yet, but I plan to.

So, if I can test something, patch, different setup, provide some
debugging, count me in.

Regards,
Milan



Re: UFS bad inode, mangled entry on Alder Lake-N(100)

2025-01-27 Thread Ian FREISLICH


  
  
It might be timing related. UFS with a custom kernel (previously
GENERIC) is less prone, and I got these while building world on a microSD:

Jan 27 12:35:54 router kernel: /: inode 1286411: check-hash failed
Jan 27 12:35:54 router syslogd: last message repeated 1 times
Jan 27 12:35:54 router kernel: /: inode 1286412: check-hash failed
Jan 27 12:35:54 router syslogd: last message repeated 1 times
Jan 27 12:35:54 router kernel: /: inode 1286413: check-hash failed
Jan 27 12:35:54 router syslogd: last message repeated 1 times

No panic so far, without the cpu microcode. I'll look into that
shortly.

Ian

On 2025-01-27 13:12, Patrick M. Hausen wrote:

> Hi all,
>
>> On 27.01.2025 at 18:38, Milan Obuch wrote:
>>
>> On Mon, 27 Jan 2025 12:10:43 -0500
>> Ian FREISLICH wrote:
>>
>>> I recently bought one of those mini-pc firewall devices (Topton 12th
>>> gen N100 with 4x I226-V, 2x X520) and couldn't get it to install pkg
>>> or buildkernel without getting a slew of these messages, inode number
>>> changing and a panic shortly thereafter.
>>>
>>> kernel: /: bad dir ino 4567815 at offset 0: mangled entry
>>>
>>> [...]
>>
>> Just a "me too" message - I did test another device with the same CPU,
>> mine is SZBOX.
>> [...]
>
> In the OPNsense community we had frequent reports of UFS corruption
> with Alder Lake and Raptor Lake CPUs. Lots of embedded devices of
> varying manufacture and quality in use, apparently.
>
> The problems were fixed in all cases that I am aware of by applying the current
> Intel microcode update (sysutils/cpu-microcode). Make sure to activate early
> loading via /boot/loader.conf(.local).
>
> HTH, kind regards,
> Patrick


  




UFS bad inode, mangled entry on Alder Lake-N(100)

2025-01-27 Thread Ian FREISLICH


  
  
I recently bought one of those mini-pc firewall devices (Topton 12th
gen N100 with 4x I226-V, 2x X520) and couldn't get it to install pkg
or buildkernel without getting a slew of these messages, inode
number changing and a panic shortly thereafter.

kernel: /: bad dir ino 4567815 at offset 0: mangled entry

I tried the FreeBSD-15.0-CURRENT-amd64-20250124 snapshot and
14.2-RELEASE, both with and without journal, trim and softupdates in
every permitted permutation without success. The system has an NVME,
but I experience the same problem with the install on a microsd and
different known good NVME drive. Each time I had to reinstall
because the filesystem was so corrupted it wouldn't boot after a
fsck.

The system is now running fine with ZFS so I'm wondering if it's
silently corrupting the ZFS or if there's a bug in UFS2 that's
tickled by this CPU. I'll provide any debugging required.

Ian
  




Re: UFS bad inode, mangled entry on Alder Lake-N(100)

2025-01-27 Thread Patrick M. Hausen
Hi all,

> On 27.01.2025 at 18:38, Milan Obuch wrote:
> 
> On Mon, 27 Jan 2025 12:10:43 -0500
> Ian FREISLICH  wrote:
> 
>> I recently bought one of those mini-pc firewall devices (Topton 12th
>> gen N100 with 4x I226-V, 2x X520) and couldn't get it to install pkg
>> or buildkernel without getting a slew of these messages, inode number
>> changing and a panic shortly thereafter.
>> 
>> kernel: /: bad dir ino 4567815 at offset 0: mangled entry
>> 
>> [...]
> 
> Just a "me too" message - I did test another device with the same CPU,
> mine is SZBOX.
> [...]

In the OPNsense community we had frequent reports of UFS corruption
with Alder Lake and Raptor Lake CPUs. Lots of embedded devices of
varying manufacture and quality in use, apparently.

The problems were fixed in all cases that I am aware of by applying the current
Intel microcode update (sysutils/cpu-microcode). Make sure to activate early
loading via /boot/loader.conf(.local).
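
As a sketch of what early loading typically looks like: the two variable
names and the firmware path below are what I recall from the install message
of the Intel microcode package, so treat them as assumptions and check the
message printed by your installed version.

```sh
# /boot/loader.conf fragment (variable names and path are assumptions,
# verify them against the package's install message)
cpu_microcode_load="YES"
cpu_microcode_name="/boot/firmware/intel-ucode.bin"
```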

HTH, kind regards,
Patrick


Re: "don't know how to make /usr/main-src/sys/contrib/dev/iwm/iwm-3160-17.fw.uu. Stop"

2025-01-27 Thread Mark Millard
On Jan 26, 2025, at 20:51, Adrian Chadd  wrote:


> Hi!

Hello.

> So, there's no longer a build target for the firmware uuencoded files -> 
> kernel module.

Yea. But there are the sys/conf/files dependency lines in
main that still list .fw.uu files. That includes a reference
related to the error I get in my context unless I avoid
"device iwmfw" in the kernel configuration:

/. . ./sys/conf/files:   dependency  "$S/contrib/dev/iwm/iwm-3160-17.fw.uu" \

It makes things look like the .fw.uu removal activity is still
incomplete.

> Being able to build iwm in the kernel rather than a module is broken.
> 
> Now, the real issue(s) are that iwm needs firmware to initialise, and the 
> firmware needs to exist, and thus it needs access to the rootfs for 
> firmware_get() to find the now binary files in /boot/firmware instead of the 
> kernel module old way, and that whole pipeline is broken if it's loaded at 
> boot time or included in the kernel directly. There isn't a nice way to defer 
> the firmware load attempt until /after/ rootfs is up.
> 

Yep.
 
===
Mark Millard
marklmi at yahoo.com




Re: Difference in "netstat -rn" output in the last 2 months

2025-01-27 Thread Alexander Leidinger

On 2025-01-26 16:58, Alexander Leidinger wrote:

> Hi,
>
> something has changed in the output of "netstat -rn" between
> 2024-11-23-195545 and 2025-01-22-151306. The default route is not
> listed as "default" anymore, but with "0.0.0.0" resp. "::/0". This
> breaks some tools (e.g. iocage). Iocage uses python, I'm not sure if it
> uses netstat or some other interface, so it may not be directly related
> to netstat itself but could be related to some other stuff (netlink
> maybe?).


For those who stumble upon this, a fix is here:
https://github.com/freebsd/iocage/issues/60
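
For scripts of your own that parse "netstat -rn", a tolerant match along the
following lines may help. This is only a sketch and not the iocage fix
itself; the function name default_gateways is made up, and "0.0.0.0/0" is
accepted as an extra spelling on the assumption that the prefix length may
be printed.

```sh
#!/bin/sh
# Hypothetical helper: read "netstat -rn" style lines on stdin and print the
# gateway column of every line whose destination denotes a default route.
default_gateways() {
    awk '$1 == "default" || $1 == "0.0.0.0" || $1 == "0.0.0.0/0" || $1 == "::/0" { print $2 }'
}

# Intended use (not run here):
#   netstat -rn | default_gateways
```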

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org   netch...@freebsd.org   : PGP 0x8F31830F9F2772BF




Re: Difference in "netstat -rn" output in the last 2 months

2025-01-27 Thread Michael Gmelin



On Sun, 26 Jan 2025 16:58:57 +0100
Alexander Leidinger  wrote:

> Hi,
> 
> something has changed in the output of "netstat -rn" between 
> 2024-11-23-195545 and 2025-01-22-151306. The default route is not
> listed as "default" anymore, but with "0.0.0.0" resp. "::/0". This
> breaks some tools (e.g. iocage). Iocage uses python, I'm not sure if
> it uses netstat or some other interface, so it may not be directly
> related to netstat itself but could be related to some other stuff
> (netlink maybe?).
> 
> Does this ring a bell for someone?
> 

If there had been "iocage" in the subject, I would've looked into it
earlier :)

I'll produce a PR on the repo based on the issue you opened and also
apply it to the port.

Cheers


-- 
Michael Gmelin



Re: HEADS UP: NFS changes coming into CURRENT early February

2025-01-27 Thread Rick Macklem
On Tue, Jan 21, 2025 at 10:27 PM Gleb Smirnoff  wrote:
>
>   Hi,
>
> TLDR version:
> users of NFS with Kerberos (e.g. running gssd(8)) as well as users of NFS with
> TLS (e.g. running rpc.tlsclntd(8) or rpc.tlsservd(8)) as well as users of
> network lock manager (e.g. having 'options NFSLOCKD' and running rpcbind(8))
> are affected.  You would need to recompile & reinstall both the world and the
> kernel together.  Of course this is what you'd normally do when you track
> FreeBSD CURRENT, but better be warned.  I will post hashes of the specific
> revisions that break API/ABI when they are pushed.
>
> Longer version:
> last year I tried to check-in a new implementation of unix(4) SOCK_STREAM and
> SOCK_SEQPACKET in d80a97def9a1, but was forced to back it out due to several
> kernel side abusers of a unix(4) socket.  The most difficult ones are the NFS
> related RPC services, that act as RPC clients talking to an RPC servers in
> userland.  Since it is impossible to fully emulate a userland process
> connection to a unix(4) socket they need to work with the socket internal
> structures bypassing all the normal KPIs and conventions.  Of course they
> didn't tolerate the new implementation that totally eliminated intermediate
> buffer on the sending side.
>
> While the original motivation for the upcoming changes is the fact that I want
> to go forward with the new unix/stream and unix/seqpacket, I also tried to
> make kernel to userland RPC better.  You judge if I succeeded or not :) Here
> are some highlights:
>
> - Code footprint both in kernel clients and in userland daemons is reduced.
>   Example: gssd:    1 file changed, 5 insertions(+), 64 deletions(-)
>            kgssapi: 1 file changed, 26 insertions(+), 78 deletions(-)
>                     4 files changed, 1 insertion(+), 11 deletions(-)
> - You can easily see all RPC calls from kernel to userland with genl(1):
>   # genl monitor rpcnl
> - The new transport is multithreaded in kernel by default, so kernel clients
>   can send a bunch of RPCs without any serialization and if the userland
>   figures out how to parallelize their execution, such parallelization would
>   happen.  Note: new rpc.tlsservd(8) will use threads.
> - One ad-hoc single program syscall is removed - gssd_syscall.  Note:
>   rpctls syscall remains, but I have some ideas on how to improve that, too.
>   Not at this step though.
> - All sleeps of kernel RPC calls are now in single place, and they all have
>   timeouts.  I believe NFS services are now much more resilient to hangs.
>   A deadlock when NFS kernel thread is blocked on unix socket buffer, and
>   the socket can't go away because its application is blocked in some other
>   syscall is no longer possible.
>
> The code is posted on phabricator, reviews D48547 through D48552.
> Reviewers are very welcome!
>
> I share my branch on Github. It is usually rebased on today's CURRENT:
>
> https://github.com/glebius/FreeBSD/commits/gss-netlink/
>
> Early testers are very welcome!
I think I've found a memory leak, but it shouldn't be a show stopper.

What I did on the NFS client side is:
# vmstat -m | fgrep -i rpc
# mount -t nfs -o nfsv4,tls nfsv4-server:/ /mnt
# ls -lR /mnt
--> Then I network partitioned it from the server a few times, until
  the TCP connection closed.
  (My client is in bhyve and the server on the system the bhyve
   instance is running in. I just "ifconfig bridge0 down", waited for
   the TCP connection to close ("netstat -a"), then "ifconfig bridge0 up".)
Once done, I
# umount /mnt
# vmstat -m | fgrep -i rpc
and saw a somewhat larger allocation count.

The allocation count only goes up if I do the network partitioning
and only on the NFS client side.

Since the leak is slow and only happens when the TCP connection
breaks, I do not think it is a show stopper and one of us can track it down
someday.

Other than that, I have not found any problems that you had not already
fixed, rick
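
To make the before/after comparison less of an eyeball job, the two vmstat
outputs can be saved and compared. A minimal sketch, where the helper name
changed_lines and the /tmp file names are made up for illustration:

```sh
#!/bin/sh
# Hypothetical helper: print the lines that differ between two saved snapshots.
# diff exits 1 when the files differ, so the pipeline status is ignored.
changed_lines() {
    diff "$1" "$2" | grep '^[<>]' || true
}

# Intended use on the NFS client (not run here):
#   vmstat -m | fgrep -i rpc > /tmp/rpc.before
#   ... mount, ls -lR, partition the network a few times, umount ...
#   vmstat -m | fgrep -i rpc > /tmp/rpc.after
#   changed_lines /tmp/rpc.before /tmp/rpc.after
```

Lines starting with "<" come from the first snapshot and lines starting with
">" from the second, so a growing count shows up as a pair.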

>
> --
> Gleb Smirnoff
>