January 2025 stabilization week
Hi FreeBSD/main users & developers:

This is an automated email to inform you that the January 2025 stabilization week started with FreeBSD/main at main-n275044-c6767dc1f236, which was tagged as main-stabweek-2025-Jan.

Those who want to participate in the stabilization week are encouraged to update to the above revision/tag and test their systems. The tag main-stabweek-2025-Jan has been published at Gleb Smirnoff's GitHub repo. To connect this repo as an additional remote, run:

    git remote add glebius https://github.com/glebius/FreeBSD

Once the remote is configured, check out the tag with:

    git fetch glebius --tags
    git checkout main-stabweek-2025-Jan

If you want to use only the official FreeBSD repo, update to the revision:

    git pull
    git checkout c6767dc1f236

Developers are encouraged to avoid pushing new features to FreeBSD/main during the stabilization week and to focus on bugfixes instead. The stabilization week runs until Friday 18:00 UTC, but it will end early if there is consensus that all regressions discovered by participants have been fixed. Once that happens, the advisory freeze of the FreeBSD/main branch is thawed.

--
Gleb Smirnoff
Re: UFS bad inode, mangled entry on Alder Lake-N(100)
Hi,

sounds like the Alder Lake PCID bug in its N100 flavor. On the small cores the INVLPG instruction is broken: it fails to flush all (global?) TLB entries, leading to cache corruption. FreeBSD has a workaround for that:

https://cgit.freebsd.org/src/commit/?id=cde70e312c3fde5b37a29be1dacb7fde9a45b94a

However, that workaround never fully solved the problem on the N100 series. My own N100 board was never stable with PCID enabled, and there are several other reports of the same problems. For example:

https://lists.freebsd.org/archives/freebsd-current/2023-August/004116.html

Since Linux went with disabling PCID altogether on all Alder Lake and Raptor Lake CPUs, I did the same by setting vm.pmap.pcid_enabled=0 in loader.conf. Since I did that, the system has been running fine. The Linux commit disabling PCID is here:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ae8373a5add4ea39f032563cf12a02946d1e3546

A microcode update might also help. I haven't tested the updates released by Intel since early last year, so I don't know for sure.

Regards,
Yamagi

On 27.01.25 at 18:10, Ian FREISLICH wrote:
> I recently bought one of those mini-pc firewall devices (Topton 12th gen
> N100 with 4x I226-V, 2x X520) and couldn't get it to install pkg or
> buildkernel without getting a slew of these messages, inode number
> changing and a panic shortly thereafter.
>
> kernel: /: bad dir ino 4567815 at offset 0: mangled entry
>
> I tried the FreeBSD-15.0-CURRENT-amd64-20250124 snapshot and 14.2-
> RELEASE, both with and without journal, trim and softupdates in every
> permitted permutation without success. The system has an NVME, but I
> experience the same problem with the install on a microsd and different
> known good NVME drive. Each time I had to reinstall because the
> filesystem was so corrupted it wouldn't boot after a fsck.
>
> The system is now running fine with ZFS so I'm wondering if it's
> silently corrupting the ZFS or if there's a bug in UFS2 that's tickled
> by this CPU. I'll provide any debugging required.
>
> Ian

--
Homepage: https://www.yamagi.org
Github: https://github.com/yamagi
GPG: 0xeb1472e71d502515
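The PCID workaround mentioned in this message can be written down as a loader.conf fragment. This is only a sketch of what the message describes: the tunable name and value come from the message itself, while the file path /boot/loader.conf and the comments are assumptions added for illustration.

```
# /boot/loader.conf (assumed location)
# Turn off PCID use in the amd64 pmap, as described in the message above.
# Loader tunables only take effect at the next boot.
vm.pmap.pcid_enabled=0
```

After a reboot, running "sysctl vm.pmap.pcid_enabled" should report 0 if the tunable was picked up. Whether this is still needed once current microcode is loaded early was not tested by the author of the message.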
Re: UFS bad inode, mangled entry on Alder Lake-N(100)
All,

I can confirm that loading the microcode early fixes the issue.

Ian

On 2025-01-27 13:12, Patrick M. Hausen wrote:
> Hi all,
>
>> On 27.01.2025 at 18:38, Milan Obuch wrote:
>>
>> On Mon, 27 Jan 2025 12:10:43 -0500
>> Ian FREISLICH wrote:
>>
>>> I recently bought one of those mini-pc firewall devices (Topton 12th
>>> gen N100 with 4x I226-V, 2x X520) and couldn't get it to install pkg
>>> or buildkernel without getting a slew of these messages, inode number
>>> changing and a panic shortly thereafter.
>>>
>>> kernel: /: bad dir ino 4567815 at offset 0: mangled entry
>>>
>>> [...]
>>
>> Just a "me too" message - I did test another device with the same CPU,
>> mine is SZBOX.
>> [...]
>
> In the OPNsense community we had frequent reports of UFS corruption with
> Alder Lake and Raptor Lake CPUs. Lots of embedded devices of varying
> manufacture and quality in use, apparently.
>
> The problems were fixed in all cases that I am aware of by applying the
> current Intel microcode update (sysutils/cpu-microcode). Make sure to
> activate early loading via /boot/loader.conf(.local).
>
> HTH, kind regards,
> Patrick
Re: January 2025 stabilization week
On Mon, Jan 27, 2025 at 01:01:16AM -0800, Gleb Smirnoff wrote:
T> This is an automated email to inform you that the January 2025 stabilization week
T> started with FreeBSD/main at main-n275044-c6767dc1f236, which was tagged as
T> main-stabweek-2025-Jan.

Quick status update:

1) No problems were found with the desktop & laptop experience.
2) We discovered a regression, a panic with INVARIANTS, for network applications that use the socket option SO_REUSEPORT_LB. We are working on the problem.

--
Gleb Smirnoff
Re: Difference in "netstat -rn" output in the last 2 months
On 27.01.2025 at 21:07, Michael Gmelin wrote:
> On Sun, 26 Jan 2025 16:58:57 +0100
> Alexander Leidinger wrote:
>
>> Hi,
>>
>> something has changed in the output of "netstat -rn" between
>> 2024-11-23-195545 and 2025-01-22-151306. The default route is not
>> listed as "default" anymore, but with "0.0.0.0" resp. "::/0". This
>> breaks some tools (e.g. iocage). Iocage uses python, I'm not sure if
>> it uses netstat or some other interface, so it may not be directly
>> related to netstat itself but could be related to some other stuff
>> (netlink maybe?).
>>
>> Does this ring a bell for someone?
>
> If there had been "iocage" in the subject, I would've looked into it
> earlier :)
>
> I'll produce a PR on the repo based on the issue you opened and also
> apply it to the port.
>
> Cheers

I was also hit by this change in a couple of ways. When 15.0 is released, it's probably worth adding information about the change to the release notes.

--
Marek Zarychta
Re: UFS bad inode, mangled entry on Alder Lake-N(100)
On Mon, 27 Jan 2025 12:10:43 -0500
Ian FREISLICH wrote:

> I recently bought one of those mini-pc firewall devices (Topton 12th
> gen N100 with 4x I226-V, 2x X520) and couldn't get it to install pkg
> or buildkernel without getting a slew of these messages, inode number
> changing and a panic shortly thereafter.
>
> kernel: /: bad dir ino 4567815 at offset 0: mangled entry
>
> I tried the FreeBSD-15.0-CURRENT-amd64-20250124 snapshot and
> 14.2-RELEASE, both with and without journal, trim and softupdates in
> every permitted permutation without success. The system has an NVME,
> but I experience the same problem with the install on a microsd and
> different known good NVME drive. Each time I had to reinstall because
> the filesystem was so corrupted it wouldn't boot after a fsck.
>
> The system is now running fine with ZFS so I'm wondering if it's
> silently corrupting the ZFS or if there's a bug in UFS2 that's
> tickled by this CPU. I'll provide any debugging required.

Just a "me too" message - I tested another device with the same CPU; mine is an SZBOX. I only tried 14.2-RELEASE, but I tested with both NVMe and M.2 SATA devices, both directly in the mini PC and externally via USB-NVMe and USB-M.2 SATA converters. System installation went flawlessly; however, just building the ports-mgmt/pkg port was enough to start generating mangled entry messages like the ones you describe.

To me it looks like there is some bug in the UFS code when used with this CPU. I have no idea how such a bug could fail to manifest on other platforms - I was convinced the UFS code is hardware independent, so this is really strange. I have not tested with ZFS yet, but I plan to.

So, if I can test something (a patch, a different setup) or provide some debugging, count me in.

Regards,
Milan
Re: UFS bad inode, mangled entry on Alder Lake-N(100)
It might be timing related. UFS with a custom kernel (previously GENERIC) is less prone, and I got these while building world on a microSD:

Jan 27 12:35:54 router kernel: /: inode 1286411: check-hash failed
Jan 27 12:35:54 router syslogd: last message repeated 1 times
Jan 27 12:35:54 router kernel: /: inode 1286412: check-hash failed
Jan 27 12:35:54 router syslogd: last message repeated 1 times
Jan 27 12:35:54 router kernel: /: inode 1286413: check-hash failed
Jan 27 12:35:54 router syslogd: last message repeated 1 times

No panic so far, without the CPU microcode. I'll look into that shortly.

Ian

On 2025-01-27 13:12, Patrick M. Hausen wrote:
> Hi all,
>
>> On 27.01.2025 at 18:38, Milan Obuch wrote:
>>
>> On Mon, 27 Jan 2025 12:10:43 -0500
>> Ian FREISLICH wrote:
>>
>>> I recently bought one of those mini-pc firewall devices (Topton 12th
>>> gen N100 with 4x I226-V, 2x X520) and couldn't get it to install pkg
>>> or buildkernel without getting a slew of these messages, inode number
>>> changing and a panic shortly thereafter.
>>>
>>> kernel: /: bad dir ino 4567815 at offset 0: mangled entry
>>>
>>> [...]
>>
>> Just a "me too" message - I did test another device with the same CPU,
>> mine is SZBOX.
>> [...]
>
> In the OPNsense community we had frequent reports of UFS corruption with
> Alder Lake and Raptor Lake CPUs. Lots of embedded devices of varying
> manufacture and quality in use, apparently.
>
> The problems were fixed in all cases that I am aware of by applying the
> current Intel microcode update (sysutils/cpu-microcode). Make sure to
> activate early loading via /boot/loader.conf(.local).
>
> HTH, kind regards,
> Patrick
UFS bad inode, mangled entry on Alder Lake-N(100)
I recently bought one of those mini-pc firewall devices (Topton 12th gen N100 with 4x I226-V, 2x X520) and couldn't get it to install pkg or buildkernel without getting a slew of these messages, inode number changing and a panic shortly thereafter.

kernel: /: bad dir ino 4567815 at offset 0: mangled entry

I tried the FreeBSD-15.0-CURRENT-amd64-20250124 snapshot and 14.2-RELEASE, both with and without journal, trim and softupdates in every permitted permutation without success. The system has an NVME, but I experience the same problem with the install on a microsd and different known good NVME drive. Each time I had to reinstall because the filesystem was so corrupted it wouldn't boot after a fsck.

The system is now running fine with ZFS so I'm wondering if it's silently corrupting the ZFS or if there's a bug in UFS2 that's tickled by this CPU. I'll provide any debugging required.

Ian
Re: UFS bad inode, mangled entry on Alder Lake-N(100)
Hi all,

> On 27.01.2025 at 18:38, Milan Obuch wrote:
>
> On Mon, 27 Jan 2025 12:10:43 -0500
> Ian FREISLICH wrote:
>
>> I recently bought one of those mini-pc firewall devices (Topton 12th
>> gen N100 with 4x I226-V, 2x X520) and couldn't get it to install pkg
>> or buildkernel without getting a slew of these messages, inode number
>> changing and a panic shortly thereafter.
>>
>> kernel: /: bad dir ino 4567815 at offset 0: mangled entry
>>
>> [...]
>
> Just a "me too" message - I did test another device with the same CPU,
> mine is SZBOX.
> [...]

In the OPNsense community we have had frequent reports of UFS corruption with Alder Lake and Raptor Lake CPUs. Lots of embedded devices of varying manufacture and quality are in use, apparently.

In all cases that I am aware of, the problems were fixed by applying the current Intel microcode update (sysutils/cpu-microcode). Make sure to activate early loading via /boot/loader.conf(.local).

HTH, kind regards,
Patrick
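Early microcode loading, as recommended in this message, is typically enabled with two loader.conf lines. The fragment below is a sketch from memory of the install message shipped with the Intel microcode package, not something quoted in this thread: the variable names and the firmware path are assumptions and should be checked against the package message (for example with "pkg info -D cpu-microcode-intel").

```
# /boot/loader.conf or /boot/loader.conf.local
# Assumed settings; verify against the installed package's message.
cpu_microcode_load="YES"
cpu_microcode_name="/boot/firmware/intel-ucode.bin"
```

The point of loading from the loader rather than from an rc script is that the update is applied before the kernel starts using the affected CPU features, which matches the "loaded early" confirmation elsewhere in the thread.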
Re: "don't know how to make /usr/main-src/sys/contrib/dev/iwm/iwm-3160-17.fw.uu. Stop"
On Jan 26, 2025, at 20:51, Adrian Chadd wrote:

> Hi!

Hello.

> So, there's no longer a build target for the firmware uuencoded files ->
> kernel module.

Yea. But there are the sys/conf/files dependency lines in main that still list .fw.uu files. That includes a reference related to the error I get in my context unless I avoid "device iwmfw" in the kernel configuration:

/. . ./sys/conf/files: dependency "$S/contrib/dev/iwm/iwm-3160-17.fw.uu" \

It makes things look like the .fw.uu removal activity is still incomplete.

> Being able to build iwm in the kernel rather than a module is broken.
>
> Now, the real issue(s) are that iwm needs firmware to initialise, and the
> firmware needs to exist, and thus it needs access to the rootfs for
> firmware_get() to find the now binary files in /boot/firmware instead of the
> kernel module old way, and that whole pipeline is broken if it's loaded at
> boot time or included in the kernel directly. There isn't a nice way to defer
> the firmware load attempt until /after/ rootfs is up.
>

Yep.

===
Mark Millard
marklmi at yahoo.com
Re: Difference in "netstat -rn" output in the last 2 months
On 2025-01-26 16:58, Alexander Leidinger wrote:
> Hi,
>
> something has changed in the output of "netstat -rn" between
> 2024-11-23-195545 and 2025-01-22-151306. The default route is not
> listed as "default" anymore, but with "0.0.0.0" resp. "::/0". This
> breaks some tools (e.g. iocage). Iocage uses python, I'm not sure if
> it uses netstat or some other interface, so it may not be directly
> related to netstat itself but could be related to some other stuff
> (netlink maybe?).

For those who stumble upon this, a fix is here:
https://github.com/freebsd/iocage/issues/60

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
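For scripts that scrape "netstat -rn" themselves, one way to cope with the change is to accept both spellings of the default destination. The helper below is a hypothetical sketch, not the iocage fix linked above: the function name default_gateways is made up, and "0.0.0.0/0" is included only as a guess at a possible variant next to the "0.0.0.0" and "::/0" forms reported in this thread.

```shell
#!/bin/sh
# Read "netstat -rn"-style text on stdin and print the gateway column of
# every default route, whichever way the destination is spelled.
default_gateways() {
    awk '
        # Old spelling: "default". Newer spellings reported in the thread:
        # "0.0.0.0" (IPv4) and "::/0" (IPv6). "0.0.0.0/0" is an assumed
        # extra variant, accepted to be on the safe side.
        $1 == "default" || $1 == "0.0.0.0" || $1 == "0.0.0.0/0" || $1 == "::/0" {
            print $2
        }
    '
}
```

Typical use would be "netstat -rn | default_gateways". When only the gateway of the active default route is needed, "route -n get default" avoids parsing the table layout altogether.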
Re: Difference in "netstat -rn" output in the last 2 months
On Sun, 26 Jan 2025 16:58:57 +0100
Alexander Leidinger wrote:

> Hi,
>
> something has changed in the output of "netstat -rn" between
> 2024-11-23-195545 and 2025-01-22-151306. The default route is not
> listed as "default" anymore, but with "0.0.0.0" resp. "::/0". This
> breaks some tools (e.g. iocage). Iocage uses python, I'm not sure if
> it uses netstat or some other interface, so it may not be directly
> related to netstat itself but could be related to some other stuff
> (netlink maybe?).
>
> Does this ring a bell for someone?
>

If there had been "iocage" in the subject, I would've looked into it earlier :)

I'll produce a PR on the repo based on the issue you opened and also apply it to the port.

Cheers
--
Michael Gmelin
Re: HEADS UP: NFS changes coming into CURRENT early February
On Tue, Jan 21, 2025 at 10:27 PM Gleb Smirnoff wrote:
>
> Hi,
>
> TLDR version:
> users of NFS with Kerberos (e.g. running gssd(8)) as well as users of NFS with
> TLS (e.g. running rpc.tlsclntd(8) or rpc.tlsservd(8)) as well as users of
> network lock manager (e.g. having 'options NFSLOCKD' and running rpcbind(8))
> are affected. You would need to recompile & reinstall both the world and the
> kernel together. Of course this is what you'd normally do when you track
> FreeBSD CURRENT, but better be warned. I will post hashes of the specific
> revisions that break API/ABI when they are pushed.
>
> Longer version:
> last year I tried to check-in a new implementation of unix(4) SOCK_STREAM and
> SOCK_SEQPACKET in d80a97def9a1, but was forced to back it out due to several
> kernel side abusers of a unix(4) socket. The most difficult ones are the NFS
> related RPC services, that act as RPC clients talking to an RPC servers in
> userland. Since it is impossible to fully emulate a userland process
> connection to a unix(4) socket they need to work with the socket internal
> structures bypassing all the normal KPIs and conventions. Of course they
> didn't tolerate the new implementation that totally eliminated intermediate
> buffer on the sending side.
>
> While the original motivation for the upcoming changes is the fact that I want
> to go forward with the new unix/stream and unix/seqpacket, I also tried to make
> kernel to userland RPC better. You judge if I succeeded or not :) Here are
> some highlights:
>
> - Code footprint both in kernel clients and in userland daemons is reduced.
>   Example: gssd:    1 file changed, 5 insertions(+), 64 deletions(-)
>            kgssapi: 1 file changed, 26 insertions(+), 78 deletions(-)
>                     4 files changed, 1 insertion(+), 11 deletions(-)
> - You can easily see all RPC calls from kernel to userland with genl(1):
>   # genl monitor rpcnl
> - The new transport is multithreaded in kernel by default, so kernel clients
>   can send a bunch of RPCs without any serialization and if the userland
>   figures out how to parallelize their execution, such parallelization would
>   happen. Note: new rpc.tlsservd(8) will use threads.
> - One ad-hoc single program syscall is removed - gssd_syscall. Note:
>   rpctls syscall remains, but I have some ideas on how to improve that, too.
>   Not at this step though.
> - All sleeps of kernel RPC calls are now in single place, and they all have
>   timeouts. I believe NFS services are now much more resilient to hangs.
>   A deadlock when NFS kernel thread is blocked on unix socket buffer, and
>   the socket can't go away because its application is blocked in some other
>   syscall is no longer possible.
>
> The code is posted on phabricator, reviews D48547 through D48552.
> Reviewers are very welcome!
>
> I share my branch on Github. It is usually rebased on today's CURRENT:
>
> https://github.com/glebius/FreeBSD/commits/gss-netlink/
>
> Early testers are very welcome!

I think I've found a memory leak, but it shouldn't be a show stopper.

What I did on the NFS client side is:
# vmstat -m | fgrep -i rpc
# mount -t nfs -o nfsv4,tls nfsv4-server:/ /mnt
# ls -lR /mnt
--> Then I network partitioned it from the server a few times, until the TCP connection closed. (My client is in bhyve and the server is on the system the bhyve instance is running in. I just did "ifconfig bridge0 down", waited for the TCP connection to close (checking with "netstat -a"), then did "ifconfig bridge0 up".)

Once done, I ran
# umount /mnt
# vmstat -m | fgrep -i rpc
and saw a somewhat larger allocation count.

The allocation count only goes up if I do the network partitioning, and only on the NFS client side.

Since the leak is slow and only happens when the TCP connection breaks, I do not think it is a show stopper, and one of us can track it down someday.

Other than that, I have not found any problems that you had not already fixed, rick

>
> --
> Gleb Smirnoff
>