Thank you for the detailed response (I can't claim to understand it
completely, of course). A saved kernel from 9.99.68 still lets me work
with the machine as before; I updated it yesterday and got another -
perhaps identical - panic when downloading mail with Thunderbird:

panic: fpudna from userland, ip 0x7c16e87b95ca, trapframe 0xffffce01527ec000
cpu0: Begin traceback...
vpanic() at netbsd:vpanic+0x152
snprintf() at netbsd:snprintf
fpu_set_default_cw() at netbsd:fpu_set_default_cw
cpu0: End traceback...

dumping to dev 168,15 (offset=8, size=4152523):
dump autoconfiguration error: ahcisata0 port 3: clearing WDCTL_RST failed for drive 0
WARNING: negative runtime; monotonic clock has gone backwards
wddump: device timed out
i/o error

rebooting...

(and no core dump, of course).

On Sat, 4 Jul 2020 at 21:19, Taylor R Campbell <[email protected]> wrote:
>
> > Date: Thu, 2 Jul 2020 23:09:16 +0100
> > From: Chavdar Ivanov <[email protected]>
> >
> > On amd64 9.99.69 from yesterday I get:
> > [...]
> > System panicked: fpudna from kernel, ip 0xffffffff802292af, trapframe
> > 0xffffbe013c564a50
> > [...]
> > Xtrap07() at Xtrap07+0xbd
> > aesni_enc_impl() at aesni_enc_impl+0x1c
> > rijndaelEncrypt() at rijndaelEncrypt+0x4b
> > ccmp_init_blocks() at ccmp_init_blocks+0xe8
> > [...]
>
> I am investigating.  There must be a bug somewhere in the x86 vector
> register state management I used to allow the kernel to use
> AES-NI, but I'm not yet sure what it is.
>
> > My WiFi link (iwm) is also visibly slower than usual. ..
> > happened while I was running 'pkgin upgrade' over an NFS mount through
> > the iwm adapter.
>
> This is likely an unintended side effect of my recent AES rework
> (https://mail-index.netbsd.org/tech-kern/2020/06/18/msg026505.html).
>
> For systems where we can take advantage of hardware AES support, like
> yours, after every call into the AES subsystem, the kernel will zero
> the vector registers to avoid leaking secrets through Spectre-class
> speculative execution attacks.
>
> Although your kernel is evidently now taking advantage of hardware
> support for AES (the x86 AES-NI CPU instructions), which is much
> faster than software AES, the logic in our 802.11 stack to compute
> CCMP (the authenticated cipher used in your WPA setup) calls the AES
> block cipher one block at a time.
>
> So it's zeroing all the vector registers for every 16 bytes of data in
> every frame -- twice, because AES-CCM involves two block cipher calls
> for every block of data (one for the AES-CBC-MAC authenticator, one
> for the AES-CTR encryption pad).  I expect this is the source of the
> slowdown you're witnessing.
>
>
> There are a few ways we could work around this:
>
> 1. Push the AES-CCM computation into the AES subsystem, so we only
>    zero the vector registers once per frame, or once per mbuf segment.
>    This requires a bit of work but if I can find CCMP test vectors
>    then it shouldn't be too hard.  At worst, it will require redoing
>    when the wifi branch is merged.
>
> 2. Push ieee80211_crypto_* into a worker thread, and use
>    <https://mail-index.netbsd.org/tech-kern/2020/06/20/msg026524.html>
>    to avoid zeroing the vector registers.  However, this may require
>    some design changes in the 802.11 stack and it's not clear that
>    they're the right changes or that this can be done quickly.
>
> 3. Invent a new nestable transaction mechanism to defer zeroing the
>    vector registers.  However, there might also be a penalty to
>    enabling or disabling the fpu, so it might not solve the whole
>    problem, and it is not entirely clear what it should mean in an MI
>    context.
>
> Another approach, of course, is to simply use an open wifi network
> instead -- generally hop-by-hop authenticated encryption like WPA is
> not worth much compared to end-to-end authenticated encryption like
> TLS, SSH, or Wireguard.

Chavdar

--
----
