userland is active during late shutdown or even panic?

2024-11-03 Thread Andriy Gapon



Has anyone here noticed weird behavior where userland can be alive and well
during quite late shutdown phases?


First, I noticed this report on the FreeBSD forums.
Initially, I didn't find it believable, but the poster provided quite strong 
evidence and details.

https://forums.freebsd.org/threads/zombie-kernel-reporting-its-own-crash-via-network.95217/

Then recently, I experienced a similar thing first-hand during a normal reboot
of a system.  Here is console output from an ssh session connected to a host
named 'ryth':

~
29759/usr/home/avg
pts/1:[0]:ryth#18:28>poweroff ; exit
Shutdown NOW!
poweroff: [pid 14930]

*** FINAL System shutdown message from root@ryth ***

System going down IMMEDIATELY

System shutdown time has arrived

[root shell exited, but user shell is still active on ryth]

28923/usr/home/avg
pts/1:[0]:ryth-18:28>tail -30 /var/log/messages
[snip]
Nov  3 18:28:25 ryth shutdown[14930]: power-down by root:
Nov  3 18:28:25 ryth ntpd[2088]: ntpd exiting on signal 15 (Terminated)
Nov  3 18:28:25 ryth upsmon[1989]: upsmon parent: read
Nov  3 18:28:25 ryth kernel: .
Nov  3 18:28:26 ryth kernel: , 1739.
Nov  3 18:28:26 ryth kernel: Waiting (max 60 seconds) for system process `vnlru' 
to stop... done
Nov  3 18:28:26 ryth kernel: Waiting (max 60 seconds) for system process 
`syncer' to stop...

Nov  3 18:28:26 ryth kernel: Syncing disks, vnodes remaining... 0
Nov  3 18:28:27 ryth kernel: 0
28924/usr/home/avg
pts/1:[0]:ryth-18:28>

[finally ssh (slogin) gets disconnected]

Connection to 192.168.0.77 closed by remote host.
Connection to 192.168.0.77 closed.
zsh: exit 255   slogin ryth
~~

It was a weird feeling seeing "Syncing disks, vnodes remaining..." in messages.

--
Andriy Gapon




[Resolved] Re: Panic after main-n273387-bb8b3b174118 -> main-n273419-523913c94371

2024-11-03 Thread David Wolfskill
On Sun, Nov 03, 2024 at 05:21:59AM -0800, Rick Macklem wrote:
> On Sun, Nov 3, 2024 at 5:12 AM David Wolfskill  wrote:
> ...
> > Starting mountd.
> >
> > Fatal trap 12: page fault while in kernel mode
> ...
> > db>
> This is the same issue as the one being discussed under the subject
> "Re: cfbe7a62dc62...".
> 
> rick

And olce@ has posted a patch (in dev-commits-src-main@) which resolved
the issue in my case.

Peace,
david
-- 
David H. Wolfskill  da...@catwhisker.org
One can learn a lot about someone by noting what is considered "a joke."

See https://www.catwhisker.org/~david/publickey.gpg for my public key.




Re: Panic after main-n273387-bb8b3b174118 -> main-n273419-523913c94371

2024-11-03 Thread Rick Macklem
On Sun, Nov 3, 2024 at 5:12 AM David Wolfskill  wrote:
>
> Oddly, this is only on my "build machine" (which runs a GENERIC/amd64
> kernel) -- neither of the laptops whined at all.
>
> Copy/paste of the backtrace from serial console:
>
> ...
> Starting mountd.
>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 42; apic id = 2a
> fault virtual address   = 0x28
> fault code  = supervisor read data, page not present
> instruction pointer = 0x20:0x80c3a482
> stack pointer   = 0x28:0xfe01b5f8ea00
> frame pointer   = 0x28:0xfe01b5f8ea80
> code segment        = base 0x0, limit 0xf, type 0x1b
>                     = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags= interrupt enabled, resume, IOPL = 0
> current process = 2241 (mountd)
> rdi: f8207a989c00 rsi: 0002 rdx: f8207a086330
> rcx:   r8: f8207906c740  r9: 0002
> rax:  rbx: fe051ce25000 rbp: fe01b5f8ea80
> r10:  r11: 0001 r12: fe01b5f8eb48
> r13: f820dd5a2e00 r14: f8207a989e00 r15: f8207a989e00
> trap number = 12
> panic: page fault
> cpuid = 42
> time = 1730638810
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe01b5f8e6d0
> vpanic() at vpanic+0x136/frame 0xfe01b5f8e800
> panic() at panic+0x43/frame 0xfe01b5f8e860
> trap_fatal() at trap_fatal+0x40b/frame 0xfe01b5f8e8c0
> trap_pfault() at trap_pfault+0xa0/frame 0xfe01b5f8e930
> calltrap() at calltrap+0x8/frame 0xfe01b5f8e930
> --- trap 0xc, rip = 0x80c3a482, rsp = 0xfe01b5f8ea00, rbp = 
> 0xfe01b5f8ea80 ---
> vfs_export() at vfs_export+0x7a2/frame 0xfe01b5f8ea80
> vfs_domount_update() at vfs_domount_update+0x7da/frame 0xfe01b5f8ec10
> vfs_domount() at vfs_domount+0x27f/frame 0xfe01b5f8ed40
> vfs_donmount() at vfs_donmount+0x904/frame 0xfe01b5f8edd0
> sys_nmount() at sys_nmount+0x60/frame 0xfe01b5f8ee00
> amd64_syscall() at amd64_syscall+0x158/frame 0xfe01b5f8ef30
> fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe01b5f8ef30
> --- syscall (378, FreeBSD ELF64, nmount), rip = 0x27c44b7fe0aa, rsp = 
> 0x27c449448528, rbp = 0x27c449449090 ---
> KDB: enter: panic
> [ thread pid 2241 tid 102819 ]
> Stopped at  kdb_enter+0x33: movq    $0,0x1056222(%rip)
> db>
This is the same issue as the one being discussed under the subject
"Re: cfbe7a62dc62...".

rick

>
>
> One possibly-salient point about the build machine: it has 32 cores and
> 64 threads, which is considerably more potential for multiprocessing
> than the laptops have.  It's also an AMD (Epyc) CPU (vs. the Intel CPUs
> in the laptops).
>
> The machine would normally be powered off for the rest of the day
> after this reboot, so if there's something to be gained by poking
> at it, I will happily accept clues & report results.
>
> For additional information (such as a copy of dmesg.boot from yesterday),
> please see the "freebeast head" links at
> https://www.catwhisker.org/~david/FreeBSD/history/
>
> Thanks!
>
> Peace,
> david
> --
> David H. Wolfskill  da...@catwhisker.org
> One can learn a lot about someone by noting what is considered "a joke."
>
> See https://www.catwhisker.org/~david/publickey.gpg for my public key.



Panic after main-n273387-bb8b3b174118 -> main-n273419-523913c94371

2024-11-03 Thread David Wolfskill
Oddly, this is only on my "build machine" (which runs a GENERIC/amd64
kernel) -- neither of the laptops whined at all.

Copy/paste of the backtrace from serial console:

...
Starting mountd.


Fatal trap 12: page fault while in kernel mode
cpuid = 42; apic id = 2a
fault virtual address   = 0x28
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x80c3a482
stack pointer   = 0x28:0xfe01b5f8ea00
frame pointer   = 0x28:0xfe01b5f8ea80
code segment        = base 0x0, limit 0xf, type 0x1b
                    = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 2241 (mountd)
rdi: f8207a989c00 rsi: 0002 rdx: f8207a086330
rcx:   r8: f8207906c740  r9: 0002
rax:  rbx: fe051ce25000 rbp: fe01b5f8ea80
r10:  r11: 0001 r12: fe01b5f8eb48
r13: f820dd5a2e00 r14: f8207a989e00 r15: f8207a989e00
trap number = 12
panic: page fault
cpuid = 42
time = 1730638810
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe01b5f8e6d0
vpanic() at vpanic+0x136/frame 0xfe01b5f8e800
panic() at panic+0x43/frame 0xfe01b5f8e860
trap_fatal() at trap_fatal+0x40b/frame 0xfe01b5f8e8c0
trap_pfault() at trap_pfault+0xa0/frame 0xfe01b5f8e930
calltrap() at calltrap+0x8/frame 0xfe01b5f8e930
--- trap 0xc, rip = 0x80c3a482, rsp = 0xfe01b5f8ea00, rbp = 
0xfe01b5f8ea80 ---
vfs_export() at vfs_export+0x7a2/frame 0xfe01b5f8ea80
vfs_domount_update() at vfs_domount_update+0x7da/frame 0xfe01b5f8ec10
vfs_domount() at vfs_domount+0x27f/frame 0xfe01b5f8ed40
vfs_donmount() at vfs_donmount+0x904/frame 0xfe01b5f8edd0
sys_nmount() at sys_nmount+0x60/frame 0xfe01b5f8ee00
amd64_syscall() at amd64_syscall+0x158/frame 0xfe01b5f8ef30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe01b5f8ef30
--- syscall (378, FreeBSD ELF64, nmount), rip = 0x27c44b7fe0aa, rsp = 
0x27c449448528, rbp = 0x27c449449090 ---
KDB: enter: panic
[ thread pid 2241 tid 102819 ]
Stopped at  kdb_enter+0x33: movq    $0,0x1056222(%rip)
db>


One possibly-salient point about the build machine: it has 32 cores and
64 threads, which is considerably more potential for multiprocessing
than the laptops have.  It's also an AMD (Epyc) CPU (vs. the Intel CPUs
in the laptops).

The machine would normally be powered off for the rest of the day
after this reboot, so if there's something to be gained by poking
at it, I will happily accept clues & report results.

For additional information (such as a copy of dmesg.boot from yesterday),
please see the "freebeast head" links at
https://www.catwhisker.org/~david/FreeBSD/history/

Thanks!

Peace,
david
-- 
David H. Wolfskill  da...@catwhisker.org
One can learn a lot about someone by noting what is considered "a joke."

See https://www.catwhisker.org/~david/publickey.gpg for my public key.

