Re: NetBSD-10.0/i386 spurious SIGSEGV

Emmanuel Dreyfus Sat, 08 Jun 2024 22:29:05 -0700

On Sat, Jun 08, 2024 at 10:10:58PM -0400, Mouse wrote:
> First thing I'd look at is the userland instruction(s) around the crash
> point, maybe look at instructions starting at 0xbb610480 or something
> and then disassemble forwards looking for 0xbb610579.  In particular,
> I'd be interested in whether it's a store instruction that failed or
> whether this happened during a syscall trap.


   0xbb610570 <__gettimeofday50>:       mov    $0x1a2,%eax
   0xbb610575 <__gettimeofday50+5>:     int    $0x80
   0xbb610577 <__gettimeofday50+7>:     jb     0xbb61057a <__gettimeofday50+10>
=> 0xbb610579 <__gettimeofday50+9>:     ret  

> Are all the failures in __gettimeofday50?  All in trap-to-the-kernel
> calls?

I have seen many crashes on system call returns. Another one on
__gettimeofday50:

   0xbb610570 <__gettimeofday50>:       mov    $0x1a2,%eax
   0xbb610575 <__gettimeofday50+5>:     int    $0x80
   0xbb610577 <__gettimeofday50+7>:     jb     0xbb61057a <__gettimeofday50+10>
   0xbb610579 <__gettimeofday50+9>:     ret    
=> 0xbb61057a <__gettimeofday50+10>:    push   %ebx

Another one:

   0xbb610570 <__gettimeofday50>:       mov    $0x1a2,%eax
   0xbb610575 <__gettimeofday50+5>:     int    $0x80
=> 0xbb610577 <__gettimeofday50+7>:     jb     0xbb61057a <__gettimeofday50+10>
   0xbb610579 <__gettimeofday50+9>:     ret  

At first I suspected a stack problem, but I think the last one proves
this is not the case: the faulting jb makes no data memory access.
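
That reasoning can be spelled out with a small sketch. The table is
hand-written from the three disassemblies above; the stack-access flags
are an assumption based on what each instruction does, not debugger
output, and the helper names are made up for illustration.

```python
# Base address of __gettimeofday50, taken from the disassembly above.
BASE = 0xbb610570

# (offset into __gettimeofday50, faulting instruction, touches the stack?)
# The third column is a hand annotation, not something gdb reported.
FAULTS = [
    (9, "ret", True),         # pops the return address from the stack
    (10, "push %ebx", True),  # writes a register to the stack
    (7, "jb", False),         # conditional jump on the carry flag only
]


def fault_address(offset):
    """Absolute address of a fault site given its offset in the stub."""
    return BASE + offset


def stack_explains_all(faults):
    """True only if every faulting instruction accesses the stack."""
    return all(touches for _, _, touches in faults)


for offset, insn, touches in FAULTS:
    print(f"{fault_address(offset):#x}  {insn:<10}  stack access: {touches}")

print("stack problem explains every fault:", stack_explains_all(FAULTS))
```

The jb entry is the one that makes the check fail, which is why a
corrupted stack alone does not account for all three crashes.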

> You say "multiple machines"; are those multiple domUs on a single dom0,
> or are they spread across multiple underlying hardware machines? 

It happens on multiple hardware machines, and it starts when the domU
is upgraded. I even tested moving a domU from one machine to another,
and the bug followed. Other netbsd-9 domUs on the same dom0 have
no problem, or at least it is rare enough that I have not noticed it
for years.

> If the latter, how similar are those underlying machines? 

Same model:
vcpu3: Intel(R) Xeon(R) CPU E3-1220 v6 @ 3.00GHz, id 0x906e9


-- 
Emmanuel Dreyfus
m...@netbsd.org
