On Sat, Sep 02, 2023 at 11:52:28AM +0100, Martin Pieuchot wrote:
> On 13/08/23(Sun) 22:59, Kurt Miller wrote:
> > I’ve been hunting an intermittent jdk crash on sparc64 for some time now.
> > Since egdb has not been up to the task, I created a small c program which
> > reproduces the problem. This partially mimics the jdk startup where a number
> > of detached threads are created. When each thread is created the main thread
> > waits for it to start and change state. In my test program I then have the
> > detached thread wait for a condition that will not happen (parked waiting
> > on a condition var).
> >
> > When the intermittent crash occurs, one of two things happen; a segfault or
> > the process has been killed by the kernel. The segfault cores are similar to
> > what I see with the jdk crashes. It looks like the stack of the thread creating
> > the threads is corrupted. In this case it is the primordial thread. In the jdk
> > it is a different thread but its the thread that called pthread_create that
> > has it stack wiped out.
>
> I have seen similar symptoms on x86 with go & rust when unlocking the
> fault handler. I wonder if grabbing the KERNEL_LOCK() around uvm_fault()
> in sparc64/trap.c makes the problem disappear...

It does not. I ran the test program with the diff below and I still see
both symptoms of this instability.

Index: sys/arch/sparc64/sparc64/trap.c
===================================================================
RCS file: /cvs/src/sys/arch/sparc64/sparc64/trap.c,v
retrieving revision 1.115
diff -u -p -r1.115 trap.c
--- sys/arch/sparc64/sparc64/trap.c	11 Feb 2023 23:07:28 -0000	1.115
+++ sys/arch/sparc64/sparc64/trap.c	2 Sep 2023 12:16:09 -0000
@@ -957,7 +957,9 @@ text_access_fault(struct trapframe *tf,
 		    uvm_map_inentry_sp, p->p_vmspace->vm_map.sserial))
 			goto out;
 
+	KERNEL_LOCK();
 	error = uvm_fault(&p->p_vmspace->vm_map, va, 0, access_type);
+	KERNEL_UNLOCK();
 
 	/*
 	 * If this was a stack access we keep track of the maximum
@@ -1051,7 +1053,9 @@ text_access_error(struct trapframe *tf,
 		    uvm_map_inentry_sp, p->p_vmspace->vm_map.sserial))
 			goto out;
 
+	KERNEL_LOCK();
 	error = uvm_fault(&p->p_vmspace->vm_map, va, 0, access_type);
+	KERNEL_UNLOCK();
 
 	/*
 	 * If this was a stack access we keep track of the maximum
@@ -1261,7 +1265,9 @@ copyinsn(struct proc *p, vaddr_t uva, in
 	do {
 		if (pmap_copyinsn(map->pmap, uva, (uint32_t *)insn) == 0)
 			break;
+		KERNEL_LOCK();
 		error = uvm_fault(map, trunc_page(uva), 0, PROT_EXEC);
+		KERNEL_UNLOCK();
 	} while (error == 0);
 
 	return error;
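
For anyone following along without the reproducer at hand, here is a
minimal sketch of the pattern described in the quoted text: detached
threads are created one at a time, the creating thread waits for each one
to report that it has started, and each new thread then parks on a
condition that is never signalled. This is not the actual test program.
All names (spawn_parked_threads, thread_main, the start_* and park_*
variables) are hypothetical, and the handshake uses a shared counter
rather than whatever per-thread state the real program tracks.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Handshake between the creating thread and each new thread. */
static pthread_mutex_t start_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t start_cond = PTHREAD_COND_INITIALIZER;
static int started;		/* threads that have reported in */

/* Condition that is never signalled; every thread parks on it. */
static pthread_mutex_t park_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t park_cond = PTHREAD_COND_INITIALIZER;
static int never_true;		/* stays 0 for the life of the process */

static void *
thread_main(void *arg)
{
	(void)arg;

	/* Tell the creating thread that this thread is running. */
	pthread_mutex_lock(&start_mtx);
	started++;
	pthread_cond_broadcast(&start_cond);
	pthread_mutex_unlock(&start_mtx);

	/* Park forever; the loop also absorbs spurious wakeups. */
	pthread_mutex_lock(&park_mtx);
	while (!never_true)
		pthread_cond_wait(&park_cond, &park_mtx);
	pthread_mutex_unlock(&park_mtx);
	return NULL;
}

/*
 * Create n detached threads, waiting for each to start before creating
 * the next. Returns the number of threads started, or -1 if the thread
 * attributes could not be set up.
 */
int
spawn_parked_threads(int n)
{
	pthread_attr_t attr;
	pthread_t tid;
	int base, err, i;

	if (pthread_attr_init(&attr) != 0)
		return -1;
	if (pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED) != 0) {
		pthread_attr_destroy(&attr);
		return -1;
	}

	pthread_mutex_lock(&start_mtx);
	base = started;
	pthread_mutex_unlock(&start_mtx);

	for (i = 0; i < n; i++) {
		err = pthread_create(&tid, &attr, thread_main, NULL);
		if (err != 0) {
			fprintf(stderr, "pthread_create: %s\n", strerror(err));
			break;
		}
		/* Wait for the new thread to change state. */
		pthread_mutex_lock(&start_mtx);
		while (started < base + i + 1)
			pthread_cond_wait(&start_cond, &start_mtx);
		pthread_mutex_unlock(&start_mtx);
	}

	pthread_attr_destroy(&attr);
	return i;
}
```

A driver would call spawn_parked_threads() from main() and then exit,
with the whole program run repeatedly from a shell loop, since the crash
is intermittent. Whether this sketch triggers the same instability is
untested.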