On Sat, Sep 02, 2023 at 11:52:28AM +0100, Martin Pieuchot wrote:
> On 13/08/23(Sun) 22:59, Kurt Miller wrote:
> > I’ve been hunting an intermittent jdk crash on sparc64 for some time now.
> > Since egdb has not been up to the task, I created a small C program which
> > reproduces the problem. This partially mimics the jdk startup where a number
> > of detached threads are created. When each thread is created the main thread
> > waits for it to start and change state. In my test program I then have the 
> > detached thread wait for a condition that will not happen (parked waiting
> > on a condition var).
> > 
> > When the intermittent crash occurs, one of two things happens: a segfault,
> > or the process is killed by the kernel. The segfault cores are similar to
> > what I see with the jdk crashes. It looks like the stack of the thread
> > creating the threads is corrupted. In this case it is the primordial
> > thread. In the jdk it is a different thread, but it's the thread that
> > called pthread_create that has its stack wiped out.
> 
> I have seen similar symptoms on x86 with go & rust when unlocking the
> fault handler.  I wonder if grabbing the KERNEL_LOCK() around uvm_fault()
> in sparc64/trap.c makes the problem disappear...

It does not. I ran the test program with the diff below and I still see
both symptoms of this instability.

Index: sys/arch/sparc64/sparc64/trap.c
===================================================================
RCS file: /cvs/src/sys/arch/sparc64/sparc64/trap.c,v
retrieving revision 1.115
diff -u -p -r1.115 trap.c
--- sys/arch/sparc64/sparc64/trap.c     11 Feb 2023 23:07:28 -0000      1.115
+++ sys/arch/sparc64/sparc64/trap.c     2 Sep 2023 12:16:09 -0000
@@ -957,7 +957,9 @@ text_access_fault(struct trapframe *tf, 
            uvm_map_inentry_sp, p->p_vmspace->vm_map.sserial))
                goto out;
 
+       KERNEL_LOCK();
        error = uvm_fault(&p->p_vmspace->vm_map, va, 0, access_type);
+       KERNEL_UNLOCK();
 
        /*
         * If this was a stack access we keep track of the maximum
@@ -1051,7 +1053,9 @@ text_access_error(struct trapframe *tf, 
            uvm_map_inentry_sp, p->p_vmspace->vm_map.sserial))
                goto out;
 
+       KERNEL_LOCK();
        error = uvm_fault(&p->p_vmspace->vm_map, va, 0, access_type);
+       KERNEL_UNLOCK();
 
        /*
         * If this was a stack access we keep track of the maximum
@@ -1261,7 +1265,9 @@ copyinsn(struct proc *p, vaddr_t uva, in
        do {
                if (pmap_copyinsn(map->pmap, uva, (uint32_t *)insn) == 0)
                        break;
+               KERNEL_LOCK();
                error = uvm_fault(map, trunc_page(uva), 0, PROT_EXEC);
+               KERNEL_UNLOCK();
        } while (error == 0);
 
        return error;
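
For reference, here is a minimal sketch of the kind of test program described
in the quoted message. It is not the actual reproducer: the names (thr_ctl,
parked_thread, spawn_parked_threads) are hypothetical, and it only
illustrates the pattern of creating a detached thread, waiting for it to
start and change state, and leaving it parked on a condition variable that
is never signalled.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical names throughout; an illustration, not the real test case. */
enum thr_state { THR_NEW, THR_RUNNING };

struct thr_ctl {
	pthread_mutex_t	mtx;
	pthread_cond_t	started;	/* signalled once the thread runs */
	pthread_cond_t	never;		/* never signalled: thread parks here */
	enum thr_state	state;
};

static void *
parked_thread(void *arg)
{
	struct thr_ctl *ctl = arg;

	pthread_mutex_lock(&ctl->mtx);
	ctl->state = THR_RUNNING;
	pthread_cond_signal(&ctl->started);
	/* Park forever; the loop absorbs spurious wakeups. */
	for (;;)
		pthread_cond_wait(&ctl->never, &ctl->mtx);
}

/*
 * Create n detached threads one at a time, waiting for each to report
 * THR_RUNNING before creating the next. Returns the number started.
 * The per-thread control blocks are deliberately never freed, because
 * the parked threads keep using them.
 */
int
spawn_parked_threads(int n)
{
	pthread_attr_t attr;
	int i, started = 0;

	if (pthread_attr_init(&attr) != 0)
		return 0;
	pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);

	for (i = 0; i < n; i++) {
		struct thr_ctl *ctl = malloc(sizeof(*ctl));
		pthread_t tid;

		if (ctl == NULL)
			break;
		pthread_mutex_init(&ctl->mtx, NULL);
		pthread_cond_init(&ctl->started, NULL);
		pthread_cond_init(&ctl->never, NULL);
		ctl->state = THR_NEW;

		/* Hold the mutex so the state change cannot be missed. */
		pthread_mutex_lock(&ctl->mtx);
		if (pthread_create(&tid, &attr, parked_thread, ctl) != 0) {
			pthread_mutex_unlock(&ctl->mtx);
			free(ctl);
			break;
		}
		while (ctl->state != THR_RUNNING)
			pthread_cond_wait(&ctl->started, &ctl->mtx);
		pthread_mutex_unlock(&ctl->mtx);
		started++;
	}
	pthread_attr_destroy(&attr);
	return started;
}
```

Calling spawn_parked_threads() from main() with a modest thread count and
then returning is enough to exercise the pattern; when main() returns, the
process exits and the parked threads go with it.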
