On 28/06/23(Wed) 20:07, Kurt Miller wrote:
> On Jun 28, 2023, at 7:16 AM, Martin Pieuchot <m...@openbsd.org> wrote:
> > 
> > On 28/06/23(Wed) 08:58, Claudio Jeker wrote:
> >> 
> >> I doubt this is a missing wakeup. It is more that the system is thrashing
> >> and not making progress. The SIGSTOP causes all threads to park, which
> >> means that the thread not busy in its sched_yield() loop will finish its
> >> operation and then on SIGCONT progress is possible.
> >> 
> >> I need to recheck your ps output from ddb but I guess one of the threads
> >> is stuck in a different place. That is where we need to look.
> >> It may well be a bad interaction between SCHED_LOCK() and whatever else is
> >> going on.
> > 
> > Or simply poor userland scheduling based on sched_yield()...
> > 
> > To me it seems there are two bugs in your report:
> > 
> > 1/ a deadlock due to a single rwlock in sysctl(2)
> > 
> > 2/ something unknown in java not making progress and calling
> >  sched_yield() and triggering 1/ 
> > 
> > While 1/ is well understood, 2/ isn't.  Why java is not making progress
> > is what should be understood.  Knowing where the sched_yield() is coming
> > from can help.
> 
> Okay. I dug into 2/ and believe I understand what’s happening there.
> The short version is that many threads are calling sched_yield(2) and
> that’s somehow preventing either an mmap or munmap call from completing.

I don't understand how that could happen.  sched_yield() shouldn't
prevent anything from happening.  This syscall doesn't require the
KERNEL_LOCK() and has no interaction with mmap/munmap(2).

> Java spawns a number of GCTaskThread threads for doing tasks like garbage
> collection. The number of threads depends on the number of physical cpus
> in the machine. My machine has 64, so java spawns 43 GCTaskThreads. When there’s
> nothing to do these threads are waiting on a condition variable. When there’s
> work to do, all 43 threads are clamoring to get the work done in a design
> that I find a bit unusual.
> 
> Until all 43 threads have no work to do they all continue to check for more
> work. If there’s none but at least one thread is not done yet, it does one of
> the following: a certain number of hard spins, a sched_yield(2) call,
> or a 1ms sleep (via pthread_cond_timedwait(3) configured w/CLOCK_MONOTONIC).
> After either a hard spin, sched_yield or the 1ms sleep it rechecks for more
> work to do. If there’s still no work it repeats the above until all 43 threads
> have no work to do.
> 
> The relevant code is here with the defaults for some vars:
> 
> https://github.com/battleblow/jdk8u/blob/jdk8u372-ga/hotspot/src/share/vm/utilities/taskqueue.cpp#L153
> 
>     uintx WorkStealingHardSpins           = 4096    {experimental}
>     uintx WorkStealingSleepMillis         = 1       {experimental}
>     uintx WorkStealingSpinToYieldRatio    = 10      {experimental}
>     uintx WorkStealingYieldsBeforeSleep   = 5000    {experimental}
> 
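If I read the taskqueue.cpp linked above correctly, each worker that runs
out of work ends up in a loop along these lines.  This is only a simplified
C/pthreads sketch to make the discussion concrete, not the actual HotSpot
C++ code, and the helper functions are made-up stand-ins:

    #include <pthread.h>
    #include <sched.h>
    #include <time.h>

    /* Hypothetical stand-ins for the real HotSpot task queue logic. */
    extern int	all_workers_done(void);
    extern int	steal_work(void);
    extern void	hard_spin(unsigned int iterations);

    void
    offer_termination(void)
    {
        unsigned int rounds = 0, spins = 0;
        pthread_mutex_t mtx;
        pthread_cond_t cv;
        pthread_condattr_t attr;
        struct timespec ts;

        pthread_mutex_init(&mtx, NULL);
        pthread_condattr_init(&attr);
        pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
        pthread_cond_init(&cv, &attr);

        while (!all_workers_done()) {
            if (rounds++ <= 5000) {          /* WorkStealingYieldsBeforeSleep */
                if (spins++ < 10) {          /* WorkStealingSpinToYieldRatio */
                    hard_spin(4096);         /* WorkStealingHardSpins */
                } else {
                    sched_yield();
                    spins = 0;
                }
            } else {
                /* back off: 1ms pthread_cond_timedwait(3) on CLOCK_MONOTONIC */
                clock_gettime(CLOCK_MONOTONIC, &ts);
                ts.tv_nsec += 1000000;       /* WorkStealingSleepMillis = 1 */
                if (ts.tv_nsec >= 1000000000) {
                    ts.tv_nsec -= 1000000000;
                    ts.tv_sec++;
                }
                pthread_mutex_lock(&mtx);
                pthread_cond_timedwait(&cv, &mtx, &ts);
                pthread_mutex_unlock(&mtx);
            }
            if (steal_work())                /* more work showed up, resume */
                return;
        }
    }

So with the defaults each of the 43 threads goes through up to 5000
spin/yield rounds before it ever backs off for 1ms, which would explain
the storm of sched_yield(2) calls you are seeing.
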
> What I see when java is stuck in what I was calling the run-away state is
> that one thread (not necessarily a GCTaskThread) is stuck in mmap or
> munmap called via malloc/free and the set of 43 GCTaskThreads are trying
> to finish up a task.  Based on the feedback from you, Claudio
> and Vitaliy, I’m assuming that the frequent calls to sched_yield are 
> preventing the mmap/munmap call from completing. Since malloc made
> the call, the malloc mutex is held and about 20 other threads block
> waiting on the one thread to complete mmap/munmap. In unmodified
> -current the sched_yield calls are sufficient to prevent the mmap/munmap
> call from ever completing, leading to 1/.

We're making progress here.  The question now is why/where the thread is
stuck in mmap/munmap...
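
Just to spell out the pile-up you describe: once the thread that entered
mmap/munmap holds malloc's pool lock, every other thread that touches
malloc(3)/free(3) on that pool queues up behind it.  A generic pthread
sketch of the pattern (not the actual malloc.c internals, names are
hypothetical):

    #include <sys/mman.h>
    #include <pthread.h>
    #include <stddef.h>

    /* Hypothetical stand-in for malloc's per-pool lock. */
    static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

    void *
    pool_grow(size_t sz)
    {
        void *p;

        pthread_mutex_lock(&pool_lock);
        /*
         * The stuck thread sits here in mmap(2)/munmap(2) while 43
         * runnable threads keep calling sched_yield(2)...
         */
        p = mmap(NULL, sz, PROT_READ|PROT_WRITE,
            MAP_ANON|MAP_PRIVATE, -1, 0);
        pthread_mutex_unlock(&pool_lock);
        return (p);
    }

    void
    other_allocation(void)
    {
        /*
         * ...so every other thread doing malloc(3)/free(3) on the
         * same pool blocks right here until that syscall returns.
         */
        pthread_mutex_lock(&pool_lock);
        /* ... */
        pthread_mutex_unlock(&pool_lock);
    }

That would match the ~20 threads you see blocked on the malloc mutex.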

> I’ve attached two debug sessions of a run-away java process. In each I
> have sorted the thread back traces into the following categories:
> 
> Stuck thread
> Run away threads
> Malloc mutex contention threads
> Appear to be normal condition var or sem wait threads
> 
> I’m currently testing reducing the number of sched_yield calls before
> sleeping 1ms from 5000 down to 10. In some limited testing this appears
> to be a work-around for the mmap/munmap starvation problem. I will
> do a long test to confirm this is a sufficient work-around for the problem.
> 
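For what it's worth, if you are not simply patching the default in the
port, I assume the same experiment can be done at run time by unlocking
the experimental flags, something along the lines of (hypothetical
invocation, adjust to however you normally start the application):

    $ java -XX:+UnlockExperimentalVMOptions -XX:WorkStealingYieldsBeforeSleep=10 ...

That would make it easier for others to reproduce the work-around.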

> Stuck thread (on proc or run state not making progress):
> 
> Thread 35 (thread 559567 of process 32495):
> #0  map (d=0xdc4e3717e0, sz=8192, zero_fill=0) at malloc.c:865

This corresponds to a call to mprotect().  However, the thread is still
in userland at this point, so it isn't clear to me whether it is stuck
or not.

If it is in the kernel, it would be useful to get the kernel trace.  You
could enter ddb(4) and print it from there.  Or, if that works on
sparc64, use btrace(8), replacing XXX below with the corresponding TID:

# btrace -e "profile:hz:100 /tid == XXX/ { print(kstack); }"
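
For the ddb(4) route, something like this should print that thread's
kernel stack (from memory, so double-check the exact syntax in ddb(4)):

    ddb> trace /t 0tXXX

with XXX again being the TID in decimal.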

> #1  0x000000dbec9a6c18 in omalloc_make_chunks (listnum=<optimized out>, 
> bucket=<optimized out>, d=<optimized out>) at malloc.c:963
> #2  malloc_bytes (f=<optimized out>, size=<optimized out>, d=<optimized out>) 
> at malloc.c:1086
> #3  omalloc (pool=0xdc4e3717e0, sz=16, zero_fill=1, f=0x0) at malloc.c:1290
> #4  0x000000dbec9a79b8 in *_libc_calloc (nmemb=1, size=16) at malloc.c:1894
> #5  0x000000dbec95f670 in *_libc_pthread_cond_init (condp=0xdc8096a430, 
> attr=0xdbd39c0780 <os::Bsd::_condattr>) at rthread_cond.c:35
> #6  0x000000dbd30f98b4 in os::PlatformEvent::park (this=0xdc8096a400, 
> millis=<optimized out>) at os_bsd.cpp:4787
> #7  0x000000dbd30fd038 in os::sleep (thread=<optimized out>, millis=1, 
> interruptible=<optimized out>) at os_bsd.cpp:2883
> #8  0x000000dbd3242d68 in ParallelTaskTerminator::offer_termination 
> (this=0xdc4fefbf10, terminator=0x0) at taskqueue.cpp:226
> #9  0x000000dbd314642c in ParallelTaskTerminator::offer_termination 
> (this=0xdc4fefbf10) at 
> /usr/ports/pobj/jdk-1.8.0.372.b07.1/jdk8u-jdk8u372-b07.1/hotspot/src/share/vm/utilities/taskqueue.hpp:638
> #10 StealMarkingTask::do_it (this=0xdbef694460, manager=0xdc086b6460, 
> which=32) at pcTasks.cpp:221
> #11 0x000000dbd2c8e0cc in GCTaskThread::run (this=0xdc80994000) at 
> gcTaskThread.cpp:141
> #12 0x000000dbd30fbd90 in java_start (thread=0xdc80994000) at os_bsd.cpp:815
> #13 0x000000dc8bc6ea50 in _rthread_start (v=0xdbc7402a38) at rthread.c:96
> #14 0x000000dbec938e8c in __tfork_thread () at 
> /usr/src/lib/libc/arch/sparc64/sys/tfork_thread.S:58
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)

[...]

> Stuck thread (on proc or run state not making progress):
> 
> Thread 69 (thread 135059 of process 122):
> #0  0x0000001124e19630 in unmap (d=0x11735a9a50, p=0x112b4fe000, sz=32768, 
> clear=0) at malloc.c:778

This one is still in userland, so it isn't clear what prevents it from
making progress, if anything.

> #1  0x0000001124e1bed8 in ofree (argpool=0x10deac2c80, p=0x112b4fe000, 
> clear=0, check=0, argsz=0) at malloc.c:1561
> #2  0x0000001124e1c580 in *_libc_free (ptr=0x112b4fe000) at malloc.c:1641
> #3  0x000000112c57e0d0 in ChunkPool::free_all_but (n=<optimized out>, 
> this=0x11630421c0) at allocation.cpp:315
> #4  ChunkPool::clean () at allocation.cpp:338
> #5  ChunkPoolCleaner::task (this=0x11630baaf0) at allocation.cpp:367
> #6  0x000000112ce26700 in PeriodicTask::execute_if_pending 
> (delay_time=<optimized out>, this=0xdfdfdfdfdfdfdfdf) at 
> /usr/ports/pobj/jdk-1.8.0.372.b07.1/jdk8u-jdk8u372-b07.1/hotspot/src/share/vm/runtime/task.hpp:86
> #7  PeriodicTask::real_time_tick (delay_time=50) at task.cpp:79
> #8  0x000000112ce4eec8 in WatcherThread::run (this=0x1117bba000) at 
> /usr/ports/pobj/jdk-1.8.0.372.b07.1/jdk8u-jdk8u372-b07.1/hotspot/src/share/vm/runtime/thread.cpp:1365
> #9  0x000000112ccdfd90 in java_start (thread=0x1117bba000) at os_bsd.cpp:815
> #10 0x000000113baf2a50 in _rthread_start (v=0x10e34b6438) at rthread.c:96
> #11 0x0000001124e3c5ac in __tfork_thread () at 
> /usr/src/lib/libc/arch/sparc64/sys/tfork_thread.S:58
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)

After looking at your report it isn't clear that the issue is in the kernel.
There are multiple threads executing libc's malloc and free code, none
of which seems stuck in the kernel.
Many other threads are waiting for something, and it isn't clear what that
is nor what is preventing progress.
