On Wed, 4 Jun 2025 04:50:56 GMT, David Holmes <dhol...@openjdk.org> wrote:

>>> I wonder if <Field type="Thread" name="eventThread" label="Thread" /> is 
>>> needed, instead of thread = true?
>> 
>> We had these discussions before on the old PR and then decided to go 
>> with eventThread (as the other events do too).
>
> @parttimenerd  I would really like to see some kind of design description for 
> this which explains what the threading model is, how the signals are used, 
> and how all the pieces interact. Thanks

@dholmes-ora Here is a first attempt:

The design consists of four main parts:
- setup code: This sets up the signal handlers for every new thread and deletes 
them afterwards
- the per-thread signal handlers: They first check that the current thread is 
valid, increment a counter of currently active handlers, and check that they 
should not stop (because the profiler is disabled). They then acquire the 
thread-local enqueue lock for the current thread's request queue and push the 
sampling request (see https://openjdk.org/jeps/518, plus the current period). 
They then trigger/arm a safepoint. If the current thread is in native, they 
also set a flag that requests asynchronous stack walking. This prevents long 
native periods from overflowing the request queue. Finally, the enqueue lock 
is released.
- the safepoint handler: In the safepoint handler, we check whether the 
thread-local queue is non-empty. If so, we acquire the dequeue lock and process 
all entries of the queue, thereby creating JFR events. We also clear the 
async-stack-walking request for the thread. We then release the lock.
- the sampler thread: Its task is to regularly update the timers if needed 
(on configuration changes) and to walk the thread list to find any thread that 
has requested asynchronous stack walking. For each of these threads, the 
dequeue lock is acquired (skipping the thread if its lock is already held for 
enqueue) and the queue is processed as at the safepoint. Then the lock is 
released.
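
The enqueue and drain steps described above can be sketched roughly as follows. This is a simplified, single-threaded model and not the code from the PR: all names (`ThreadSampleState`, `on_signal`, `drain`, `kCapacity`) are hypothetical, the locking is left out, and dropping a request when the queue is full is an assumption.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical request type; the real request carries more data.
struct SampleRequest { long period_ns; };

// Hypothetical per-thread state: a bounded queue plus the native/async flags.
struct ThreadSampleState {
  static constexpr std::size_t kCapacity = 4;  // assumed capacity
  std::array<SampleRequest, kCapacity> queue{};
  std::size_t size = 0;
  std::size_t lost = 0;            // requests dropped because the queue was full
  bool in_native = false;
  bool wants_async_walk = false;   // set by the handler, read by the sampler thread
};

// Signal handler side: push one request and, if the thread is in native,
// ask the sampler thread to drain the queue asynchronously.
bool on_signal(ThreadSampleState& t, long period_ns) {
  bool pushed = false;
  if (t.size < ThreadSampleState::kCapacity) {
    t.queue[t.size++] = SampleRequest{period_ns};
    pushed = true;
  } else {
    ++t.lost;  // assumption: overflow drops the sample
  }
  if (t.in_native) {
    t.wants_async_walk = true;
  }
  return pushed;
}

// Safepoint handler / sampler thread side: take all queued requests
// (each would become a JFR event) and clear the async request.
std::vector<SampleRequest> drain(ThreadSampleState& t) {
  std::vector<SampleRequest> events(
      t.queue.begin(), t.queue.begin() + static_cast<std::ptrdiff_t>(t.size));
  t.size = 0;
  t.wants_async_walk = false;
  return events;
}
```

The sketch shows why the native flag matters: without an asynchronous drain, a thread that stays in native never reaches a safepoint and its bounded queue would fill up.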

On shutdown: Whenever the sampler is shut down, we first set the 
`_stop_signals` flag. This prevents new signal handlers from entering the 
request creation code (and thereby accessing data structures that we have 
already deallocated). We then disable the timers for all threads and wait 
until no signal handler is active anymore.
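
A minimal sketch of this shutdown handshake, assuming a stop flag and an active-handler counter. Apart from the idea behind `_stop_signals`, the names here (`SamplerState`, `handle_signal`, `shutdown_sampler`) are hypothetical, and the timer handling is omitted:

```cpp
#include <atomic>
#include <thread>

// Hypothetical shared state; stop_signals plays the role of _stop_signals.
struct SamplerState {
  std::atomic<bool> stop_signals{false};
  std::atomic<int> active_handlers{0};
};

// Handler side: announce activity first, then check the stop flag.
// Returns true if the request creation code was entered.
bool handle_signal(SamplerState& s) {
  s.active_handlers.fetch_add(1);
  bool ran = false;
  if (!s.stop_signals.load()) {
    // ... create and enqueue the sample request here ...
    ran = true;
  }
  s.active_handlers.fetch_sub(1);
  return ran;
}

// Shutdown side: set the flag, then wait until no handler is active.
void shutdown_sampler(SamplerState& s) {
  s.stop_signals.store(true);
  // ... disable the per-thread timers here (omitted) ...
  while (s.active_handlers.load() != 0) {
    std::this_thread::yield();
  }
  // From here on it is safe to deallocate the sampler's data structures.
}
```

Because the handler increments the counter before it reads the flag, and shutdown sets the flag before it reads the counter (both with sequentially consistent atomics), either the handler sees the flag or shutdown sees the handler as active.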

It is important to note that there is only one thread-local lock used, but it 
has three states:
- enqueue
- dequeue
- unlocked

This prevents these phases from overlapping.
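
One way to picture such a lock is a single atomic word that is moved out of the unlocked state with a compare-and-swap. The following is only an illustration with hypothetical names (`LockState`, `ThreeStateLock`); a non-blocking try-acquire is an assumption, not necessarily what the PR does:

```cpp
#include <atomic>

// Hypothetical names; not the identifiers used in the PR.
enum class LockState : int { Unlocked, Enqueue, Dequeue };

class ThreeStateLock {
  std::atomic<LockState> _state{LockState::Unlocked};

 public:
  // Try to move Unlocked -> target (Enqueue or Dequeue). Never blocks,
  // so the caller can simply skip the thread when the lock is taken.
  bool try_acquire(LockState target) {
    LockState expected = LockState::Unlocked;
    return _state.compare_exchange_strong(expected, target,
                                          std::memory_order_acquire,
                                          std::memory_order_relaxed);
  }

  // Return to Unlocked; only the current holder should call this.
  void release() { _state.store(LockState::Unlocked, std::memory_order_release); }

  LockState state() const { return _state.load(std::memory_order_acquire); }
};
```

Keeping the holder's role in the state lets another party tell an enqueue in progress apart from a dequeue in progress, for example when the sampler thread skips a thread whose lock is held for enqueue.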

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25302#issuecomment-2938677600
