On Fri, 5 May 2023 17:29:30 GMT, Paul Hohensee <p...@openjdk.org> wrote:

>> I agree we should strive to get the value as accurate as possible. I think 
>> for operational use at scale, we need to avoid doing safepoints. Holding a 
>> `ThreadLock` might also penalize other code that (ab)uses threading (we 
>> frequently see thousands of threads coming and going, don't ask).
>> 
>> But I have a fundamental question here: since SMR/TLH gives us a snapshot of 
>> currently live threads, and it also protects us from seeing an exiting 
>> thread in bad state (ultimately, a `delete`-d one), why can't we just trust 
>> its `cooked_allocated_bytes`, and avoid adding allocated bytes on exit path? 
>> If we cannot trust that, can we make it trustable while thread is protected 
>> by SMR/TLH?
>
> I thought about doing those, but both would slow down the app/JVM and very 
> likely introduce p99.9/p100 latency outliers that we'd rather not see just 
> because we're sampling. Also,
> 
> 1. The existing thread allocated bytes implementation isn't particularly 
> accurate either, in the sense that reality quickly gets away from the method 
> return values. That was a conscious decision to go with speed and efficiency 
> over stability/accuracy, so I went the same way for this implementation.
> 2. Cloud services sample at fairly long intervals in order to avoid overhead: 
> 1 minute is common and 5 minutes not unheard of. Stability/accuracy within 
> short time frames is unneeded with such long sampling intervals.

Afaiu, SMR/TLH keeps a terminated thread's TLS accessible, but doesn't stop the 
termination process.

If we initialize result with exited_allocated_bytes, the "too small" 
possibility is still there. We still have a window between that initialization 
and the creation of the threads iterator where threads may terminate and thus 
not be included in the iteration. And new threads may become active during the 
iteration and not be included. As Aleksey observes, an advantage is that 
threads that terminate during the iteration will be counted only once and we 
don't have to check for terminated threads.

If we stick with adding exited_allocated_bytes to result after the iteration, 
Volker is correct that we shouldn't include terminated threads in the sum 
because their allocated bytes values will have been added to 
exited_allocated_bytes. We have the same undercount possibility that 
initializing result with exited_allocated_bytes has, so this (fixed) approach 
can result in "too small" also, but "too large" goes away.

Looks like both approaches are equivalent, so let's go with initializing result 
with exited_allocated_bytes because it avoids a comparison in the loop.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13814#discussion_r1186508768

Reply via email to