On Fri, 5 May 2023 17:29:30 GMT, Paul Hohensee <p...@openjdk.org> wrote:
>> I agree we should strive to get the value as accurate as possible. I think >> for operational use at scale, we need to avoid doing safepoints. Holding a >> `ThreadLock` might also penalize other code that (ab)uses threading (we >> frequently see thousands of threads coming and going, don't ask). >> >> But I have a fundamental question here: since SMR/TLH gives us a snapshot of >> currently live threads, and it also protects us from seeing an exiting >> thread in bad state (ultimately, a `delete`-d one), why can't we just trust >> its `cooked_allocated_bytes`, and avoid adding allocated bytes on exit path? >> If we cannot trust that, can we make it trustable while thread is protected >> by SMR/TLH? > > I thought about doing those, but both would slow down the app/JVM and very > likely introduce p99.9/p100 latency outliers that we'd rather not see just > because we're sampling. Also, > > 1. The existing thread allocated bytes implementation isn't particularly > accurate either, in the sense that reality quickly gets away from the method > return values. That was a conscious decision to go with speed and efficiency > over stability/accuracy, so I went the same way for this implementation. > 2. Cloud services sample at fairly long intervals in order to avoid overhead: > 1 minute is common and 5 minutes not unheard of. Stability/accuracy within > short time frames is unneeded with such long sampling intervals. Afaiu, SMR/TLH keeps a terminated thread's TLS accessible, but doesn't stop the termination process. If we initialize result with exited_allocated_bytes, the "too small" possibility is still there. We still have a window between that initialization and the creation of the threads iterator where threads may terminate and thus not be included in the iteration. And new threads may become active during the iteration and not be included. As Aleksey observes, an advantage is that threads that terminate during the iteration will be counted only once and we don't have to check for terminated threads. If we stick with adding exited_allocated_bytes to result after the iteration, Volker is correct that we shouldn't include terminated threads in the sum because their allocated bytes values will have been added to exited_allocated_bytes. We have the same undercount possibility that initializing result with exited_allocated_bytes has, so this (fixed) approach can result in "too small" also, but "too large" goes away. Looks like both approaches are equivalent, so let's go with initializing result with exited_allocated_bytes because it avoids a comparison in the loop. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13814#discussion_r1186508768