Il Lun 2 Gen 2023, 14:32 Lari Hotari <lhot...@apache.org> ha scritto:
> This is an interesting proposal. However, I'd suggest changes to the > current proposal. > > I think that the current proposal is too invasive for the Pulsar code > base. "Introduce thread monitor to check if thread is blocking for long > time." seems to mean multiple things. > When looking at the PR, it seems to be a solution for detecting long > running tasks. Just FYI, that Bookkeeper has a solution for this in it's > OrderedExecutor with a setting called enableTaskExecutionStats=true . I'm > not saying that it would be the preferred way to implement it. > > If the goal is to detect actual blocking code that is run with threads > that should run only non-blocking code, there's a better tool called > Reactor BlockHound (https://github.com/reactor/BlockHound) for that > purpose. > For actual profiling of the code base, Java Flight Recorder and Async > Profiler are better solutions. > > It seems that one part of the problem is that there aren't metrics for the > thread pools. As an alternative implementation for the proposed PIP-232, > I'd suggest that basic metrics (backlog / queue size, active thread count, > number of executed tasks, etc) are added for the thread pools. For > example, Micrometer contains a decorator for many thread pool > implementations, > https://github.com/micrometer-metrics/micrometer/blob/main/micrometer-core/src/main/java/io/micrometer/core/instrument/binder/jvm/ExecutorServiceMetrics.java > . A similar solution would be very useful in Pulsar for adding the thread > pool metrics. > > Tracking individual tasks requires more resources, and that's why I'd > suggest adding the basic metrics and making them enabled by default. Some > more advanced metrics would be useful, such as tracking the thread pool > queue waiting time. Adding a low overhead thread pool queue waiting time > could be done with a sampling approach. The benefit of that is that there > won't be a need to wrap all tasks that are executed. There would be several > ways to implement the queue waiting time metric. > > I assume that "blocking" itself might not be the problem and therefore > having basic metrics (backlog size, active threads, executed tasks counter, > failed tasks counter) for the thread pools is more essential. I agree that we should start with these this low overhead metrics. I have been spending much time in analyzing head dumps in order to see the queues of the threadpools :) Enrico There's a lot of good things about the PIP-232 proposal and I believe that > iterating on the ideas will propose a good outcome. > > -Lari > > On 2022/12/19 12:17:09 adobewjl wrote: > > Hello pulsar community, > > I've opened `PIP-232: Introduce thread monitor to check if thread is > blocked for long time.` to discuss. > > For more details, please read the PIP at > https://github.com/apache/pulsar/issues/18985 > > I'm looking forward to hearing what you think. > > Also the demo PR link at https://github.com/apache/pulsar/pull/18958 >