Il Lun 2 Gen 2023, 14:32 Lari Hotari <lhot...@apache.org> ha scritto:

> This is an interesting proposal. However, I'd suggest changes to the
> current proposal.
>
> I think that the current proposal is too invasive for the Pulsar code
> base. "Introduce thread monitor to check if thread is blocking for long
> time." seems to mean multiple things.
> When looking at the PR, it seems to be a solution for detecting long
> running tasks. Just FYI, that Bookkeeper has a solution for this in it's
> OrderedExecutor with a setting called enableTaskExecutionStats=true . I'm
> not saying that it would be the preferred way to implement it.
>
> If the goal is to detect actual blocking code that is run with threads
> that should run only non-blocking code, there's a better tool called
> Reactor BlockHound (https://github.com/reactor/BlockHound) for that
> purpose.
> For actual profiling of the code base, Java Flight Recorder and Async
> Profiler are better solutions.
>
> It seems that one part of the problem is that there aren't metrics for the
> thread pools. As an alternative implementation for the proposed PIP-232,
> I'd suggest that basic metrics (backlog / queue size, active thread count,
> number of executed tasks, etc)  are added for the thread pools. For
> example, Micrometer contains a decorator for many thread pool
> implementations,
> https://github.com/micrometer-metrics/micrometer/blob/main/micrometer-core/src/main/java/io/micrometer/core/instrument/binder/jvm/ExecutorServiceMetrics.java
> . A similar solution would be very useful in Pulsar for adding the thread
> pool metrics.
>
> Tracking individual tasks requires more resources, and that's why I'd
> suggest adding the basic metrics and making them enabled by default. Some
> more advanced metrics would be useful, such as tracking the thread pool
> queue waiting time. Adding a low overhead thread pool queue waiting time
> could be done with a sampling approach. The benefit of that is that there
> won't be a need to wrap all tasks that are executed. There would be several
> ways to implement the queue waiting time metric.
>
> I assume that "blocking" itself might not be the problem and therefore
> having basic metrics (backlog size, active threads, executed tasks counter,
> failed tasks counter) for the thread pools is more essential.


I agree that we should start with these this low overhead metrics.
I have been spending much time in analyzing head dumps in order to see the
queues of the threadpools :)

Enrico


There's a lot of good things about the PIP-232 proposal and I believe that
> iterating on the ideas will propose a good outcome.
>
> -Lari
>
> On 2022/12/19 12:17:09 adobewjl wrote:
> > Hello pulsar community,
> > I've opened `PIP-232: Introduce thread monitor to check if thread is
> blocked for long time.` to discuss.
> > For more details, please read the PIP at
> https://github.com/apache/pulsar/issues/18985
> > I'm looking forward to hearing what you think.
> > Also the demo PR link at https://github.com/apache/pulsar/pull/18958
>

Reply via email to