Hi Lari,

Thanks for raising a proposal and starting the discussion thread here.
Increasing the thread does not solve the underlying problem.
but also brings additional resource consumption.

Thanks, Matteo provided a great explanation,
and, I totally agree with his explanation.
What should we try to optimize thread usage,
reduce block calls, rather than going in the opposite direction.

For the current REST API implementation,
I saw 2 main problems

1. The API defined we are using the Jetty async mode, but
is actually a blocking call. I think 100% this is bad code,
confuse contributors, and bring potential risks
2. We don't know the exact return type from the interface definition [1]
until reading the implementation code here[2]

So, I think to change the current implementation to purely async
and avoid passing the `AsyncResponse` to the implementation
is the right way to improve the REST API and fix the potential problems.

[1]
https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/v2/PersistentTopics.java#L1073
[2]
https://github.com/apache/pulsar/blob/df9a12dc90316e0d5c1d9effb416ec83fbcf9b5e/pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/impl/PersistentTopicsBase.java#L1259

Thanks,
Penghui

On Thu, Feb 17, 2022 at 10:00 AM Matteo Merli <matteo.me...@gmail.com>
wrote:

> Hi Lari,
>
> thanks for the proposal, though I don't think it would be good to
> change this default value.
>
> Pulsar HTTP API is an administrative service and it is not on the
> critical data path.
> If all the threads in the pool are busy, it is good to apply back
> pressure and slow down the incoming new requests.
>
> In the past, we had several cases of deadlocks involving HTTP Jetty
> threads. In all these cases, the root cause has been a broker that in
> response to an HTTP request is spawning HTTP requests to other
> brokers, and some of these requests are going to itself.
>
> These kinds of deadlock are unrelated to the number of HTTP threads in
> the pool and must be dealt with by making the underlying operation
> asynchronous.
>
> Finally each user is free to change the default to something that
> suits more, though I really don't think it makes for us to default to
> 200 threads as:
>   1. It will not improve the Pulsar performance
>   2. It will not magically solve any deadlock issue
>   3. It will just consume much more resources for everyone
>
>
> Replying in-line to some of the points.
>
> >
> > Additional context
> >
> > -   Examples of previous issues where Jetty threads have been occupied
> >     and caused problems: #13666 #4756 #10619
>
> All these examples are cases of what I have described above: an HTTP
> handler triggering a call to the same HTTP server resulting in a
> deadlock.
>
> Having had more threads wouldn't have fixed them.
>
> > -   Mailing list thread about “make async” changes:
> >    https://lists.apache.org/thread/tn7rt59cd1k724l4ytfcmzx1w2sbtw7l
>
> The rationale for these changes has not much to do with Jetty but
> rather to make sure that we are not mixing blocking and non-blocking
> operations when handling these requests, which is prone to cause
> deadlocks in broker's internal threads (irrespective of any Jetty
> thread pool size).
>
> > Q: What’s the reason of setting the default value to 200? If the node
> just have one core, what will happen?
> >
> > These are threads. Jetty defaults to 200 maximum threads, to prevent
> > thread pool starvation. This is recommended when using blocking Servlet
> > API. The problem is that Pulsar uses the blocking servlet API and
> > doesn’t have a sufficient number of threads which are needed and
> > recommended.
>
> We need to keep in mind that typical usage of Jetty is as a Servlet
> container, dedicated to that. You have servlets doing disk IO reads
> and blocking calls on JDBC drivers.
>
> Pulsar usage is *very* different from that. Admin API is a tiny
> portion of what a broker does. Dedicating 200 threads to that would be
> a massive waste of resources (CPU & memory).
>
> > The value 200 doesn’t mean that there will be 200 threads to start with.
> > This is the maximum size for the thread pool. When the value is more
> > than 8, Jetty will start with 8 initial threads and add more threads to
> > the pool when all threads are occupied.
> >
> > Q: Do we need to take the number of system cores into consideration for
> the maximum threads of the thread pool?
> >
> > No. Jetty is different from Netty in this aspect. In Netty, everything
> > should be asynchronous and “thou shall never block”. In Jetty, the
> > maximum number of threads for the thread pool should be set to 50-500
> > threads and blocking operations are fine.
>
> I disagree. In a container configured with 0.5 CPU cores, it would be
> very bad to run 200 threads, and it will have no other advantage
> whatsoever.
>
> That is the reason why the default thread pool size is pegged to the
> number of cores.
>
> > The recommendation for the thread pool is explained in Jetty
> > documentation
> >
> https://www.eclipse.org/jetty/documentation/jetty-9/index.html#_thread_pool
> > > Thread Pool > Configure with goal of limiting memory usage maximum
> > available. Typically this is >50 and <500
>
> The recommendations for Jetty are relative to typical Jetty workloads,
> something the Pulsar broker is definitely not.
>

Reply via email to