If I dive into the exact details of what causes the performance problems in
the current Admin HTTP API:

Do you think the root cause is the Jersey implementation of
`AsyncResponse.resume(stats)`, which takes a thread from a thread pool,
serializes the object, and then performs a blocking I/O write of the JSON
string? By comparison, Netty HTTP would write the same string using async
I/O, without blocking a thread.

Given the normally large response size of those objects, the request and
response headers are negligible in terms of performance impact. In terms of
accepting connections, both Netty and Jetty have async I/O implementations.

Comparing Jetty and Jersey with Netty-based binary TCP, if we end up writing
a JSON string either way, the only difference I see is the blocking I/O when
writing the response.

WDYT?

On Fri, May 12, 2023 at 7:29 AM Rajan Dhabalia <rdhaba...@apache.org> wrote:

> Communicating over binary protocol is more scalable and performant than
> HTTP. Admin API over http has a long history of bottleneck and performance
> issues which could also sometimes be a bottleneck for lookup requests and
> that was the reason we introduced lookup over binary protocol as well. We
> have multiple usecases which require fetching stats with relatively higher
> rate and definitely we would like to avoid it over http which could be a
> bottleneck for those applications or could be for others.
> This PIP doesn't mention security so, let's not misinterpret the usecases.
> Sometimes, pulsar is deployed behind the proxy and potentially used SNI
> routing proxy which can't be used as http proxy and we would like to let
> users access stats for a given topic using the same broker-service url
> rather than having separate http endpoints. So, this api addresses scale,
> performance, and use accessibility in pulsar.
>
> Thanks,
> Rajan
>
> On Thu, May 11, 2023 at 6:24 AM Asaf Mesika <asaf.mes...@gmail.com> wrote:
>
> > Before I dive into the PIP, I have several questions on the background
> > provided below:
> >
> > On Tue, May 9, 2023 at 9:08 AM Rajan Dhabalia <rdhaba...@apache.org>
> > wrote:
> >
> > > Hi,
> > >
> > > Right now, Pulsar provides the topic's stats and stats-internal over
> > > HTTP admin API, and this stats data is used by user applications and
> > > also by Pulsar internal components such as Pulsar-functions to derive
> > > the certain states of the applications.
> > > for example, there are use cases where the application wants to check
> > > the topic's backlog, subscription's state (readPosition, list of
> > > subscriptions), numberOfEntriesSinceFirstNotAckedMessage, etc to
> > > bootstrap the application or handle the application’s resiliency and
> > > state dynamically. Applications can retrieve this stats information by
> > > using the broker’s admin HTTP APIs.
> > >
> > > However, stats retrieval over HTTP API doesn’t work well in use cases
> > > when users would like to access this API at a higher scale when a large
> > > number of application nodes would like to use it over HTTP which could
> > > overload brokers and sometimes makes broker irresponsive and impact
> > > admin API performance. It also becomes difficult when Pulsar is
> > > deployed in the cloud behind the SNI proxy and applications also want
> > > to access large-scale stats information periodically over different
> > > HTTP ports. Instead it would be better if applications can fetch stats
> > > over on the same binary protocol for scalability and accessibility
> > > reasons.
> > >
> >
> > Why do you think using a binary protocol over HTTP would make more
> > performant to respond to multiple calls at once?
> > Same question but for the security issue - why do you think the HTTP port
> > of admin API is harder to access than the binary protocol port?
> >
> > > Therefore, there are multiple use cases where producer/consumer
> > > applications need stats information for topics using the client
> > > library over binary protocol. Hence, this PIP introduces client API for
> > > producers and consumers to access topic stats/internal-stats
> > > information which can be used by applications as needed.
> > >
> > > Please visit and review the PIP:
> > > https://github.com/apache/pulsar/issues/20265
> > >
> > >
> > > Thanks,
> > >
> > > Rajan
> > >