Well useful fields really depend on the applications and different usecases
require different fields and it can eventually end up covering all.
Therefore, we should try to create a protocol which can transport
stats/internal and client should be able to represent that data to
applications in a standard format which should be similar to admin-api
response to maintain consistency. Therefore, this PIP provides client side
API where applications can retrieve and access stats/internal in the same
format as admin-api response to have seamless experience with generic
client-server protocol.

Thanks,
Rajan

On Mon, May 15, 2023 at 12:25 AM Enrico Olivelli <eolive...@gmail.com>
wrote:

> I understand the need for this PIP and I support it, but I have some
> questions/open points.
>
> I wonder if we should define a smaller subset of the stats/internal
> stats to serve to the clients using this new Request/Response.
> Maybe it is better to serve only the minimal amount of data that is useful.
>
> If we have a well defined list of stats to serve with this purpose we
> can save us from serving the JSON, that is less efficient,
> we can declare them as protobuf fields and deal better with protocol
> changes and client bindings.
>
>
> Enrico
>
> Il giorno lun 15 mag 2023 alle ore 02:35 Rajan Dhabalia
> <rdhaba...@apache.org> ha scritto:
> >
> > There are multiple factors tcp would be better than http when you are
> > considering performance and scale due to lower overhead, faster parsing,
> > more control over connection, congestion control, etc and that could be
> > reasons why we have tcp binary protocol for producers/consumers
> > communication with server and this will be an extension for those
> producers
> > and consumers applications which would like to periodically or on-demand
> > retrieve stats from large number of application nodes and we would like
> to
> > let those producers/consumers retrieve it and use it in application
> without
> > worrying about scaling issues at admin apis.
> >
> > Thanks,
> > Rajan
> >
> > On Sun, May 14, 2023 at 6:47 AM Asaf Mesika <asaf.mes...@gmail.com>
> wrote:
> >
> > > If I dive into the exact details of the cause of the
> > > performance implications in current Admin HTTP API:
> > >
> > > Do you think the root cause of the performance is the Jersey
> implementation
> > > of `AsyncResponse.resume(stats)` which takes a thread from a thread
> pool,
> > > serialize the object and then performs a blocking I/O write of the JSON
> > > string?
> > >
> > > If I compared that with Netty HTTP, it would write the same String
> using
> > > async I/O and not blocking a thread.
> > >
> > > Given the normally large response size of those objects, the response
> > > headers or request headers are negligible in terms of performance
> impact.
> > >
> > > In terms of accepting connection, both Netty and Jetty has async IO
> > > implementation.
> > >
> > > Compared Jetty and Jersey with Netty based binary TCP, if we end up
> writing
> > > a JSON string, the only difference I see is the blocking I/O of
> writing the
> > > response.
> > >
> > > WDYT?
> > >
> > >
> > > On Fri, May 12, 2023 at 7:29 AM Rajan Dhabalia <rdhaba...@apache.org>
> > > wrote:
> > >
> > > > Communicating over binary protocol is more scalable and performant
> than
> > > > HTTP. Admin API over http has a long history of bottleneck and
> > > performance
> > > > issues which could also sometimes be a bottleneck for lookup
> requests and
> > > > that was the reason we introduced lookup over binary protocol as
> well. We
> > > > have multiple usecases which require fetching stats with relatively
> > > higher
> > > > rate and definitely we would like to avoid it over http which could
> be a
> > > > bottleneck for those applications or could be for others.
> > > > This PIP doesn't mention security so, let's not misinterpret the
> > > usecases.
> > > > Sometimes, pulsar is deployed behind the proxy and potentially used
> SNI
> > > > routing proxy which can't be used as http proxy and we would like to
> let
> > > > users access stats for a given topic using the same broker-service
> url
> > > > rather than having separate http endpoints. So, this api addresses
> scale,
> > > > performance, and use accessibility in pulsar.
> > > >
> > > > Thanks,
> > > > Rajan
> > > >
> > > > On Thu, May 11, 2023 at 6:24 AM Asaf Mesika <asaf.mes...@gmail.com>
> > > wrote:
> > > >
> > > > > Before I dive into the PIP, I have several questions on the
> background
> > > > > provided below:
> > > > >
> > > > >
> > > > > On Tue, May 9, 2023 at 9:08 AM Rajan Dhabalia <
> rdhaba...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Right now, Pulsar provides the topic's stats and stats-internal
> over
> > > > HTTP
> > > > > > admin API, and this stats data is used by user applications and
> also
> > > by
> > > > > > Pulsar internal components such as Pulsar-functions to derive the
> > > > certain
> > > > > > states of the applications.
> > > > > > for example, there are use cases where the application wants to
> check
> > > > the
> > > > > > topic's backlog, subscription's state (readPosition, list of
> > > > > > subscriptions), numberOfEntriesSinceFirstNotAckedMessage, etc to
> > > > > bootstrap
> > > > > > the application or handle the application’s resiliency and state
> > > > > > dynamically. Applications can retrieve this stats information by
> > > using
> > > > > the
> > > > > > broker’s admin HTTP APIs.
> > > > > >
> > > > > > However, stats retrieval over HTTP API doesn’t work well in use
> cases
> > > > > when
> > > > > > users would like to access this API at a higher scale when a
> large
> > > > number
> > > > > > of application nodes would like to use it over HTTP which could
> > > > overload
> > > > > > brokers and sometimes makes broker irresponsive and impact admin
> API
> > > > > > performance. It also becomes difficult when Pulsar is deployed
> in the
> > > > > cloud
> > > > > > behind the SNI proxy and applications also want to access
> large-scale
> > > > > stats
> > > > > > information periodically over different HTTP ports. Instead it
> would
> > > be
> > > > > > better if applications can fetch stats over on the same binary
> > > protocol
> > > > > for
> > > > > > scalability and accessibility reasons.
> > > > > >
> > > > >
> > > > > Why do you think using a binary protocol over HTTP would make more
> > > > > performant to respond to multiple calls at once?
> > > > > Same question but for the security issue - why do you think the
> HTTP
> > > port
> > > > > of admin API is harder to access than the binary protocol port?
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > >
> > > > > > Therefore, there are multiple use cases where producer/consumer
> > > > > > applications need stats information for topics using the client
> > > library
> > > > > > over binary protocol. Hence, this PIP introduces client API for
> > > > producers
> > > > > > and consumers to access topic stats/internal-stats information
> which
> > > > can
> > > > > be
> > > > > > used by applications as needed.
> > > > > >
> > > > > > Please visit and review the PIP:
> > > > > > https://github.com/apache/pulsar/issues/20265
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Rajan
> > > > > >
> > > > >
> > > >
> > >
>

Reply via email to