I have no strong objection. Just a few questions.

How do we handle the stats/internal-stats of a partitioned topic?
As I understand, the client side will combine the stats/internal-stats
of the partitions? It's better to clarify in the proposal.

Do we need a compatibility section to clarify the behavior of the new client
pointing to an old broker? What is the exact error that users will get if
the
broker doesn't support the new protocol?

And for the Admin API, we have several supported params

Options for getStats:
- getPreciseBacklog
- subscriptionBacklogSize
- getEarliestTimeInBacklog

Options for getInternalStats:
- metadata

Do we need to add to the binary protocol to make it consistent with the
REST API?
And the params supported in getStats API are not supported in
getInternalStats API.
We'd better have a separate message in the wire protocol.

Looking further, any other APIs that are supported in Admin API should be
considered
to add to binary protocol? I mean, we'd better have a strategy to avoid
moving many
APIs to binary protocol. For example:

- Tenant/Namespace level REST APIs, no
- Topic policies APIs, no
- Topic's internal info, maybe
- Expire messages, no
- Topic's properties, no
- ...

Regards,
Penghui

On Mon, May 15, 2023 at 3:47 PM Rajan Dhabalia <rdhaba...@apache.org> wrote:

> Well useful fields really depend on the applications and different usecases
> require different fields and it can eventually end up covering all.
> Therefore, we should try to create a protocol which can transport
> stats/internal and client should be able to represent that data to
> applications in a standard format which should be similar to admin-api
> response to maintain consistency. Therefore, this PIP provides client side
> API where applications can retrieve and access stats/internal in the same
> format as admin-api response to have seamless experience with generic
> client-server protocol.
>
> Thanks,
> Rajan
>
> On Mon, May 15, 2023 at 12:25 AM Enrico Olivelli <eolive...@gmail.com>
> wrote:
>
> > I understand the need for this PIP and I support it, but I have some
> > questions/open points.
> >
> > I wonder if we should define a smaller subset of the stats/internal
> > stats to serve to the clients using this new Request/Response.
> > Maybe it is better to serve only the minimal amount of data that is
> useful.
> >
> > If we have a well defined list of stats to serve with this purpose we
> > can save us from serving the JSON, that is less efficient,
> > we can declare them as protobuf fields and deal better with protocol
> > changes and client bindings.
> >
> >
> > Enrico
> >
> > Il giorno lun 15 mag 2023 alle ore 02:35 Rajan Dhabalia
> > <rdhaba...@apache.org> ha scritto:
> > >
> > > There are multiple factors tcp would be better than http when you are
> > > considering performance and scale due to lower overhead, faster
> parsing,
> > > more control over connection, congestion control, etc and that could be
> > > reasons why we have tcp binary protocol for producers/consumers
> > > communication with server and this will be an extension for those
> > producers
> > > and consumers applications which would like to periodically or
> on-demand
> > > retrieve stats from large number of application nodes and we would like
> > to
> > > let those producers/consumers retrieve it and use it in application
> > without
> > > worrying about scaling issues at admin apis.
> > >
> > > Thanks,
> > > Rajan
> > >
> > > On Sun, May 14, 2023 at 6:47 AM Asaf Mesika <asaf.mes...@gmail.com>
> > wrote:
> > >
> > > > If I dive into the exact details of the cause of the
> > > > performance implications in current Admin HTTP API:
> > > >
> > > > Do you think the root cause of the performance is the Jersey
> > implementation
> > > > of `AsyncResponse.resume(stats)` which takes a thread from a thread
> > pool,
> > > > serialize the object and then performs a blocking I/O write of the
> JSON
> > > > string?
> > > >
> > > > If I compared that with Netty HTTP, it would write the same String
> > using
> > > > async I/O and not blocking a thread.
> > > >
> > > > Given the normally large response size of those objects, the response
> > > > headers or request headers are negligible in terms of performance
> > impact.
> > > >
> > > > In terms of accepting connection, both Netty and Jetty has async IO
> > > > implementation.
> > > >
> > > > Compared Jetty and Jersey with Netty based binary TCP, if we end up
> > writing
> > > > a JSON string, the only difference I see is the blocking I/O of
> > writing the
> > > > response.
> > > >
> > > > WDYT?
> > > >
> > > >
> > > > On Fri, May 12, 2023 at 7:29 AM Rajan Dhabalia <rdhaba...@apache.org
> >
> > > > wrote:
> > > >
> > > > > Communicating over binary protocol is more scalable and performant
> > than
> > > > > HTTP. Admin API over http has a long history of bottleneck and
> > > > performance
> > > > > issues which could also sometimes be a bottleneck for lookup
> > requests and
> > > > > that was the reason we introduced lookup over binary protocol as
> > well. We
> > > > > have multiple usecases which require fetching stats with relatively
> > > > higher
> > > > > rate and definitely we would like to avoid it over http which could
> > be a
> > > > > bottleneck for those applications or could be for others.
> > > > > This PIP doesn't mention security so, let's not misinterpret the
> > > > usecases.
> > > > > Sometimes, pulsar is deployed behind the proxy and potentially used
> > SNI
> > > > > routing proxy which can't be used as http proxy and we would like
> to
> > let
> > > > > users access stats for a given topic using the same broker-service
> > url
> > > > > rather than having separate http endpoints. So, this api addresses
> > scale,
> > > > > performance, and use accessibility in pulsar.
> > > > >
> > > > > Thanks,
> > > > > Rajan
> > > > >
> > > > > On Thu, May 11, 2023 at 6:24 AM Asaf Mesika <asaf.mes...@gmail.com
> >
> > > > wrote:
> > > > >
> > > > > > Before I dive into the PIP, I have several questions on the
> > background
> > > > > > provided below:
> > > > > >
> > > > > >
> > > > > > On Tue, May 9, 2023 at 9:08 AM Rajan Dhabalia <
> > rdhaba...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > Right now, Pulsar provides the topic's stats and stats-internal
> > over
> > > > > HTTP
> > > > > > > admin API, and this stats data is used by user applications and
> > also
> > > > by
> > > > > > > Pulsar internal components such as Pulsar-functions to derive
> the
> > > > > certain
> > > > > > > states of the applications.
> > > > > > > for example, there are use cases where the application wants to
> > check
> > > > > the
> > > > > > > topic's backlog, subscription's state (readPosition, list of
> > > > > > > subscriptions), numberOfEntriesSinceFirstNotAckedMessage, etc
> to
> > > > > > bootstrap
> > > > > > > the application or handle the application’s resiliency and
> state
> > > > > > > dynamically. Applications can retrieve this stats information
> by
> > > > using
> > > > > > the
> > > > > > > broker’s admin HTTP APIs.
> > > > > > >
> > > > > > > However, stats retrieval over HTTP API doesn’t work well in use
> > cases
> > > > > > when
> > > > > > > users would like to access this API at a higher scale when a
> > large
> > > > > number
> > > > > > > of application nodes would like to use it over HTTP which could
> > > > > overload
> > > > > > > brokers and sometimes makes broker irresponsive and impact
> admin
> > API
> > > > > > > performance. It also becomes difficult when Pulsar is deployed
> > in the
> > > > > > cloud
> > > > > > > behind the SNI proxy and applications also want to access
> > large-scale
> > > > > > stats
> > > > > > > information periodically over different HTTP ports. Instead it
> > would
> > > > be
> > > > > > > better if applications can fetch stats over on the same binary
> > > > protocol
> > > > > > for
> > > > > > > scalability and accessibility reasons.
> > > > > > >
> > > > > >
> > > > > > Why do you think using a binary protocol over HTTP would make
> more
> > > > > > performant to respond to multiple calls at once?
> > > > > > Same question but for the security issue - why do you think the
> > HTTP
> > > > port
> > > > > > of admin API is harder to access than the binary protocol port?
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > Therefore, there are multiple use cases where producer/consumer
> > > > > > > applications need stats information for topics using the client
> > > > library
> > > > > > > over binary protocol. Hence, this PIP introduces client API for
> > > > > producers
> > > > > > > and consumers to access topic stats/internal-stats information
> > which
> > > > > can
> > > > > > be
> > > > > > > used by applications as needed.
> > > > > > >
> > > > > > > Please visit and review the PIP:
> > > > > > > https://github.com/apache/pulsar/issues/20265
> > > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Rajan
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
>

Reply via email to