I support this proposal. I can't understand if this is still a VOTE thread.
If it is a VOTE thread then this is my +1 (binding) Sorry for late reply Enrico Il Gio 3 Ago 2023, 21:43 Rajan Dhabalia <rdhaba...@apache.org> ha scritto: > Hi Penghui. > > I think I tried to answer most of the questions and have enough usecases > for this PIP as you also mentioned in your last email. Users can use this > feature for any additional reasons like security, or convenience depending > on their usecases. I think I have also answered all the concerns and > questions as it's been going for a long time and I would like to see it > moving forward and would like to see any suggestions or feedback for API or > any changes in the proposal. > > Thanks, > Rajan > > On Wed, Aug 2, 2023 at 7:06 PM Rajan Dhabalia <rdhaba...@apache.org> > wrote: > > > >> If so, I think it could be a good reason for introducing binary > > protocol support here. For the security sensitive users like financial > > application. Usually they will try to reduce the dependencies (less > > dependencies, less > > CVEs and the exposed service endpoints. For example, the flink connector > > also have pulsar-admin dependency but some of the users want to remove > it. > > > > Yes, that could be another usecase to have such API available for users. > > So, I would like to bump up this discussion again and see if we have any > > other suggestions or concerns as we have multiple users who need it and > > would like to move forward with this API to serve those usecases. > > > > Thanks, > > Rajan > > > > On Mon, Jun 26, 2023 at 9:03 PM PengHui Li <peng...@apache.org> wrote: > > > >> Hi Rajan, > >> > >> Thanks for the explanation > >> > >> > This feature helps them to avoid multiple different extra > >> efforts > >> > >> If I understand correctly. You want to say users don't want to > >> add the pulsar-admin dependency or the the cluster don't want > >> to expose the REST API to external (not the cluster admin or > >> tenant admin)? > >> > >> If so, I think it could be a good reason for introducing binary > >> protocol support here. For the security sensitive users like financial > >> application. > >> > >> Usually they will try to reduce the dependencies (less dependencies, > less > >> CVEs) > >> and the exposed service endpoints. For example, the flink connector > >> also have pulsar-admin dependency but some of the users want to > >> remove it. > >> > >> I want to say that from the perspective of improving performance, > >> it may not be more convincing than the above reason. > >> > >> Thanks, > >> Penghui > >> > >> > >> On Mon, Jun 26, 2023 at 3:37 PM Rajan Dhabalia <rdhaba...@apache.org> > >> wrote: > >> > >> > > I do not deny that binary protocol has performance advantages. But > >> maybe > >> > the bottleneck is not the protocol level for now. > >> > > >> > Well, sure. We had serious issue with performance in past over http > but > >> > this feature we would mainly like to introduce it now for the > >> applications > >> > to enhance api user accessibility experience where we have these > >> multiple > >> > usecases where applications with large number of topics and high > fanout > >> > consumers would like to fetch stats and stats-internal to retrieve > >> various > >> > metadata for application startup and managing application state based > on > >> > managed-ledge states. You can think of producer/consumer stats api > >> which is > >> > used by many usecases in different scenarios of monitoring or state > >> > management. This feature helps them to avoid multiple different extra > >> > efforts and performance considerations, which helps to give clean and > >> easy > >> > experience for their application. > >> > I am open to hear about alternative of json string and response schema > >> > definition but keeping similar response-schema as admin-api response > for > >> > stats/stats-internal helps to give consistent view of stats across all > >> APIs > >> > and transferring json format helps to skip transformation stats > >> definition > >> > and we don't have to make any wire protocol changes whenever we add > any > >> new > >> > field or state in response which makes this protocol response agnostic > >> and > >> > do not require maintenance for any changes in stats. So, we can > consider > >> > any suggestion but we have multiple usecaes which need this API and we > >> > should implement it in a way where it can provide a consistent > >> definition > >> > view across other APIs without maintenance efforts while making any > stat > >> > response changes. > >> > > >> > Thanks, > >> > Rajan > >> > > >> > On Sat, Jun 24, 2023 at 5:51 PM PengHui Li <peng...@apache.org> > wrote: > >> > > >> > > > >> Main reasons are performance (you will see improvement due to > >> http > >> > > overhead compare to tcp, limited threads on http-server side and > >> > definitely > >> > > performance compare to netty-tcp vs http-jetty server), user > >> feasibility > >> > > (same reason why we have producer/consumer stats using binary > protocol > >> > > where user can get the stats from the same producer/consumer entity > >> > without > >> > > having additional client creation for the stats), scale (allowing > >> large > >> > > number of requests over binary protocol compare to http is more > >> > scalable). > >> > > > >> > > I do not deny that binary protocol has performance advantages. > >> > > But maybe the bottleneck is not the protocol level for now. > >> > > > >> > > IMO, we should have a clear performance expectation for getting the > >> topic > >> > > stats > >> > > If you want to achieve 10k of requests per second. > >> > > Maybe the protocol is not the bottleneck. Of course, if you want to > >> reach > >> > > 100k or > >> > > even millions of requests on a single server per second. > >> > > Exactly, the binary protocol will help. I'm curious whether it is a > >> real > >> > > case. > >> > > Unless you have to query topic stats every second and the broker > owned > >> > > 100k topics. > >> > > > >> > > And I also saw some benchmark results for the HTTP servers. > >> > > > >> > > > >> > > >> > https://cwiki.apache.org/confluence/display/HTTPCOMPONENTS/HttpCoreBenchmark > >> > > It's not refreshed, but it's instructive. Try other HTTP servers > >> should > >> > > also be considered. > >> > > > >> > > The binary protocol for topic stats is an option. But it also has > >> > > disadvantages and negative effects. > >> > > > >> > > - Make the binary protocol more complicated > >> > > - All the clients need to catch up with the new protocol > >> > > > >> > > > >> SNI is layer-4 tcp layer protocol and does not support over the > >> > HTTP. > >> > > > >> > > But you can just have an HTTP proxy to handle the redirect? > >> > > Why SNI is required for getting topic stats? > >> > > I'm not sure about the whole story here. > >> > > > >> > > > Well JSON would be the better option to avoid data conversion and > >> copy > >> > > for > >> > > complex stats/internal-stats complex data-structures at broker and > >> client > >> > > side, maintaining consistency between stats/internal-stats > >> pojo-schema, > >> > and > >> > > accessing stats with consistent pojo definition. > >> > > > >> > > This is from the perspective of developers who are already very > >> familiar > >> > > with Pulsar. > >> > > If you don't know it, just want to create a client based on Pulsar's > >> API. > >> > > How to do it? For public API, I think a clear definition should be > the > >> > > first consideration. > >> > > > >> > > Thanks, > >> > > Penghui > >> > > > >> > > > >> > > On Thu, Jun 22, 2023 at 9:13 AM Rajan Dhabalia < > rdhaba...@apache.org> > >> > > wrote: > >> > > > >> > > > Please find the response inline. > >> > > > > >> > > > On Wed, Jun 21, 2023 at 5:53 PM PengHui Li <peng...@apache.org> > >> wrote: > >> > > > > >> > > > > > However, stats retrieval over HTTP API doesn’t work well in > use > >> > cases > >> > > > > when users would like to access this API at a higher scale when > a > >> > large > >> > > > > number of application nodes would like to use it over HTTP which > >> > could > >> > > > > overload brokers and sometimes makes broker irresponsive and > >> impact > >> > > admin > >> > > > > API performance. It’s also become difficult when Pulsar is > >> deployed > >> > in > >> > > > the > >> > > > > cloud behind the SNI proxy and applications also want to access > >> > > > large-scale > >> > > > > stats information periodically over different HTTP port instead > it > >> > > would > >> > > > be > >> > > > > better if applications can fetch stats over on same binary > >> protocol > >> > for > >> > > > > scalability and accessibility reasons. > >> > > > > > >> > > > > From the motivation of the proposal, resolving the performance > >> issue > >> > > with > >> > > > > the REST API > >> > > > > is the goal. Sorry, I haven't run a benchmark for the REST API, > >> have > >> > > you > >> > > > > tested it or could > >> > > > > you please figure out where is the bottleneck with the REST API > >> > > solution? > >> > > > > And if users use > >> > > > > the new approach, what is the performance expectation here, get > a > >> 30% > >> > > or > >> > > > > 50% improvement? > >> > > > > > >> > > > >> Main reasons are performance (you will see improvement due to > >> http > >> > > > overhead compare to tcp, limited threads on http-server side and > >> > > definitely > >> > > > performance compare to netty-tcp vs http-jetty server), user > >> > feasibility > >> > > > (same reason why we have producer/consumer stats using binary > >> protocol > >> > > > where user can get the stats from the same producer/consumer > entity > >> > > without > >> > > > having additional client creation for the stats), scale (allowing > >> large > >> > > > number of requests over binary protocol compare to http is more > >> > > scalable). > >> > > > > >> > > > > >> > > > > And I'm not sure why it is difficult to fetch the stats behind > the > >> > SNI > >> > > > > proxy, could you please explain > >> > > > > more? It will be helpful for the reviewers to understand the > >> issue we > >> > > > want > >> > > > > to resolve. > >> > > > > > >> > > > >> SNI is layer-4 tcp layer protocol and does not support over the > >> > http. > >> > > > > >> > > > > >> > > > > For the client side API changes. You have InternalStatsOption > as a > >> > > param > >> > > > of > >> > > > > the method, but > >> > > > > I don't see any definition for it. It should be enum? And why > not > >> > have > >> > > a > >> > > > > pojo for it? It will be > >> > > > > more easier for users to know what should be set and what should > >> not. > >> > > > > > >> > > > I have added that param map after our last discussion which will > be > >> > enum > >> > > > to pass additional flags. > >> > > > > >> > > > > >> > > > > For the response data. The JSON string is not good for > >> compatibility. > >> > > > And I > >> > > > > saw the discussion > >> > > > > in the DISCUSSION thread from you and Enrico. But it's not a > >> > guaranteed > >> > > > > solution from the API's > >> > > > > perspective. And for any other clients, cpp, go, rust, they must > >> > build > >> > > > > their own pojo based on > >> > > > > the REST API. > >> > > > > > >> > > > Well JSON would be the better option to avoid data conversion and > >> copy > >> > > for > >> > > > complex stats/internal-stats complex data-structures at broker and > >> > client > >> > > > side, maintaining consistency between stats/internal-stats > >> pojo-schema, > >> > > and > >> > > > accessing stats with consistent pojo definition. > >> > > > > >> > > > > >> > > > > Thanks, > >> > > > > Penghui > >> > > > > > >> > > > > On Tue, Jun 20, 2023 at 3:06 PM Rajan Dhabalia < > >> rdhaba...@apache.org > >> > > > >> > > > > wrote: > >> > > > > > >> > > > > > Hi, > >> > > > > > > >> > > > > > I would like to start VOTE for : > >> > > > > > https://github.com/apache/pulsar/issues/20265 > >> > > > > > > >> > > > > > Thanks, > >> > > > > > Rajan > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > > >