(ping)
On Fri, Sep 3, 2021 at 14:06 Enrico Olivelli <eolive...@gmail.com> wrote:

> Sijie,
> Thanks for your questions, answers inline below.
>
> On Thu, Sep 2, 2021 at 02:23 Sijie Guo <guosi...@gmail.com> wrote:
>
>> I would like to see the clarification between the broker protocol handlers
>> and proxy protocol handlers before moving it to a vote thread.
>
> A PH in the broker is very useful as it allows you to directly access the
> ManagedLedger and implement high performance adapters for
> other wire protocols.
> The bigger limitation is that you can efficiently access only the topics
> owned by the local broker.
> If you try to forward/proxy the request to another broker (you can do it,
> and this was Matteo's suggestion at the latest Video Community meeting)
> you have the downside that the broker has to waste resources to do the
> "proxy work"
> and you generally want a broker machine to be used only to deal with the
> local traffic.
>
> The load balancing mechanism of the brokers is not meant to deal with
> additional work due to proxying requests related to the topics for which
> the broker is not the owner.
>
> A PH in the proxy is useful to add new protocols that are running in front
> of the whole cluster and not only of one single broker.
> This is a very different use case with respect to having the PH in the
> broker.
>
> The work of the proxy usually is to forward requests to the internal
> services of the cluster, and in case of new protocols in the proxy
> you need some logic to fill in the gaps in the original wire protocol.
>
> System architects expect a different kind of load on the proxy and other
> kinds of load on the brokers.
> For instance you usually can run very few proxies to cover a big cluster
> with many brokers.
> So adding a PH on all the brokers is sometimes overkill.
>
>> I can see how it will cause confusion for protocol developers.
>>
>
> Protocol developers are very advanced users who do need to understand
> clearly the internals of Pulsar.
> In fact this request of having PHs in the Proxy layer came from myself and
> from other colleagues of mine who are working heavily on implementing
> new protocol handlers in Pulsar.
>
> And we faced the limitation of needing to create a new proxy service for
> each new protocol, but all of these "proxy services" have in common
> most of the features of the Pulsar proxy.
> When we also came to deal with System Architects, the requirement was
> clear: have only one single "place" to put all of the interactions
> at "cluster level" with Pulsar.
>
> I think this is a good picture of what I mean:
> - PH in the Broker -> add protocols inside the Broker, work for owned
>   topics
> - PH in the Proxy -> add protocols in front of the whole Cluster
>
>> Yunze brought a good idea on KoP.
>
> I also have good ideas and working solutions for a Pulsar-proxy-like KoP
> Proxy.
> I will be happy to discuss this in a separate thread or at a separate
> table with Yunze.
>
> A smart KoP proxy can work if you run inside the Pulsar proxy process or
> you can copy/paste the Pulsar Proxy code and create another service.
>
>> But I don't think that's the right
>> direction. If you can give an example of the usage of a proxy handler and
>> how it is different from using a broker handler, that would help me
>> understand this PIP.
>
> For some protocols you have to execute some non-trivial work for mapping
> the wire protocol and the concepts of the protocol to the Pulsar model.
> For instance some protocols do not have the concept of "lookup", and the
> proxy does the lookup and forwards the request to the internal broker.
>
> For some protocols you can just use the PulsarClient to connect to the
> internal brokers, you do not need and you do not want to access the
> ManagedLedgers:
> in this case adding the execution inside the broker only complicates
> the overall design of the system and puts load on the brokers.
>
> There is a good amount of processing that should be executed on the proxy,
> and it is not good to run it on a broker.
> If you do not put the "custom code" in the Proxy and you can only write a
> Broker PH you end up adding it to the Broker.
>
> If you expose directly (with some LoadBalancer or whatever) your brokers
> in which you run the PH code that you would put in the proxy
> you end up putting on the broker some load that is not expected:
> - the broker will have to work even for topics for which it is not the
>   owner
> - the broker will have to do things that cannot be dealt with correctly by
>   the Pulsar load balancer (because it expects that the load is
>   proportional to the owned bundles)
>
>> The reason why Pulsar proxy is built is to have a "smart" proxy that is
>> aware of Pulsar protocol. The Pulsar proxy can be replaced with other
>> mature proxy software with SNI routing or multiple advertised listeners
>> now. Hence I am afraid that we are taking the wrong direction here. Here
>> are various reasons.
>>
>> 1) The ProxyService is essentially a Pulsar admin client. Broker service
>> also provides a Pulsar admin client. I am not sure how Proxy PH will
>> simplify the protocol handler development. Please use an example to
>> demonstrate it.
>
> In the cases I am highlighting, *the Broker is simply not the right place
> to run the code*.
>
> So the problem here is not to have PulsarAdmin in the Broker or in the
> Proxy.
> It is that if you want to write a smart proxy for another protocol:
> - you end up copy/pasting the Proxy code
> - you use the internal Pulsar classes to have a consistent behaviour with
>   the Pulsar Proxy
> - you add more components to the "picture" of the Pulsar cluster
>
>> 2) The Authorization & Authentication services in ProxyService are only
>> used when proxies are configured to use ZooKeeper for broker discovery.
>> However, this option is not recommended when running Pulsar proxies in
>> Kubernetes. Instead, using a broker discovery service is recommended. In
>> order to make PH work, you are forcing the proxy to be tightly coupled
>> with ZooKeeper.
>
> This is not needed for all of the Proxy PH handlers.
> But Authorization & Authentication are a core part of this story.
> If you implement your "smart proxy" somewhere else and not as a Plugin to
> the Pulsar Proxy (or Broker)
> you cannot leverage the same services, the same way.
> It leads to a higher chance of behaviour that differs from
> standard Pulsar.
>
> PH developers are Pulsar experts, and you know that copy pasting code from
> Pulsar leads to unpredictable behaviour
> when you run your plugin in another version of Pulsar.
> But if you use an API that is going to be maintained by Pulsar you are
> safer and you can expect that your code is going to work.
>
>> 3) Configuring authentication and authorization in proxy is already
>> challenging. There are a few different combinations. A typical Pulsar
>> setup is to forward the authentication credentials to the brokers to
>> authenticate and authorize. If you don't do this correctly, it will
>> introduce security holes because a connection can potentially grab the
>> superuser credential configured in proxy and use superuser credentials
>> to access brokers.
>> From this perspective, I think proxy protocol handler doesn't make things
>> simpler; instead it makes things complicated when it comes to
>> authentication and authorization.
>
> Yes, this is a very complex problem indeed.
>
> We can help developers by providing a standard framework to access these
> services.
>
> It is very important from my point of view that we do not encourage
> developers to create
> their own versions of a Pulsar proxy.
>
> My recent experience is that we can add many new wire protocols to Pulsar
> and this will help a lot with the adoption of Pulsar.
>
> As we are doing in many other places in Pulsar we should provide tools to
> write extensions
> and not let people be too creative.
>
>> I would like to see these questions answered before moving to a vote.
>
> I hope that we can reach consensus on the need for this API,
> because I see that there is a real need for making this happen.
>
> Pulsar has momentum now, there are so many opportunities to reach
> out to users of other systems,
> let's not waste these opportunities.
>
> Enrico
>
>> - Sijie
>>
>> On Wed, Sep 1, 2021 at 12:55 PM Enrico Olivelli <eolive...@gmail.com>
>> wrote:
>>
>> > Any other comment?
>> >
>> > I would like to start a VOTE, but I feel we saw too few comments here
>> >
>> > Please take a look.
>> > I believe it will be a good fit for the 2.9.0 release, which is going
>> > to be released at the end of September
>> >
>> > Enrico
>> >
>> > On Tue, Aug 31, 2021, 18:14 Michael Marshall <mikemars...@gmail.com>
>> > wrote:
>> >
>> > > +1, just read through the PIP. Looks good to me.
>> > >
>> > > - Michael
>> > >
>> > > On Mon, Aug 30, 2021 at 3:47 AM Enrico Olivelli <eolive...@gmail.com>
>> > > wrote:
>> > >
>> > > > Hello Pulsar fellows,
>> > > >
>> > > > I have prepared a PIP about adding support for Protocol Handlers
>> > > >
>> > > > This is the GDoc
>> > > > https://docs.google.com/document/d/1Hlc_BOpQTkWX8FgrvWSfk6h5xTQKMXnTcSuil0Nznrg/edit?usp=sharing
>> > > >
>> > > > This is the PR for the implementation
>> > > > https://github.com/apache/pulsar/pull/11838/files
>> > > >
>> > > > I am pretty sure that this PIP will make life much nicer for
>> > > > developers of Protocol Handlers and for Administrators who deploy
>> > > > Protocol Handlers
>> > > >
>> > > > We are still working on the formal PIP process, at the moment I am
>> > > > sharing with you the document.
>> > > > My understanding is that after the discussion, I will start a VOTE
>> > > > thread, and if the VOTE passes we can move forward with reviewing
>> > > > the PR, and hopefully merge this feature for Pulsar 2.9.0
>> > > >
>> > > > Enrico
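
For readers following the thread, the "proxy does the lookup and forwards the request to the internal broker" case mentioned above can be sketched roughly as below. This is only an illustration under assumptions: ProxyLookupSketch, TopicLookup and BrokerForwarder are hypothetical names invented for this sketch, they are not part of the Pulsar API or of the PR, and a real handler would work on network channels rather than strings.

```java
import java.util.Optional;

/**
 * Hypothetical sketch of a proxy-side handler for a wire protocol that has
 * no "lookup" concept: the proxy resolves the owner broker and forwards.
 * None of these types exist in Pulsar; they only illustrate the flow.
 */
final class ProxyLookupSketch {

    /** Assumed stand-in for the cluster lookup service (not a Pulsar API). */
    interface TopicLookup {
        Optional<String> ownerOf(String topic);
    }

    /** Assumed stand-in for a connection to an internal broker. */
    interface BrokerForwarder {
        String forward(String brokerAddress, String topic, String payload);
    }

    private final TopicLookup lookup;
    private final BrokerForwarder forwarder;

    ProxyLookupSketch(TopicLookup lookup, BrokerForwarder forwarder) {
        this.lookup = lookup;
        this.forwarder = forwarder;
    }

    /**
     * Handles one client request: the client never sees the lookup step,
     * the proxy fills that gap and sends the request to the owner broker.
     */
    String handle(String topic, String payload) {
        String broker = lookup.ownerOf(topic)
                .orElseThrow(() -> new IllegalStateException("no owner for " + topic));
        return forwarder.forward(broker, topic, payload);
    }
}
```

The point of the sketch is only that the owner resolution happens in front of the cluster, so no broker has to serve traffic for topics it does not own.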