Hello everyone,
I have created a new version of the old PIP-93 ("Proxy Protocol Handlers"); it is now "PIP-95 Pulsar Proxy Extensions".
The name "Protocol Handlers" was too confusing, as the kind of extensions I want to build is very different from Broker Protocol Handlers.

The idea behind PIP-95 is very simple:
1. You can add "extensions" to the Proxy service
2. Such extensions live in the Proxy service and use conf/proxy.conf and "bin/pulsar proxy"
3. They work out of the box with the Helm Chart, with no need to add new services/deployments/pods
4. An extension can access the Pulsar Authentication, Authorization and BrokerDiscovery services

This is PIP-95:
https://github.com/apache/pulsar/issues/12157

This is the PR for the implementation:
https://github.com/apache/pulsar/pull/11838

I hope that this helps to better understand the use cases I presented during the discussion, and that it allows the community to reach a consensus on adopting this feature.

It would be nice to port the "Websocket proxy" to a "proxy extension" one day, but this is a separate discussion and not part of PIP-95.

Best regards
Enrico

On Fri, Sep 17, 2021 at 08:18 Sijie Guo <guosi...@gmail.com> wrote:

> I totally understand this point. I wasn't there when the proxy was born
> but currently my experience is that the Proxy is perceived as the primary
> endpoint in front of the Pulsar cluster, especially when you run in k8s.
>
> The Pulsar Proxy was born because there was no great solution at that point.
> However, the Kubernetes stack has evolved beyond what it was before, and so
> has Pulsar.
>
> For example,
> https://github.com/apache/pulsar/wiki/PIP-60%3A-Support-Proxy-server-with-SNI-routing
> was introduced to use other mature proxy software with SNI routing.
>
> Multiple broker listeners have been introduced to allow better integration
> with proxy and service mesh solutions. Hence I don't think "proxy" is the
> primary endpoint in front of a Pulsar cluster anymore.
>
> Hence I don't think proxy PH is the right solution for the problems you are
> trying to solve.
> I would avoid introducing PH to proxy.
>
> - Sijie
>
> On Tue, Sep 14, 2021 at 8:02 AM Enrico Olivelli <eolive...@gmail.com> wrote:
>
> > other comments ?
> >
> > Enrico
> >
> > On Thu, Sep 9, 2021 at 09:15 Enrico Olivelli <eolive...@gmail.com> wrote:
> >
> > > Joe,
> > >
> > > On Thu, Sep 9, 2021 at 04:31 Joe F <joefranc...@gmail.com> wrote:
> > >
> > >> Enrico, my initial comment when you brought up PH was in relation to the
> > >> larger question about proxying, rather than looking at this in a limited
> > >> fashion on how to make it easy to add new PH in the proxy.
> > >>
> > >> But specifically with this, here are my comments. Two very distinct
> > >> abstractions are being mixed up here, and I'm not sure whether that is a
> > >> good idea or not.
> > >>
> > >
> > > One way of seeing this PIP is to simply complete the work initiated with
> > > PIP-41 (Introduction of Broker PHs,
> > > https://github.com/apache/pulsar/wiki/PIP-41%3A-Pluggable-Protocol-Handler
> > > ).
> > >
> > >>
> > >> The proxy was designed to move bits and bytes without interpretation, from
> > >> one network to another. The issue with Pulsar is that it requires some
> > >> interpretation of the data to find which server a client should connect to.
> > >> Protocol translation crept into the proxy, just to be able to ask this
> > >> question. Since auth is required to answer this question, auth also crept
> > >> in. Essentially the proxy was built as a TCP proxy, not as a wire protocol
> > >> translator. Some additional hacky things needed to be done to make it work
> > >> as a TCP proxy, and in my opinion those things should die away to the
> > >> fullest extent possible.
> > >>
> > >
> > > I totally understand this point.
> > > I wasn't there when the proxy was born, but currently my experience is
> > > that the Proxy is perceived as the primary endpoint in front of the Pulsar
> > > cluster, especially when you run in k8s.
> > >
> > >>
> > >> Because of all this, the current implementation is not ideal. Its usage
> > >> is highly restricted in actual deployments, because of potential security
> > >> risks if the proxy is misconfigured. One needs to be strict about setting
> > >> up the proxy to meet security standards in highly regulated environments.
> > >>
> > >> >And we faced the limitation of the need to create a new proxy service for
> > >> >each new protocol, but all of these "proxy services" have in common
> > >> >most of the features of the Pulsar proxy.
> > >> >When we also came to deal with System Architects it was clear the
> > >> >requirement to have only one single "place" to put all of the interactions
> > >> >at "cluster level" with Pulsar.
> > >>
> > >> Good idea, a single place seems right. Can the proxy answer the traffic
> > >> routing question without interpreting the data? Essentially, move what is
> > >> done within the proxy now to a well-known service within the cluster, and
> > >> use that?
> > >>
> > >
> > > In the use cases I know, simply routing PDUs to internal brokers is not
> > > enough; you often need to add complex mapping logic from the External
> > > Protocol Concepts to Pulsar concepts on the Proxy component.
> > >
> > > So you have two ways:
> > > 1. create your own service and deploy it separately: this was the
> > > beginning of my work, and some colleagues of mine did the same
> > > 2. deploy your code inside the Pulsar Proxy, and leverage current
> > > packaging, configuration, tools, security APIs, helm chart...
> > >
> > > I started this discussion because I found option 1 very awkward for Proxy
> > > Component developers, for System Administrators and for System Architects.
> > >
> > > Developers:
> > > - you have to copy/paste some Pulsar Proxy code, import Proxy jars, use
> > > internal Pulsar classes to implement Authentication, Authorization,
> > > Service Discovery, Configuration...
> > >
> > > System Administrators:
> > > - you have a new set of configuration files and tools to manage the
> > > settings (and in k8s you have to modify the Helm Chart significantly)
> > >
> > > System Architects:
> > > - you have multiple new components in the picture, to explain, to justify...
> > >
> > > With this proposal:
> > >
> > > Developers:
> > > - use a framework, do not reinvent the wheel, be able to ensure that you
> > > are compatible with a given Pulsar version, ensure that the behaviour is
> > > consistent with other Pulsar components (like using ProxyConfiguration, or
> > > the same service lifecycle, same libs); you can evolve more easily
> > >
> > > System Administrators:
> > > - you use proxy.conf/broker.conf, you use Pulsar CLI tools, no need to
> > > change the Helm Charts
> > >
> > > System Architects:
> > > - nothing new on the table, all Pulsar docs apply, you have the Proxy
> > > that deals with external clients, but it is able to speak Pulsar, Kafka,
> > > RabbitMQ, MQTT, ActiveMQ
> > >
> > >>
> > >> >I think this is a good picture of what I mean:
> > >> >- PH in the Broker -> add protocols inside the Broker, work for owned topics
> > >> >- PH in the Proxy -> add protocols in front of the whole Cluster
> > >> >There is a good amount of processing that should be executed on the proxy,
> > >> >and it is not good to run it on a broker.
> > >>
> > >> Is a TCP proxy a good place to do wire protocol translation (computation)?
> > >> Especially if that translation is a good amount of processing?
> > >> If it's not good to run this much processing on the broker, then it's even
> > >> worse to run it on a network proxy. I can foresee this as a path that will
> > >> lead to cluster and load management creeping into the proxy, as soon as
> > >> you move beyond what a single proxy can handle.
> > >>
> > >> But I think these issues (of n/w vs protocol translation) are moot when you
> > >> look at the larger needs of a generic proxy that will support ingress,
> > >> configurable protocol handlers, load balancing etc. for use with Pulsar.
> > >> You can run a bunch of Pulsar's proxies today, and there is no means to
> > >> manage them properly, e.g. load balance between them, manage them as a
> > >> cluster, have affinity of proxies to topics/tenants, etc. This applies even
> > >> before this PIP (and more so once you add more processing into the proxy).
> > >>
> > >> The Pulsar proxy, as it is, is not amenable to creating anything like a
> > >> service mesh. It would demand a lot of work in the proxy. Hence my initial
> > >> comment about the proxy eventually becoming a mudball, and why we should
> > >> rethink this entire proxy.
> > >>
> > >> It is tempting to evolve the Pulsar proxy into a service that supports
> > >> everything: ingress, transformation chains, cluster management etc.
> > >> This will eventually end up duplicating something which already exists
> > >> elsewhere. My take is that this is better done by building on top of
> > >> something like Envoy (or similar), which has built-in and mature features
> > >> and is supported by a wide user base.
> > >>
> > >
> > > Unfortunately, general purpose proxies or proxies specific to some protocol
> > > will not be able to do efficiently what we can do using Pulsar APIs,
> > > because they cannot "map" External Concepts directly to the Pulsar model.
> > >
> > > I cannot imagine the cost of developing and maintaining a plugin for Envoy
> > > that is able to deal with Pulsar concepts. For instance, it is not written
> > > in Java and you cannot use the Java bindings for Pulsar, which are feature
> > > complete and always up to date with the latest features.
> > > Also, developers that work on PHs are specialized in Pulsar code and in
> > > Java (at very high levels), so for them it is harder to write super
> > > efficient and high quality plugins using non-Java languages.
> > >
> > > So I see a huge value in adding this ability to the Pulsar Proxy.
> > >
> > > The only alternative to this PIP is to create a new framework for creating
> > > such "Smart Proxies" in Java, using some official/maintained Pulsar API.
> > >
> > > So we will end up discussing the value of adding such a brand new module,
> > > and how to deploy/manage it.
> > >
> > > It is a huge cost and it will take so much time:
> > > - design,
> > > - adding new concepts to the architecture,
> > > - adding a new service (new management tools),
> > > - a lot of new code (probably cut/paste from Pulsar Proxy)
> > > - helm chart
> > > - new configuration files
> > > - docs
> > >
> > > I believe that we should spend our time on adding more bindings/protocol
> > > handlers instead of doing that.
> > >
> > > By the way, I will be happy to drive this new effort if this is REALLY what
> > > we want.
> > >
> > > So I am convinced that for the short/mid term this PIP is the best choice
> > > to help Pulsar adoption.
> > >
> > > This PIP will unlock some great potential that otherwise will be
> > > available only to users of custom tools, not officially maintained
> > > inside the Pulsar project.
> > > I will be very sad about the outcome
> > >
> > > Enrico
> > >
> > >>
> > >> -j
> > >>
> > >> On Tue, Sep 7, 2021 at 11:11 PM Enrico Olivelli <eolive...@gmail.com> wrote:
> > >>
> > >> > (ping)
> > >> >
> > >> > On Fri, Sep 3, 2021 at 14:06 Enrico Olivelli <eolive...@gmail.com> wrote:
> > >> >
> > >> > > Sijie,
> > >> > > Thanks for your questions, answers inline below.
> > >> > >
> > >> > > On Thu, Sep 2, 2021 at 02:23 Sijie Guo <guosi...@gmail.com> wrote:
> > >> > >
> > >> > >> I would like to see the clarification between the broker protocol
> > >> > >> handlers and proxy protocol handlers before moving it to a vote thread.
> > >> > >>
> > >> > >
> > >> > > A PH in the broker is very useful as it allows you to directly access the
> > >> > > ManagedLedger and implement high performance adapters for
> > >> > > other wire protocols.
> > >> > > The bigger limitation is that you can efficiently access only the topics
> > >> > > owned by the local broker.
> > >> > > If you try to forward/proxy the request to another broker (you can do it,
> > >> > > and this was Matteo's suggestion at the latest Video Community meeting)
> > >> > > you have the downside that the broker has to waste resources to do the
> > >> > > "proxy work", and you generally want a broker machine to be used only to
> > >> > > deal with the local traffic.
> > >> > >
> > >> > > The load balancing mechanism of the brokers is not meant to deal with
> > >> > > additional work due to proxying requests related to the topics for which
> > >> > > the broker is not the owner.
> > >> > >
> > >> > > A PH in the proxy is useful to add new protocols that run in front
> > >> > > of the whole cluster and not only of one single broker.
> > >> > > This is a very different use case with respect to having the PH in the broker.
> > >> > >
> > >> > > The work of the proxy usually is to forward requests to the internal
> > >> > > services of the cluster, and in case of new protocols in the proxy
> > >> > > you need some logic to fill in the gaps in the original wire protocol.
> > >> > >
> > >> > > System architects expect a different kind of load on the proxy and other
> > >> > > kinds of load on the brokers.
> > >> > > For instance, you can usually run very few proxies to cover a big cluster
> > >> > > with many brokers.
> > >> > > So adding a PH on all the brokers is sometimes overkill.
> > >> > >
> > >> > >>
> > >> > >> I can see how it will cause confusion for protocol developers.
> > >> > >>
> > >> > >
> > >> > > Protocol developers are very advanced users that do need to understand
> > >> > > clearly the internals of Pulsar.
> > >> > > In fact, this request of having PHs in the Proxy layer came from myself and
> > >> > > from other colleagues of mine who are working heavily on implementing
> > >> > > new protocol handlers in Pulsar.
> > >> > >
> > >> > > And we faced the limitation of the need to create a new proxy service for
> > >> > > each new protocol, but all of these "proxy services" have in common
> > >> > > most of the features of the Pulsar proxy.
> > >> > > When we also came to deal with System Architects, the requirement was
> > >> > > clear: have only one single "place" to put all of the interactions
> > >> > > at "cluster level" with Pulsar.
> > >> > >
> > >> > > I think this is a good picture of what I mean:
> > >> > > - PH in the Broker -> add protocols inside the Broker, work for owned topics
> > >> > > - PH in the Proxy -> add protocols in front of the whole Cluster
> > >> > >
> > >> > >> Yunze brought a good idea on KoP.
> > >> > >
> > >> > >
> > >> > > I also have good ideas and working solutions for a Pulsar-proxy like KOP
> > >> > > Proxy.
> > >> > > I will be happy to discuss this in a separate thread or at a separate
> > >> > > table with Yunze.
> > >> > >
> > >> > > A smart KOP proxy can work if you run it inside the Pulsar proxy process, or
> > >> > > you can copy/paste the Pulsar Proxy code and create another service.
> > >> > >
> > >> > >> But I don't think that's the right
> > >> > >> direction. If you can give an example of the usage of a proxy handler and
> > >> > >> how it is different from using a broker handler, that would help me
> > >> > >> understand this PIP.
> > >> > >>
> > >> > >
> > >> > > For some protocols you have to execute some non-trivial work for mapping
> > >> > > the wire protocol and the concepts of the protocol to the Pulsar model.
> > >> > > For instance, some protocols do not have the concept of "lookup", and the
> > >> > > proxy does the lookup and forwards the request to the internal broker.
> > >> > >
> > >> > > For some protocols you can just use the PulsarClient to connect to the
> > >> > > internal brokers; you do not need and you do not want to access the
> > >> > > ManagedLedgers:
> > >> > > in this case adding the execution inside the broker only complicates
> > >> > > the overall design of the system and puts load on the brokers.
> > >> > >
> > >> > > There is a good amount of processing that should be executed on the proxy,
> > >> > > and it is not good to run it on a broker.
> > >> > > If you do not put the "custom code" in the Proxy and you can only write a
> > >> > > Broker PH, you end up adding it to the Broker.
> > >> > >
> > >> > > If you directly expose (with some LoadBalancer or whatever) your brokers
> > >> > > in which you run the PH code that you would put in the proxy,
> > >> > > you end up putting on the broker some load that is not expected:
> > >> > > - the broker will have to work even for topics for which it is not the
> > >> > > owner
> > >> > > - the broker will have to do things that cannot be handled correctly by the
> > >> > > Pulsar load balancer (because it expects that the load is proportional to
> > >> > > the owned bundles)
> > >> > >
> > >> > >>
> > >> > >> The reason why Pulsar proxy is built is to have a "smart" proxy that is
> > >> > >> aware of Pulsar protocol. The Pulsar proxy can be replaced with other
> > >> > >> mature proxy software with SNI routing or multiple advertised listeners
> > >> > >> now. Hence I am afraid that we are taking the wrong direction here. Here
> > >> > >> are various reasons.
> > >> > >>
> > >> > >> 1) The ProxyService is essentially a Pulsar admin client. Broker service
> > >> > >> also provides a Pulsar admin client. I am not sure how Proxy PH will
> > >> > >> simplify the protocol handler development. Please use an example to
> > >> > >> demonstrate it.
> > >> > >>
> > >> > >
> > >> > > In the cases I am highlighting, *the Broker is simply not the right place
> > >> > > to run the code*.
> > >> > >
> > >> > > So the problem here is not to have PulsarAdmin in the Broker or in the
> > >> > > Proxy.
> > >> > > It is that if you want to write a smart proxy for another protocol:
> > >> > > - you end up copy/pasting the Proxy code
> > >> > > - you use the internal Pulsar classes to have a consistent behaviour with
> > >> > > the Pulsar Proxy
> > >> > > - you add more components to the "picture" of the Pulsar cluster
> > >> > >
> > >> > >> 2) The Authorization & Authentication services in ProxyService are only
> > >> > >> used when proxies are configured to use zookeeper for broker discovery.
> > >> > >> However, this option is not recommended when running Pulsar proxies in
> > >> > >> Kubernetes. Instead, using a broker discovery service is recommended. In
> > >> > >> order to make PH work, you are forcing proxy to be tightly coupled with
> > >> > >> zookeeper.
> > >> > >>
> > >> > >
> > >> > > This is not needed for all of the Proxy PH handlers.
> > >> > > But Authorization & Authentication are a core part of this story.
> > >> > > If you implement your "smart proxy" somewhere else and not as a Plugin to
> > >> > > the Pulsar Proxy (or Broker),
> > >> > > you cannot leverage the same services in the same way.
> > >> > > This makes it more likely that the behaviour differs from
> > >> > > standard Pulsar.
> > >> > >
> > >> > > PH developers are Pulsar experts, and you know that copy/pasting code from
> > >> > > Pulsar leads to unpredictable behaviour
> > >> > > when you run your plugin in another version of Pulsar.
> > >> > > But if you use an API that is going to be maintained by Pulsar you are
> > >> > > safer and you can expect that your code is going to work.
> > >> > >
> > >> > >>
> > >> > >> 3) Configuring authentication and authorization in proxy is already
> > >> > >> challenging. There are a few different combinations.
> > >> > >> A typical Pulsar setup
> > >> > >> is to forward the authentication credentials to the brokers to authenticate
> > >> > >> and authorize. If you don't do this correctly, it will introduce security
> > >> > >> holes because a connection can potentially grab the superuser credential
> > >> > >> configured in proxy and use superuser credentials to access brokers. From
> > >> > >> this perspective, I think proxy protocol handler doesn't make things
> > >> > >> simpler; instead it makes things complicated when it comes to authentication
> > >> > >> and authorization.
> > >> > >>
> > >> > >
> > >> > > Yes, this is a very complex problem indeed.
> > >> > >
> > >> > > We can help developers by providing a standard framework to access these
> > >> > > services.
> > >> > >
> > >> > > It is very important, from my point of view, that we do not encourage
> > >> > > developers to create their own versions of a Pulsar proxy.
> > >> > >
> > >> > > My recent experience is that we can add many new wire protocols to Pulsar
> > >> > > and this will help a lot with the adoption of Pulsar.
> > >> > >
> > >> > > As we are doing in many other places in Pulsar, we should provide tools to
> > >> > > write extensions
> > >> > > and not let people be too creative.
> > >> > >
> > >> > >>
> > >> > >> I would like to see these questions answered before moving to a vote.
> > >> > >>
> > >> > >
> > >> > > I hope that we can reach consensus on the need for this API,
> > >> > > because I see that there is a real need for making this happen.
> > >> > >
> > >> > > It is the Pulsar momentum now, there are so many opportunities to reach
> > >> > > out to users of other systems,
> > >> > > let's not waste these opportunities.
> > >> > >
> > >> > >
> > >> > > Enrico
> > >> > >
> > >> > >>
> > >> > >> - Sijie
> > >> > >>
> > >> > >> On Wed, Sep 1, 2021 at 12:55 PM Enrico Olivelli <eolive...@gmail.com> wrote:
> > >> > >>
> > >> > >> > Any other comment?
> > >> > >> >
> > >> > >> > I would like to start a VOTE, but I feel we saw too few comments here
> > >> > >> >
> > >> > >> > Please take a look.
> > >> > >> > I believe it will be a good fit for the 2.9.0 release, which is going to be
> > >> > >> > released at the end of September
> > >> > >> >
> > >> > >> > Enrico
> > >> > >> >
> > >> > >> > On Tue, Aug 31, 2021 at 18:14 Michael Marshall <mikemars...@gmail.com> wrote:
> > >> > >> >
> > >> > >> > > +1, just read through the PIP. Looks good to me.
> > >> > >> > >
> > >> > >> > > - Michael
> > >> > >> > >
> > >> > >> > > On Mon, Aug 30, 2021 at 3:47 AM Enrico Olivelli <eolive...@gmail.com> wrote:
> > >> > >> > >
> > >> > >> > > > Hello Pulsar fellows,
> > >> > >> > > >
> > >> > >> > > > I have prepared a PIP about adding support for Protocol Handlers
> > >> > >> > > >
> > >> > >> > > > This is the GDoc
> > >> > >> > > > https://docs.google.com/document/d/1Hlc_BOpQTkWX8FgrvWSfk6h5xTQKMXnTcSuil0Nznrg/edit?usp=sharing
> > >> > >> > > >
> > >> > >> > > > This is the PR for the implementation
> > >> > >> > > > https://github.com/apache/pulsar/pull/11838/files
> > >> > >> > > >
> > >> > >> > > > I am pretty sure that this PIP will make life much nicer for developers of
> > >> > >> > > > Protocol Handlers and for Administrators who deploy Protocol Handlers
> > >> > >> > > >
> > >> > >> > > > We are still working on the formal PIP process, at the moment I am
> > >> > >> > > > sharing the document with you.
> > >> > >> > > > My understanding is that after the discussion, I will start a VOTE thread,
> > >> > >> > > > and if the VOTE passes we can move forward with reviewing the PR, and
> > >> > >> > > > hopefully merge this feature for Pulsar 2.9.0
> > >> > >> > > >
> > >> > >> > > > Enrico