Joe,

On Thu, Sep 9, 2021 at 04:31, Joe F <joefranc...@gmail.com> wrote:
> Enrico, my initial comment when you brought up PH was in relation to the larger question about proxying, rather than looking at this in a limited fashion on how to make it easy to add new PH in the proxy.
>
> But specifically with this, here are my comments. Two very distinct abstractions are being mixed up here, and I'm not sure whether that is a good idea or not.

One way of seeing this PIP is to simply complete the work initiated with PIP-41 (Introduction of Broker PHs, https://github.com/apache/pulsar/wiki/PIP-41%3A-Pluggable-Protocol-Handler).

> The proxy was designed to move bits and bytes without interpretation, from one network to another. The issue with Pulsar is that it requires some interpretation of the data to find which server a client should connect to. Protocol translation crept into the proxy, just to be able to ask this question. Since auth is required to answer this question, auth also crept in. Essentially the proxy was built as a TCP proxy, not as a wire protocol translator. Some additional hacky things needed to be done to make it work as a TCP proxy, and in my opinion those things should die away to the fullest extent possible.

I totally understand this point. I wasn't there when the proxy was born, but my current experience is that the Proxy is perceived as the primary endpoint in front of the Pulsar cluster, especially when you run in k8s.

> Because of all this, the current implementation is not ideal. Its usage is highly restricted in actual deployments, because of potential security risks if the proxy is misconfigured. One needs to be strict about setting up the proxy to meet security standards in highly regulated environments.
>
> > And we faced the limitation of the need to create a new proxy service for each new protocol, but all of these "proxy services" have in common most of the features of the Pulsar proxy.
> > When we also came to deal with System Architects it was clear the requirement to have only one single "place" to put all of the interactions at "cluster level" with Pulsar.
>
> Good idea, a single place seems right. Can the proxy answer the traffic routing question without interpreting the data? Essentially, move what is done within the proxy now to a well-known service within the cluster, and use that?

In the use cases I know, simply routing PDUs to internal brokers is not enough: you often need to add complex mapping logic from the external protocol's concepts to Pulsar concepts on the Proxy component.

So you have two ways:
1. create your own service and deploy it separately: this was the beginning of my work, and some colleagues of mine did the same
2. deploy your code inside the Pulsar Proxy, and leverage the current packaging, configuration, tools, security APIs, helm chart...

I started this discussion because I found option 1 very awkward for Proxy component developers, for System Administrators and for System Architects.

Developers:
- you have to copy/paste some Pulsar Proxy code, import Proxy jars, use internal Pulsar classes to implement Authentication, Authorization, Service Discovery, Configuration...

System Administrators:
- you have a new set of configuration files and tools to manage the settings (and in k8s you have to modify the Helm Chart significantly)

System Architects:
- you have multiple new components in the picture, to explain, to justify...
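
To make the "mapping logic" point concrete, here is a minimal and purely illustrative sketch; it is not taken from the PIP or from any existing handler. It assumes a hypothetical MQTT-style external protocol whose topic names contain `/` separators, and maps them onto Pulsar's `persistent://tenant/namespace/topic` naming scheme. The class name `ExternalTopicMapper` and the encoding choice are assumptions made only for this example.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper (not part of Pulsar): maps a topic name from an
 * external, MQTT-style protocol to a fully qualified Pulsar topic name.
 */
final class ExternalTopicMapper {
    private final String tenant;
    private final String namespace;

    ExternalTopicMapper(String tenant, String namespace) {
        this.tenant = tenant;
        this.namespace = namespace;
    }

    /** Returns persistent://tenant/namespace/encoded-external-topic. */
    String toPulsarTopic(String externalTopic) {
        if (externalTopic == null || externalTopic.isEmpty()) {
            throw new IllegalArgumentException("external topic must not be empty");
        }
        // Assumption for this sketch: URL-encode the external name so that its
        // '/' separators do not collide with the separators in the Pulsar name.
        String localName = URLEncoder.encode(externalTopic, StandardCharsets.UTF_8);
        return "persistent://" + tenant + "/" + namespace + "/" + localName;
    }
}
```

With option 1 this kind of code lives in a separately deployed service; with option 2 the same code would run inside the Pulsar Proxy process.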
With this proposal:

Developers:
- you use a framework and do not reinvent the wheel; you can ensure that you are compatible with a given Pulsar version and that the behaviour is consistent with other Pulsar components (like using ProxyConfiguration, or the same service lifecycle, same libs), and you can evolve more easily

System Administrators:
- you use proxy.conf/broker.conf, you use Pulsar CLI tools, no need to change the Helm Charts

System Architects:
- nothing new on the table, all the Pulsar docs apply; you have the Proxy that deals with external clients, but it is able to speak Pulsar, Kafka, RabbitMQ, MQTT, ActiveMQ

> > I think this is a good picture of what I mean:
> > - PH in the Broker -> add protocols inside the Broker, work for owned topics
> > - PH in the Proxy -> add protocols in front of the whole Cluster
> > There is a good amount of processing that should be executed on the proxy, and it is not good to run it on a broker.
>
> Is a TCP proxy a good place to do wire protocol translation (computation)? Especially if that translation is a good amount of processing? If it's not good to run this much processing on the broker, then it's even worse to run it on a network proxy. I can foresee this as a path that will lead to cluster and load management creeping into the proxy, as soon as you move beyond what a single proxy can handle.
>
> But I think these issues (of n/w vs protocol translation) are moot when you look at the larger needs of a generic proxy that will support ingress, configurable protocol handlers, load balancing etc. for use with Pulsar. You can run a bunch of Pulsar's proxies today, and there is no means to manage them properly, e.g. load balance between them, manage them as a cluster, have affinity of proxies to topics/tenants, etc. This applies even before this PIP (and more so once you add more processing into the proxy).
>
> The Pulsar proxy, as it is, is not amenable to creating anything like a service mesh.
> It would demand a lot of work in the proxy. Hence my initial comment about the proxy eventually becoming a mudball, and why we should rethink this entire proxy.
>
> It is tempting to evolve the Pulsar proxy into a service that supports everything: ingress, transformation chains, cluster management etc. This will eventually end up duplicating something which already exists elsewhere. My take is that this is better done by building on top of something like Envoy (or similar), which has built-in and mature features and is supported by a wide user base.

Unfortunately, general-purpose proxies, or proxies specific to some protocol, will not be able to do efficiently what we can do using Pulsar APIs, because they cannot directly "map" external concepts to the Pulsar model.

I cannot imagine the cost of developing and maintaining a plugin for Envoy that is able to deal with Pulsar concepts. For instance, it is not written in Java, so you cannot use the Java bindings for Pulsar, which are feature complete and always up to date with the latest features.

Also, developers who work on PHs are specialized in Pulsar code and in Java (at very high levels), so it is harder for them to write super efficient and high quality plugins in non-Java languages.

So I see a huge value in adding this ability to the Pulsar Proxy.

The only alternative to this PIP is to create a new framework for creating such "Smart Proxies" in Java, using some official/maintained Pulsar API. We would then end up discussing the value of adding such a brand new module, and how to deploy/manage it.

It is a huge cost and it will take a lot of time:
- design
- adding new concepts to the architecture
- adding a new service (new management tools)
- a lot of new code (probably cut/paste from the Pulsar Proxy)
- helm chart
- new configuration files
- docs

I believe that we should spend our time adding more bindings/protocol handlers instead of doing that.
By the way, I will be happy to drive this new effort if this is REALLY what we want.

So I am convinced that for the short/mid term this PIP is the best choice to help Pulsar adoption. This PIP will unlock some great potential that otherwise will be available only to users of custom tools, not officially maintained inside the Pulsar project.

I will be very sad about the outcome

Enrico

> -j
>
> On Tue, Sep 7, 2021 at 11:11 PM Enrico Olivelli <eolive...@gmail.com> wrote:
>
> > (ping)
> >
> > On Fri, Sep 3, 2021 at 14:06, Enrico Olivelli <eolive...@gmail.com> wrote:
> >
> > > Sijie,
> > > Thanks for your questions, answers inline below.
> > >
> > > On Thu, Sep 2, 2021 at 02:23, Sijie Guo <guosi...@gmail.com> wrote:
> > >
> > >> I would like to see the clarification between the broker protocol handlers and proxy protocol handlers before moving it to a vote thread.
> > >
> > > A PH in the broker is very useful as it allows you to directly access the ManagedLedger and implement high performance adapters for other wire protocols.
> > > The bigger limitation is that you can efficiently access only the topics owned by the local broker.
> > > If you try to forward/proxy the request to another broker (you can do it, and this was Matteo's suggestion at the latest Video Community meeting) you have the downside that the broker has to waste resources to do the "proxy work", and you generally want a broker machine to be used only to deal with the local traffic.
> > >
> > > The load balancing mechanism of the brokers is not meant to deal with additional work due to proxying requests related to the topics for which the broker is not owner.
> > >
> > > A PH in the proxy is useful to add new protocols that are running in front of the whole cluster and not only of one single broker.
> > > This is a very different use case with respect to having the PH in the broker.
> > >
> > > The work of the proxy usually is to forward requests to the internal services of the cluster, and in the case of new protocols in the proxy you need some logic to fill in the gaps in the original wire protocol.
> > >
> > > System architects expect a different kind of load on the proxy and other kinds of load on the brokers.
> > > For instance you usually can run very few proxies to cover a big cluster with many brokers.
> > > So adding a PH on all the brokers is sometimes overkill.
> > >
> > >> I can see how it will cause confusion for protocol developers.
> > >
> > > Protocol developers are very advanced users who do need to understand clearly the internals of Pulsar.
> > > In fact this request of having PHs in the Proxy layer came from myself and from other colleagues of mine who are working heavily on implementing new protocol handlers in Pulsar.
> > >
> > > And we faced the limitation of the need to create a new proxy service for each new protocol, but all of these "proxy services" have in common most of the features of the Pulsar proxy.
> > > When we also came to deal with System Architects it was clear the requirement to have only one single "place" to put all of the interactions at "cluster level" with Pulsar.
> > >
> > > I think this is a good picture of what I mean:
> > > - PH in the Broker -> add protocols inside the Broker, work for owned topics
> > > - PH in the Proxy -> add protocols in front of the whole Cluster
> > >
> > >> Yunze brought a good idea on KoP.
> > >
> > > I also have good ideas and working solutions for a Pulsar-proxy-like KOP Proxy.
> > > I will be happy to discuss this in a separate thread or at a separate table with Yunze.
> > >
> > > A smart KOP proxy can work if you run it inside the Pulsar proxy process, or you can copy/paste the Pulsar Proxy code and create another service.
> > >
> > >> But I don't think that's the right direction. If you can give an example of the usage of a proxy handler and how it is different from using a broker handler, that would help me understand this PIP.
> > >
> > > For some protocols you have to execute some non-trivial work for mapping the wire protocol and the concepts of the protocol to the Pulsar model.
> > > For instance some protocols do not have the concept of "lookup", and the proxy does the lookup and forwards the request to the internal broker.
> > >
> > > For some protocols you can just use the PulsarClient to connect to the internal brokers; you do not need and you do not want to access the ManagedLedgers: in this case adding the execution inside the broker only complicates the overall design of the system and puts load on the brokers.
> > >
> > > There is a good amount of processing that should be executed on the proxy, and it is not good to run it on a broker.
> > > If you do not put the "custom code" in the Proxy and you can only write a Broker PH, you end up adding it to the Broker.
> > >
> > > If you directly expose (with some LoadBalancer or whatever) your brokers, in which you run the PH code that you would put in the proxy, you end up putting on the broker some load that is not expected:
> > > - the broker will have to work even for topics for which it is not the owner
> > > - the broker will have to do things that cannot be dealt with correctly by the Pulsar load balancer (because it expects that the load is proportional to the owned bundles)
> > >
> > >> The reason why Pulsar proxy is built is to have a "smart" proxy that is aware of Pulsar protocol.
> > >> The Pulsar proxy can be replaced with other mature proxy software with SNI routing or multiple advertised listeners now. Hence I am afraid that we are taking the wrong direction here. Here are various reasons.
> > >>
> > >> 1) The ProxyService is essentially a Pulsar admin client. Broker service also provides a Pulsar admin client. I am not sure how Proxy PH will simplify the protocol handler development. Please use an example to demonstrate it.
> > >
> > > In the cases I am highlighting, *the Broker is simply not the right place to run the code*.
> > >
> > > So the problem here is not whether to have PulsarAdmin in the Broker or in the Proxy.
> > > It is that if you want to write a smart proxy for another protocol:
> > > - you end up copy/pasting the Proxy code
> > > - you use the internal Pulsar classes to have a behaviour consistent with the Pulsar Proxy
> > > - you add more components to the "picture" of the Pulsar cluster
> > >
> > >> 2) The Authorization & Authentication services in ProxyService are only used when proxies are configured to use zookeeper for broker discovery. However, this option is not recommended when running Pulsar proxies in Kubernetes. Instead, using a broker discovery service is recommended. In order to make PH work, you are forcing proxy to be tight with the zookeeper.
> > >
> > > This is not needed for all of the Proxy PH handlers.
> > > But Authorization & Authentication are a core part of this story.
> > > If you implement your "smart proxy" somewhere else and not as a Plugin to the Pulsar Proxy (or Broker) you cannot leverage the same services, the same way.
> > > It leads to having more chances of having a behaviour different from standard Pulsar.
> > >
> > > PH developers are Pulsar experts, and you know that copy/pasting code from Pulsar leads to unpredictable behaviour when you run your plugin in another version of Pulsar.
> > > But if you use an API that is going to be maintained by Pulsar you are safer and you can expect that your code is going to work.
> > >
> > >> 3) Configuring authentication and authorization in proxy is already challenging. There are a few different combinations. A typical Pulsar setup is to forward the authentication credentials to the brokers to authenticate and authorize. If you don't do this correctly, it will introduce security holes because a connection can potentially grab the superuser credential configured in proxy and use superuser credentials to access brokers. From this perspective, I think proxy protocol handler doesn't make things simpler; instead it makes things complicated when it comes to authentication and authorization.
> > >
> > > Yes, this is a very complex problem indeed.
> > >
> > > We can help developers by providing a standard framework to access these services.
> > >
> > > It is very important from my point of view that we do not encourage developers to create their own versions of a Pulsar proxy.
> > >
> > > My recent experience is that we can add many new wire protocols to Pulsar and this will help a lot with the adoption of Pulsar.
> > >
> > > As we are doing in many other places in Pulsar, we should provide tools to write extensions and not let people be too creative.
> > >
> > >> I would like to see these questions answered before moving to a vote.
> > >
> > > I hope that we can reach consensus on the need for this API,
> > > because I see that there is a real need for making this happen.
> > >
> > > It is the Pulsar momentum now, there are so many opportunities to reach out to users of other systems, let's not waste these opportunities.
> > >
> > > Enrico
> > >
> > >> - Sijie
> > >>
> > >> On Wed, Sep 1, 2021 at 12:55 PM Enrico Olivelli <eolive...@gmail.com> wrote:
> > >>
> > >> > Any other comment?
> > >> >
> > >> > I would like to start a VOTE, but I feel we saw too few comments here
> > >> >
> > >> > Please take a look.
> > >> > I believe it will be a good fit for the 2.9.0 release, which is going to be released at the end of September
> > >> >
> > >> > Enrico
> > >> >
> > >> > On Tue, Aug 31, 2021, 18:14 Michael Marshall <mikemars...@gmail.com> wrote:
> > >> >
> > >> > > +1, just read through the PIP. Looks good to me.
> > >> > >
> > >> > > - Michael
> > >> > >
> > >> > > On Mon, Aug 30, 2021 at 3:47 AM Enrico Olivelli <eolive...@gmail.com> wrote:
> > >> > >
> > >> > > > Hello Pulsar fellows,
> > >> > > >
> > >> > > > I have prepared a PIP about adding support for Protocol Handlers
> > >> > > >
> > >> > > > This is the GDoc
> > >> > > > https://docs.google.com/document/d/1Hlc_BOpQTkWX8FgrvWSfk6h5xTQKMXnTcSuil0Nznrg/edit?usp=sharing
> > >> > > >
> > >> > > > This is the PR for the implementation
> > >> > > > https://github.com/apache/pulsar/pull/11838/files
> > >> > > >
> > >> > > > I am pretty sure that this PIP will make life much nicer for developers of Protocol Handlers and for Administrators who deploy Protocol Handlers
> > >> > > >
> > >> > > > We are still working on the formal PIP process, at the moment I am sharing with you the document.
> > >> > > > My understanding is that after the discussion, I will start a VOTE thread, and if the VOTE passes we can move forward with reviewing the PR, and hopefully merge this feature for Pulsar 2.9.0
> > >> > > >
> > >> > > > Enrico