Any other comments?

Enrico

On Thu, Sep 9, 2021 at 09:15, Enrico Olivelli <eolive...@gmail.com> wrote:

> Joe,
>
> On Thu, Sep 9, 2021 at 04:31, Joe F <joefranc...@gmail.com> wrote:
>
>> Enrico, my initial comment when you brought up PH was in relation to the
>> larger question about proxying, rather than looking at this in a limited
>> fashion on how to make it easy to add new PH in the proxy.
>>
>> But specifically with this, here are my comments. Two very distinct
>> abstractions are being mixed up here, and I'm not sure whether that is a
>> good idea or not.
>>
>
> One way of seeing this PIP is to simply complete the work initiated with
> PIP-41 (Introduction of Broker PHs,
> https://github.com/apache/pulsar/wiki/PIP-41%3A-Pluggable-Protocol-Handler
> ).
>
>>
>> The proxy was designed to move bits and bytes without interpretation,
>> from one network to another. The issue with Pulsar is that it requires
>> some interpretation of the data to find which server a client should
>> connect to. Protocol translation crept into the proxy, just to be able to
>> ask this question. Since auth is required to answer this question, auth
>> also crept in. Essentially the proxy was built as a TCP proxy, not as a
>> wire protocol translator. Some additional hacky things needed to be done
>> to make it work as a TCP proxy, and in my opinion those things should
>> die away to the fullest extent possible.
>>
>
> I totally understand this point. I wasn't there when the proxy was born,
> but my current experience is that the Proxy is perceived as the primary
> endpoint in front of the Pulsar cluster, especially when you run in k8s.
>
>>
>> Because of all this, the current implementation is not ideal. Its usage
>> is highly restricted in actual deployments, because of potential security
>> risks if the proxy is misconfigured. One needs to be strict about setting
>> up the proxy to meet security standards in highly regulated environments.
>>
>> >And we faced the limitation of the need to create a new proxy service
>> >for each new protocol, but all of these "proxy services" have in common
>> >most of the features of the Pulsar proxy.
>> >When we also came to deal with System Architects, the requirement to
>> >have only one single "place" to put all of the interactions at
>> >"cluster level" with Pulsar was clear.
>>
>> Good idea, a single place seems right. Can the proxy answer the traffic
>> routing question without interpreting the data? Essentially, move what is
>> done within the proxy now to a well-known service within the cluster, and
>> use that?
>>
>
> In the use cases I know, simply routing PDUs to internal brokers is not
> enough; you often need to add complex mapping logic from the External
> Protocol Concepts to Pulsar concepts on the Proxy component.
>
> So you have two ways:
> 1. create your own service and deploy it separately: this was the
>    beginning of my work, and some colleagues of mine did the same
> 2. deploy your code inside the Pulsar Proxy, and leverage current
>    packaging, configuration, tools, security APIs, helm chart...
>
> I started this discussion because I found option 1 very awkward for Proxy
> Component developers, for System Administrators and for System Architects.
>
> Developers:
> - you have to copy/paste some Pulsar Proxy code, import Proxy jars, use
>   internal Pulsar classes to implement Authentication, Authorization,
>   Service Discovery, Configuration...
>
> System Administrators:
> - you have a new set of configuration files and tools to manage the
>   settings (and in k8s you have to modify the Helm Chart significantly)
>
> System Architects:
> - you have multiple new components in the picture, to explain, to
>   justify...
>
> With this proposal:
>
> Developers:
> - use a framework, do not reinvent the wheel, be able to ensure that you
>   are compatible with a given Pulsar version, ensure that the behaviour is
>   consistent with other Pulsar components (like using ProxyConfiguration,
>   or the same service lifecycle, same libs); you can evolve more easily
>
> System Administrators:
> - you use proxy.conf/broker.conf, you use Pulsar CLI tools, no need to
>   change the Helm Charts
>
> System Architects:
> - nothing new on the table, all Pulsar docs apply, you have the Proxy
>   that deals with external clients, but it is able to speak Pulsar, Kafka,
>   RabbitMQ, MQTT, ActiveMQ
>
>>
>> >I think this is a good picture of what I mean:
>> >- PH in the Broker -> add protocols inside the Broker, work for owned
>> >topics
>> >- PH in the Proxy -> add protocols in front of the whole Cluster
>> >There is a good amount of processing that should be executed on the
>> >proxy, and it is not good to run it on a broker.
>>
>> Is a TCP proxy a good place to do wire protocol translation
>> (computation)? Especially if that translation is a good amount of
>> processing? If it's not good to run this much processing on the broker,
>> then it's even worse to run it on a network proxy. I can foresee this as
>> a path that will lead to cluster and load management creeping into the
>> proxy, as soon as you move beyond what a single proxy can handle.
>>
>> But I think these issues (of n/w vs protocol translation) are moot when
>> you look at the larger needs of a generic proxy that will support
>> ingress, configurable protocol handlers, load balancing, etc. for use
>> with Pulsar. You can run a bunch of Pulsar proxies today, and there is no
>> means to manage them properly, e.g. load balance between them, manage
>> them as a cluster, have affinity of proxies to topics/tenants, etc. This
>> applies even before this PIP (and more so once you add more processing
>> into the proxy).
>>
>> The Pulsar proxy, as it is, is not amenable to creating anything like a
>> service mesh. It would demand a lot of work in the proxy. Hence my
>> initial comment about the proxy eventually becoming a mudball, and why we
>> should rethink this entire proxy.
>>
>> It is tempting to evolve the Pulsar proxy into a service that supports
>> everything: ingress, transformation chains, cluster management, etc.
>> This will eventually end up duplicating something which already exists
>> elsewhere. My take is that this is better done by building on top of
>> something like Envoy (or similar), which has built-in, mature features
>> and is supported by a wide user base.
>>
>
> Unfortunately, general-purpose proxies or proxies specific to some
> protocol will not be able to do efficiently what we can do using Pulsar
> APIs, because they cannot directly "map" External Concepts to the Pulsar
> model.
>
> I cannot imagine the cost of developing and maintaining a plugin for Envoy
> that is able to deal with Pulsar concepts. For instance, it is not written
> in Java and you cannot use the Java Bindings for Pulsar, which are feature
> complete and always up-to-date with the latest features.
> Also, developers that work on PHs are specialized in Pulsar code and in
> Java (at very high levels), and so for them it is harder to write super
> efficient and high quality plugins using non-Java languages.
>
> So I see a huge value in adding this ability to the Pulsar Proxy.
>
> The only alternative to this PIP is to create a new framework for creating
> such "Smart Proxies" in Java, using some official/maintained Pulsar API.
>
> So we will end up discussing the value of adding such a brand new module,
> and how to deploy/manage it.
>
> It is a huge cost and it will take so much time:
> - design
> - adding new concepts to the architecture
> - adding a new service (new management tools)
> - a lot of new code (probably cut/paste from Pulsar Proxy)
> - helm chart
> - new configuration files
> - docs
>
> I believe that we should spend our time adding more bindings/protocol
> handlers instead of doing that.
>
> By the way, I will be happy to drive this new effort if this is REALLY
> what we want.
>
> So I am convinced that for the short/mid term this PIP is the best choice
> to help Pulsar adoption.
>
> This PIP will unlock some great potential that otherwise will be
> available only to users of custom tools, not officially maintained
> inside the Pulsar project.
> I would be very sad about that outcome.
>
> Enrico
>
>>
>> -j
>>
>> On Tue, Sep 7, 2021 at 11:11 PM Enrico Olivelli <eolive...@gmail.com>
>> wrote:
>>
>> > (ping)
>> >
>> > On Fri, Sep 3, 2021 at 14:06, Enrico Olivelli <eolive...@gmail.com>
>> > wrote:
>> >
>> > > Sijie,
>> > > Thanks for your questions, answers inline below.
>> > >
>> > > On Thu, Sep 2, 2021 at 02:23, Sijie Guo <guosi...@gmail.com> wrote:
>> > >
>> > >> I would like to see the clarification between the broker protocol
>> > >> handlers and proxy protocol handlers before moving it to a vote
>> > >> thread.
>> > >>
>> > >
>> > > A PH in the broker is very useful as it allows you to directly access
>> > > the ManagedLedger and implement high performance adapters for
>> > > other wire protocols.
>> > > The bigger limitation is that you can access efficiently only the
>> > > topics owned by the local broker.
>> > > If you try to forward/proxy the request to another broker (you can
>> > > do it, and this was Matteo's suggestion at the latest Video Community
>> > > meeting) you have the downside that the broker has to waste resources
>> > > to do the "proxy work", and you generally want a broker machine to be
>> > > used only to deal with the local traffic.
>> > >
>> > > The load balancing mechanism of the brokers is not meant to deal with
>> > > additional work due to proxying requests related to the topics for
>> > > which the broker is not the owner.
>> > >
>> > > A PH in the proxy is useful to add new protocols that are running in
>> > > front of the whole cluster and not only of one single broker.
>> > > This is a very different use case from having the PH in the broker.
>> > >
>> > > The work of the proxy usually is to forward requests to the internal
>> > > services of the cluster, and in case of new protocols in the proxy
>> > > you need some logic to fill in the gaps in the original wire protocol.
>> > >
>> > > System architects expect a different kind of load on the proxy and
>> > > other kinds of load on the brokers.
>> > > For instance, you usually can run very few proxies to cover a big
>> > > cluster with many brokers.
>> > > So adding a PH on all the brokers is sometimes overkill.
>> > >
>> > >>
>> > >> I can see how it will cause confusion for protocol developers.
>> > >>
>> > >
>> > > Protocol developers are very advanced users that do need to understand
>> > > clearly the internals of Pulsar.
>> > > In fact, this request of having PHs in the Proxy layer came from me
>> > > and from other colleagues of mine who are working heavily on
>> > > implementing new protocol handlers in Pulsar.
>> > >
>> > > And we faced the limitation of the need to create a new proxy service
>> > > for each new protocol, but all of these "proxy services" have in
>> > > common most of the features of the Pulsar proxy.
>> > > When we also came to deal with System Architects, the requirement to
>> > > have only one single "place" to put all of the interactions at
>> > > "cluster level" with Pulsar was clear.
>> > >
>> > > I think this is a good picture of what I mean:
>> > > - PH in the Broker -> add protocols inside the Broker, work for owned
>> > >   topics
>> > > - PH in the Proxy -> add protocols in front of the whole Cluster
>> > >
>> > >> Yunze brought a good idea on KoP.
>> > >
>> > > I also have good ideas and working solutions for a Pulsar-proxy-like
>> > > KoP proxy.
>> > > I will be happy to discuss this in a separate thread or at a separate
>> > > table with Yunze.
>> > >
>> > > A smart KoP proxy can work if you run it inside the Pulsar proxy
>> > > process, or you can copy/paste the Pulsar Proxy code and create
>> > > another service.
>> > >
>> > >> But I don't think that's the right direction. If you can give an
>> > >> example of the usage of a proxy handler and how it is different from
>> > >> using a broker handler, that would help me understand this PIP.
>> > >>
>> > >
>> > > For some protocols you have to execute some non-trivial work for
>> > > mapping the wire protocol and the concepts of the protocol to the
>> > > Pulsar model.
>> > > For instance, some protocols do not have the concept of "lookup", and
>> > > the proxy does the lookup and forwards the request to the internal
>> > > broker.
>> > >
>> > > For some protocols you can just use the PulsarClient to connect to the
>> > > internal brokers; you do not need and you do not want to access the
>> > > ManagedLedgers:
>> > > in this case adding the execution inside the broker is only
>> > > complicating the overall design of the system and putting load on the
>> > > brokers.
>> > >
>> > > There is a good amount of processing that should be executed on the
>> > > proxy, and it is not good to run it on a broker.
>> > > If you do not put the "custom code" in the Proxy and you can only
>> > > write a Broker PH, you end up adding it to the Broker.
>> > >
>> > > If you expose directly (with some LoadBalancer or whatever) your
>> > > brokers in which you run the PH code that you would put in the proxy,
>> > > you end up putting on the broker some load that is not expected:
>> > > - the broker will have to work even for topics for which it is not the
>> > >   owner
>> > > - the broker will have to do things that cannot be dealt with
>> > >   correctly by the Pulsar load balancer (because it expects that the
>> > >   load is proportional to the owned bundles)
>> > >
>> > >>
>> > >> The reason why Pulsar proxy is built is to have a "smart" proxy that
>> > >> is aware of Pulsar protocol. The Pulsar proxy can be replaced with
>> > >> other mature proxy software with SNI routing or multiple advertised
>> > >> listeners now. Hence I am afraid that we are taking the wrong
>> > >> direction here. Here are various reasons.
>> > >>
>> > >> 1) The ProxyService is essentially a Pulsar admin client. Broker
>> > >> service also provides a Pulsar admin client. I am not sure how Proxy
>> > >> PH will simplify the protocol handler development. Please use an
>> > >> example to demonstrate it.
>> > >>
>> > >
>> > > In the cases I am highlighting, *the Broker is simply not the right
>> > > place to run the code*.
>> > >
>> > > So the problem here is not to have PulsarAdmin in the Broker or in the
>> > > Proxy.
>> > > It is that if you want to write a smart proxy for another protocol:
>> > > - you end up copy/pasting the Proxy code
>> > > - you use the internal Pulsar classes to have a consistent behaviour
>> > >   with the Pulsar Proxy
>> > > - you add more components to the "picture" of the Pulsar cluster
>> > >
>> > >> 2) The Authorization & Authentication services in ProxyService are
>> > >> only used when proxies are configured to use zookeeper for broker
>> > >> discovery. However, this option is not recommended when running
>> > >> Pulsar proxies in Kubernetes. Instead, using a broker discovery
>> > >> service is recommended. In order to make PH work, you are forcing
>> > >> the proxy to be tightly coupled to zookeeper.
>> > >>
>> > >
>> > > This is not needed for all of the Proxy PHs.
>> > > But Authorization & Authentication are a core part of this story.
>> > > If you implement your "smart proxy" somewhere else and not as a
>> > > Plugin to the Pulsar Proxy (or Broker), you cannot leverage the same
>> > > services in the same way.
>> > > This increases the chances of ending up with behaviour different from
>> > > standard Pulsar.
>> > >
>> > > PH developers are Pulsar experts, and you know that copy/pasting code
>> > > from Pulsar leads to unpredictable behaviour when you run your plugin
>> > > in another version of Pulsar.
>> > > But if you use an API that is going to be maintained by Pulsar, you
>> > > are safer and you can expect that your code is going to work.
>> > >
>> > >>
>> > >> 3) Configuring authentication and authorization in proxy is already
>> > >> challenging. There are a few different combinations. A typical Pulsar
>> > >> setup is to forward the authentication credentials to the brokers to
>> > >> authenticate and authorize.
>> > >> If you don't do this correctly, it will introduce security holes
>> > >> because a connection can potentially grab the superuser credential
>> > >> configured in proxy and use superuser credentials to access brokers.
>> > >> From this perspective, I think proxy protocol handler doesn't make
>> > >> things simpler; instead it makes things complicated when it comes to
>> > >> authentication and authorization.
>> > >>
>> > >
>> > > Yes, this is a very complex problem indeed.
>> > >
>> > > We can help developers by providing a standard framework to access
>> > > these services.
>> > >
>> > > It is very important, from my point of view, that we do not encourage
>> > > developers to create their own versions of a Pulsar proxy.
>> > >
>> > > My recent experience is that we can add many new wire protocols to
>> > > Pulsar and this will help a lot with the adoption of Pulsar.
>> > >
>> > > As we are doing in many other places in Pulsar, we should provide
>> > > tools to write extensions and not let people be too creative.
>> > >
>> > >>
>> > >> I would like to see these questions answered before moving to a
>> > >> vote.
>> > >>
>> > >
>> > > I hope that we can reach consensus on the need for this API,
>> > > because I see that there is a real need for making this happen.
>> > >
>> > > Pulsar has momentum now; there are so many opportunities to reach
>> > > out to users of other systems, let's not waste these opportunities.
>> > >
>> > > Enrico
>> > >
>> > >>
>> > >> - Sijie
>> > >>
>> > >> On Wed, Sep 1, 2021 at 12:55 PM Enrico Olivelli <eolive...@gmail.com>
>> > >> wrote:
>> > >>
>> > >> > Any other comment?
>> > >> >
>> > >> > I would like to start a VOTE, but I feel we saw too few comments
>> > >> > here.
>> > >> >
>> > >> > Please take a look.
>> > >> > I believe it will be a good fit for the 2.9.0 release, which is
>> > >> > going to be released at the end of September.
>> > >> >
>> > >> > Enrico
>> > >> >
>> > >> > On Tue, Aug 31, 2021, 18:14 Michael Marshall
>> > >> > <mikemars...@gmail.com> wrote:
>> > >> >
>> > >> > > +1, just read through the PIP. Looks good to me.
>> > >> > >
>> > >> > > - Michael
>> > >> > >
>> > >> > > On Mon, Aug 30, 2021 at 3:47 AM Enrico Olivelli
>> > >> > > <eolive...@gmail.com> wrote:
>> > >> > >
>> > >> > > > Hello Pulsar fellows,
>> > >> > > >
>> > >> > > > I have prepared a PIP about adding support for Protocol
>> > >> > > > Handlers
>> > >> > > >
>> > >> > > > This is the GDoc
>> > >> > > > https://docs.google.com/document/d/1Hlc_BOpQTkWX8FgrvWSfk6h5xTQKMXnTcSuil0Nznrg/edit?usp=sharing
>> > >> > > >
>> > >> > > > This is the PR for the implementation
>> > >> > > > https://github.com/apache/pulsar/pull/11838/files
>> > >> > > >
>> > >> > > > I am pretty sure that this PIP will make life much nicer for
>> > >> > > > developers of Protocol Handlers and for Administrators who
>> > >> > > > deploy Protocol Handlers.
>> > >> > > >
>> > >> > > > We are still working on the formal PIP process; at the moment
>> > >> > > > I am sharing the document with you.
>> > >> > > > My understanding is that after the discussion, I will start a
>> > >> > > > VOTE thread, and if the VOTE passes we can move forward with
>> > >> > > > reviewing the PR, and hopefully merge this feature for Pulsar
>> > >> > > > 2.9.0
>> > >> > > >
>> > >> > > > Enrico