Re: PIP-93 Pulsar Proxy Protocol Handlers

Enrico Olivelli Tue, 14 Sep 2021 08:01:44 -0700

Any other comments?

Enrico


On Thu, 9 Sep 2021 at 09:15, Enrico Olivelli <[email protected]> wrote:

> Joe,
>
> On Thu, 9 Sep 2021 at 04:31, Joe F <[email protected]> wrote:
>
>> Enrico, my initial comment when you brought up PH was in relation to the
>> larger question about proxying, rather than looking at this in a limited
>> fashion on how to make it easy to add new PHs in the proxy.
>>
>> But specifically with this, here are my comments. Two very
>> distinct abstractions are being mixed up here, and I'm not sure
>> whether that is a good idea or not.
>>
>
> One way of seeing this PIP is to simply complete the work initiated with
> PIP-41 (Introduction of Broker PHs,
> https://github.com/apache/pulsar/wiki/PIP-41%3A-Pluggable-Protocol-Handler
> ).
>
>
>
>>
>> The proxy was designed to move bits and bytes without interpretation,
>> from one network to another. The issue with Pulsar is that it requires
>> some interpretation of the data to find which server a client should
>> connect to. Protocol translation crept into the proxy just to be able to
>> ask this question. Since auth is required to answer this question, auth
>> also crept in. Essentially the proxy was built as a TCP proxy, not as a
>> wire protocol translator. Some additional hacky things needed to be done
>> to make it work as a TCP proxy, and in my opinion those things should
>> die away to the fullest extent possible.
>>
>
> I totally understand this point. I wasn't there when the proxy was born,
> but in my current experience the Proxy is perceived as the primary
> endpoint in front of the Pulsar cluster, especially when you run in k8s.
>
>
>
>>
>> Because of all this, the current implementation is not ideal. Its usage
>> is highly restricted in actual deployments, because of potential security
>> risks if the proxy is misconfigured. One needs to be strict about setting
>> up the proxy to meet security standards in highly regulated environments.
>>
>>
>>
>> >And we faced the limitation of the need to create a new proxy service for
>> >each new protocol, but all of these "proxy services" have in common
>> >most of the features of the Pulsar proxy.
>> >When we also came to deal with System Architects it was clear the
>> >requirement to have only one single "place" to put all of the interactions
>> >at "cluster level" with Pulsar.
>>
>> Good idea, a single place seems right. Can the proxy answer the traffic
>> routing question without interpreting the data? Essentially, move what is
>> done within the proxy now to a well-known service within the cluster, and
>> use that?
>>
>
> In the use cases I know, simply routing PDUs to internal brokers is not
> enough: you often need to add complex mapping logic on the Proxy component,
> from the External Protocol Concepts to Pulsar concepts.
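
To make the "mapping logic" point concrete, here is a minimal sketch in Java of one such mapping: turning a short topic name from an external protocol into a fully qualified Pulsar topic name. It is only an illustration under stated assumptions. The class name ExternalTopicMapper, the "tenant/namespace/topic" shorthand and the default tenant and namespace are all hypothetical and are not part of the PIP or of any existing handler.

```java
import java.util.Objects;

/**
 * Illustration only: maps a topic name from an external protocol
 * (for example a Kafka-style short name) to a fully qualified Pulsar topic.
 * The class name and the default tenant/namespace are assumptions.
 */
final class ExternalTopicMapper {
    private final String defaultTenant;
    private final String defaultNamespace;

    ExternalTopicMapper(String defaultTenant, String defaultNamespace) {
        this.defaultTenant = Objects.requireNonNull(defaultTenant);
        this.defaultNamespace = Objects.requireNonNull(defaultNamespace);
    }

    /** Returns a name of the form persistent://tenant/namespace/topic. */
    String toPulsarTopic(String externalTopic) {
        if (externalTopic == null || externalTopic.isEmpty()) {
            throw new IllegalArgumentException("external topic must not be empty");
        }
        // Names that are already fully qualified are passed through unchanged.
        if (externalTopic.startsWith("persistent://")
                || externalTopic.startsWith("non-persistent://")) {
            return externalTopic;
        }
        String[] parts = externalTopic.split("/");
        for (String part : parts) {
            if (part.isEmpty()) {
                throw new IllegalArgumentException("empty segment in: " + externalTopic);
            }
        }
        if (parts.length == 1) {
            // A short name lands in the default tenant and namespace.
            return "persistent://" + defaultTenant + "/" + defaultNamespace + "/" + externalTopic;
        }
        if (parts.length == 3) {
            // "tenant/namespace/topic" is treated as an explicit placement.
            return "persistent://" + externalTopic;
        }
        throw new IllegalArgumentException("unsupported topic format: " + externalTopic);
    }

    public static void main(String[] args) {
        ExternalTopicMapper mapper = new ExternalTopicMapper("public", "default");
        System.out.println(mapper.toPulsarTopic("orders"));                // persistent://public/default/orders
        System.out.println(mapper.toPulsarTopic("acme/billing/invoices")); // persistent://acme/billing/invoices
    }
}
```

A real handler would also have to map authentication, subscriptions and offsets, which is why this logic is more than plain byte forwarding.
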
>
> So you have two ways:
> 1. create your own service and deploy it separately: this was the
> beginning of my work, and some colleagues of mine did the same
> 2. deploy your code inside the Pulsar Proxy, and leverage the current
> packaging, configuration, tools, security APIs, Helm chart...
>
> I started this discussion because I found option 1 very awkward for Proxy
> Component developers, for System Administrators and for System Architects.
>
> Developers:
> - you have to copy/paste some Pulsar Proxy code, import Proxy jars, use
> internal Pulsar classes to implement Authentication, Authorization, Service
> Discovery, Configuration...
>
> System Administrators:
> - you have a new set of configuration files and tools to manage the
> settings (and in k8s you have to modify the Helm Chart significantly)
>
> System Architects:
> - you have multiple new components in the picture, to explain and to
> justify...
>
> With this proposal:
>
> Developers:
> - use a framework, do not reinvent the wheel, be able to ensure that you
> are compatible with a given Pulsar version, and ensure that the behaviour is
> consistent with other Pulsar components (like using ProxyConfiguration, the
> same service lifecycle, the same libs), so you can evolve more easily
>
> System Administrators:
> - you use proxy.conf/broker.conf, you use Pulsar CLI tools, no need to
> change the Helm Charts
>
> System Architects:
> - nothing new on the table, all the Pulsar docs apply, you have the Proxy
> that deals with external clients, but it is able to speak Pulsar, Kafka,
> RabbitMQ, MQTT, ActiveMQ
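
As a rough illustration of the "use a framework" idea above, the sketch below shows how a proxy process could host pluggable protocol handlers that share one configuration and one lifecycle. Every name in it is hypothetical and chosen only for this example: ProxyProtocolHandler, ProxyHost, RecordingHandler and the "enabledProtocols" key are not the interface or settings defined by the PIP or its PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

/** Illustration only: every type and configuration key here is hypothetical. */
final class ProxyHandlerSketch {

    /** Hypothetical plugin contract; NOT the interface defined by the PIP. */
    interface ProxyProtocolHandler {
        String protocolName();
        void initialize(Properties proxyConf); // receives the shared proxy settings
        void start();
        void close();
    }

    /** Toy handler that only records which lifecycle calls it received. */
    static final class RecordingHandler implements ProxyProtocolHandler {
        private final String name;
        final List<String> events = new ArrayList<>();

        RecordingHandler(String name) { this.name = name; }

        @Override public String protocolName() { return name; }
        @Override public void initialize(Properties proxyConf) { events.add("initialize"); }
        @Override public void start() { events.add("start"); }
        @Override public void close() { events.add("close"); }
    }

    /** Toy host: activates the handlers named in one shared configuration. */
    static final class ProxyHost {
        private final Map<String, ProxyProtocolHandler> available = new LinkedHashMap<>();
        private final List<ProxyProtocolHandler> running = new ArrayList<>();

        void register(ProxyProtocolHandler handler) {
            available.put(handler.protocolName(), handler);
        }

        /** Starts handlers listed under the hypothetical key "enabledProtocols". */
        List<String> start(Properties proxyConf) {
            List<String> started = new ArrayList<>();
            for (String raw : proxyConf.getProperty("enabledProtocols", "").split(",")) {
                String name = raw.trim();
                if (name.isEmpty()) {
                    continue;
                }
                ProxyProtocolHandler handler = available.get(name);
                if (handler == null) {
                    throw new IllegalArgumentException("unknown protocol: " + name);
                }
                handler.initialize(proxyConf);
                handler.start();
                running.add(handler);
                started.add(name);
            }
            return started;
        }

        /** Closes running handlers in reverse start order. */
        void stop() {
            for (int i = running.size() - 1; i >= 0; i--) {
                running.get(i).close();
            }
            running.clear();
        }
    }

    public static void main(String[] args) {
        ProxyHost host = new ProxyHost();
        host.register(new RecordingHandler("kafka"));
        host.register(new RecordingHandler("mqtt"));

        Properties conf = new Properties();
        conf.setProperty("enabledProtocols", "kafka, mqtt");
        System.out.println(host.start(conf)); // prints [kafka, mqtt]
        host.stop();
    }
}
```

The point of the sketch is only the shape: handler authors implement a small contract, while configuration loading and start/stop ordering stay in the host.
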
>
>
>
>
>>
>> >I think this is a good picture of what I mean:
>> >- PH in the Broker -> add protocols inside the Broker, work for owned topics
>> >- PH in the Proxy -> add protocols in front of the whole Cluster
>> >There is a good amount of processing that should be executed on the proxy,
>> >and it is not good to run it on a broker.
>>
>> Is a TCP proxy a good place to do wire protocol translation (computation)?
>> Especially if that translation is a good amount of processing? If it's not
>> good to run this much processing on the broker, then it's even worse to run
>> it on a network proxy. I can foresee this as a path that will lead to
>> cluster and load management creeping into the proxy, as soon as you move
>> beyond what a single proxy can handle.
>>
>> But I think these issues (of n/w vs protocol translation) are moot when you
>> look at the larger needs of a generic proxy that will support ingress,
>> configurable protocol handlers, load balancing etc. for use with Pulsar. You
>> can run a bunch of Pulsar proxies today, and there is no means to manage
>> them properly, e.g. load balance between them, manage them as a cluster, or
>> have affinity of proxies to topics/tenants. This applies even before
>> this PIP (and more so once you add more processing into the proxy).
>>
>> The Pulsar proxy, as it is, is not amenable to creating anything like a
>> service mesh. It would demand a lot of work in the proxy. Hence my
>> initial comment about the proxy eventually becoming a mudball, and why we
>> should rethink this entire proxy.
>>
>> It is tempting to evolve the Pulsar proxy into a service that supports
>> everything: ingress, transformation chains, cluster management, etc.
>> This will eventually end up duplicating something which already exists
>> elsewhere. My take is that this is better done by building on top of
>> something like Envoy (or similar), which has built-in and mature features
>> and is supported by a wide user base.
>>
>
> Unfortunately general purpose proxies, or proxies specific to some protocol,
> will not be able to do efficiently what we can do using Pulsar APIs, because
> they cannot directly "map" External Concepts to the Pulsar model.
>
> I cannot imagine the cost of developing and maintaining a plugin for Envoy
> that is able to deal with Pulsar concepts. For instance it is not written in
> Java, so you cannot use the Java bindings for Pulsar, which are feature
> complete and always up-to-date with the latest features.
> Also, developers that work on PHs are specialized in Pulsar code and in
> Java (at very high levels), so it is harder for them to write super
> efficient and high quality plugins using non-Java languages.
>
> So I see a huge value in adding this ability to the Pulsar Proxy.
>
> The only alternative to this PIP is to create a new framework for creating
> such "Smart Proxies" in Java and using some official/maintained Pulsar API.
>
> So we will end up discussing the value of adding such a brand new module,
> and how to deploy/manage it.
>
> It is a huge cost and it will take so much time:
> - design,
> - adding new concepts to the architecture,
> - adding a new service (new management tools),
> - lot of new code (probably cut/paste from Pulsar Proxy)
> - helm chart
> - new configuration files
> - docs
>
> I believe that we should spend our time in adding more bindings/protocol
> handlers instead of doing that.
>
> By the way I will be happy to drive this new effort if this is REALLY what
> we want.
>
> So I am convinced that for the short/mid term this PIP is the best choice
> to help Pulsar adoption.
>
> This PIP will unlock some great potential that otherwise will be
> available only to users of custom tools, not officially maintained
> inside the Pulsar project.
> I would be very sad about that outcome.
>
>
>
> Enrico
>
>
>
>>
>> -j
>>
>> On Tue, Sep 7, 2021 at 11:11 PM Enrico Olivelli <[email protected]>
>> wrote:
>>
>> > (ping)
>> >
>> >
>> > On Fri, 3 Sep 2021 at 14:06, Enrico Olivelli <[email protected]> wrote:
>> >
>> > > Sijie,
>> > > Thanks for your questions, answers inline below.
>> > >
>> > > On Thu, 2 Sep 2021 at 02:23, Sijie Guo <[email protected]> wrote:
>> > >
>> > >> I would like to see the clarification between the broker protocol handlers
>> > >> and proxy protocol handlers before moving it to a vote thread.
>> > >>
>> > >
>> > > A PH in the broker is very useful as it allows you to directly access the
>> > > ManagedLedger and implement high performance adapters for
>> > > other wire protocols.
>> > > The biggest limitation is that you can efficiently access only the topics
>> > > owned by the local broker.
>> > > If you try to forward/proxy the request to another broker (you can do it,
>> > > and this was Matteo's suggestion at the latest Video Community meeting)
>> > > you have the downside that the broker has to waste resources to do the
>> > > "proxy work"
>> > > and you generally want a broker machine to be used only to deal with the
>> > > local traffic.
>> > >
>> > > The load balancing mechanism of the brokers is not meant to deal with
>> > > additional work due to proxying requests related to the topics for which
>> > > the broker is not the owner.
>> > >
>> > > A PH in the proxy is useful to add new protocols that are running in front
>> > > of the whole cluster and not only of one single broker.
>> > > This is a very different use case with respect to having the PH in the
>> > > broker.
>> > >
>> > > The work of the proxy usually is to forward requests to the internal
>> > > services of the cluster, and in case of new protocols in the proxy
>> > > you need some logic to fill in the gaps in the original wire protocol.
>> > >
>> > > System architects expect a different kind of load on the proxy and other
>> > > kinds of load on the brokers.
>> > > For instance you usually can run very few proxies to cover a big cluster
>> > > with many brokers.
>> > > So adding a PH on all the brokers is sometimes overkill.
>> > >
>> > >
>> > >>
>> > >> I can see how it will cause confusion for protocol developers.
>> > >>
>> > >
>> > > Protocol developers are very advanced users that do need to understand
>> > > clearly the internals of Pulsar.
>> > > In fact this request of having PHs in the Proxy layer came from myself and
>> > > from other colleagues of mine who are working heavily on implementing
>> > > new protocol handlers in Pulsar.
>> > >
>> > > And we faced the limitation of the need to create a new proxy service for
>> > > each new protocol, but all of these "proxy services" have in common
>> > > most of the features of the Pulsar proxy.
>> > > When we also came to deal with System Architects it was clear the
>> > > requirement to have only one single "place" to put all of the interactions
>> > > at "cluster level" with Pulsar.
>> > >
>> > > I think this is a good picture of what I mean:
>> > > - PH in the Broker -> add protocols inside the Broker, work for owned
>> > > topics
>> > > - PH in the Proxy -> add protocols in front of the whole Cluster
>> > >
>> > >
>> > >> Yunze brought a good idea on KoP.
>> > >
>> > >
>> > > I also have good ideas and working solutions for a Pulsar-proxy-like KoP
>> > > Proxy.
>> > > I will be happy to discuss this in a separate thread or at a separate
>> > > table with Yunze.
>> > >
>> > > A smart KoP proxy can work if you run it inside the Pulsar proxy process,
>> > > or you can copy/paste the Pulsar Proxy code and create another service.
>> > >
>> > >
>> > >> But I don't think that's the right
>> > >> direction. If you can give an example of the usage of a proxy handler and
>> > >> how it is different from using a broker handler, that would help me
>> > >> understand this PIP.
>> > >>
>> > >
>> > > For some protocols you have to execute some non-trivial work for mapping
>> > > the wire protocol and the concepts of the protocol to the Pulsar model.
>> > > For instance some protocols do not have the concept of "lookup", and the
>> > > proxy does the lookup and forwards the request to the internal broker.
>> > >
>> > > For some protocols you can just use the PulsarClient to connect to the
>> > > internal brokers, you do not need and you do not want to access the
>> > > ManagedLedgers:
>> > > in this case adding the execution inside the broker only complicates
>> > > the overall design of the system and puts load on the brokers.
>> > >
>> > > There is a good amount of processing that should be executed on the proxy,
>> > > and it is not good to run it on a broker.
>> > > If you do not put the "custom code" in the Proxy and you can only write a
>> > > Broker PH you end up adding it to the Broker.
>> > >
>> > > If you directly expose (with some LoadBalancer or whatever) your brokers
>> > > in which you run the PH code that you would put in the proxy
>> > > you end up putting on the broker some load that is not expected:
>> > > - the broker will have to work even for topics for which it is not the
>> > > owner
>> > > - the broker will have to do things that cannot be handled correctly by the
>> > > Pulsar load balancer (because it expects that the load is proportional to
>> > > the owned bundles)
>> > >
>> > >
>> > >>
>> > >> The reason why Pulsar proxy is built is to have a "smart" proxy that is
>> > >> aware of Pulsar protocol. The Pulsar proxy can be replaced with other
>> > >> mature proxy software with SNI routing or multiple advertised listeners
>> > >> now. Hence I am afraid that we are taking the wrong direction here. Here
>> > >> are various reasons.
>> > >>
>> > >> 1) The ProxyService is essentially a Pulsar admin client. Broker service
>> > >> also provides a Pulsar admin client. I am not sure how Proxy PH will
>> > >> simplify the protocol handler development. Please use an example to
>> > >> demonstrate it.
>> > >>
>> > >
>> > > In the cases I am highlighting, *the Broker is simply not the right place
>> > > to run the code*.
>> > >
>> > > So the problem here is not having PulsarAdmin in the Broker or in the
>> > > Proxy.
>> > > It is that if you want to write a smart proxy for another protocol:
>> > > - you end up copy/pasting the Proxy code
>> > > - you use the internal Pulsar classes to have a consistent behaviour with
>> > > the Pulsar Proxy
>> > > - you add more components to the "picture" of the Pulsar cluster
>> > >
>> > >
>> > >> 2) The Authorization & Authentication services in ProxyService are only
>> > >> used when proxies are configured to use ZooKeeper for broker discovery.
>> > >> However, this option is not recommended when running Pulsar proxies in
>> > >> Kubernetes. Instead, using a broker discovery service is recommended. In
>> > >> order to make PH work, you are forcing the proxy to be tightly coupled to
>> > >> ZooKeeper.
>> > >>
>> > >
>> > > This is not needed for all of the Proxy PHs.
>> > > But Authorization & Authentication are a core part of this story.
>> > > If you implement your "smart proxy" somewhere else and not as a Plugin to
>> > > the Pulsar Proxy (or Broker)
>> > > you cannot leverage the same services, the same way.
>> > > It leads to having more chances of having a behaviour different from
>> > > standard Pulsar.
>> > >
>> > > PH developers are Pulsar experts, and you know that copy-pasting code from
>> > > Pulsar leads to unpredictable behaviour
>> > > when you run your plugin in another version of Pulsar.
>> > > But if you use an API that is going to be maintained by Pulsar you are
>> > > safer and you can think that your code is going to work.
>> > >
>> > >
>> > >>
>> > >> 3) Configuring authentication and authorization in proxy is already
>> > >> challenging. There are a few different combinations. A typical Pulsar
>> > >> setup is to forward the authentication credentials to the brokers to
>> > >> authenticate and authorize. If you don't do this correctly, it will
>> > >> introduce security holes because a connection can potentially grab the
>> > >> superuser credential configured in proxy and use superuser credentials
>> > >> to access brokers. From this perspective, I think proxy protocol handler
>> > >> doesn't make things simpler; instead it makes things complicated when it
>> > >> comes to authentication and authorization.
>> > >>
>> > >
>> > > Yes, this is a very complex problem indeed.
>> > >
>> > > We can help developers by providing a standard framework to access these
>> > > services.
>> > >
>> > > It is very important, from my point of view, that we do not encourage
>> > > developers to create
>> > > their own versions of a Pulsar proxy.
>> > >
>> > > My recent experience is that we can add many new wire protocols to Pulsar
>> > > and this will help a lot with the adoption of Pulsar.
>> > >
>> > > As we are doing in many other places in Pulsar we should provide tools to
>> > > write extensions
>> > > and not let people be too creative.
>> > >
>> > >
>> > >>
>> > >> I would like to see these questions answered before moving to a vote.
>> > >>
>> > >
>> > > I hope that we can reach consensus on the need for this API,
>> > > because I see that there is a real need for making this happen.
>> > >
>> > > Pulsar has momentum now, there are so many opportunities to reach
>> > > out to users of other systems,
>> > > let's not waste these opportunities.
>> > >
>> > >
>> > > Enrico
>> > >
>> > >
>> > >
>> > >>
>> > >> - Sijie
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> On Wed, Sep 1, 2021 at 12:55 PM Enrico Olivelli <[email protected]>
>> > >> wrote:
>> > >>
>> > >> > Any other comment?
>> > >> >
>> > >> > I would like to start a VOTE, but I feel we saw too few comments here
>> > >> >
>> > >> > Please take a look.
>> > >> > I believe it will be a good fit for the 2.9.0 release, which is going
>> > >> > to be released at the end of September
>> > >> >
>> > >> >
>> > >> > Enrico
>> > >> >
>> > >> > On Tue, 31 Aug 2021, 18:14 Michael Marshall <[email protected]> wrote:
>> > >> >
>> > >> > > +1, just read through the PIP. Looks good to me.
>> > >> > >
>> > >> > > - Michael
>> > >> > >
>> > >> > > On Mon, Aug 30, 2021 at 3:47 AM Enrico Olivelli <[email protected]>
>> > >> > > wrote:
>> > >> > >
>> > >> > > > Hello Pulsar fellows,
>> > >> > > >
>> > >> > > > I have prepared a PIP about adding support for Protocol Handlers
>> > >> > > >
>> > >> > > > This is the GDoc
>> > >> > > >
>> > >> > > >
>> > >> > > > https://docs.google.com/document/d/1Hlc_BOpQTkWX8FgrvWSfk6h5xTQKMXnTcSuil0Nznrg/edit?usp=sharing
>> > >> > > >
>> > >> > > >
>> > >> > > > This is the PR for the implementation
>> > >> > > > https://github.com/apache/pulsar/pull/11838/files
>> > >> > > >
>> > >> > > > I am pretty sure that this PIP will make life much nicer for
>> > >> > > > developers of Protocol Handlers and for Administrators who deploy
>> > >> > > > Protocol Handlers
>> > >> > > >
>> > >> > > > We are still working on the formal PIP process, at the moment I am
>> > >> > > > sharing with you the document.
>> > >> > > > My understanding is that after the discussion, I will start a VOTE
>> > >> > > > thread, and if the VOTE passes we can move forward with reviewing
>> > >> > > > the PR, and hopefully merge this feature for Pulsar 2.9.0
>> > >> > > >
>> > >> > > > Enrico
>> > >> > > >
>> > >> > >
>> > >> >
>> > >>
>> > >
>> >
>>
>
