Re: PIP-93 Pulsar Proxy Protocol Handlers

Sijie Guo Wed, 08 Sep 2021 02:32:59 -0700

On Fri, Sep 3, 2021 at 5:07 AM Enrico Olivelli <[email protected]> wrote:


> Sijie,
> Thanks for your questions, answers inline below.
>
> On Thu, Sep 2, 2021 at 02:23, Sijie Guo <[email protected]> wrote:
>
> > I would like to see the clarification between the broker protocol handlers
> > and proxy protocol handlers before moving it to a vote thread.
> >
>
> A PH in the broker is very useful as it allows you to directly access the
> ManagedLedger and implement high performance adapters for
> other wire protocols.
> The biggest limitation is that you can efficiently access only the topics
> owned by the local broker.
> If you try to forward/proxy the request to another broker (you can do it,
> and this was Matteo's suggestion at the latest Video Community meeting)
> you have the downside that the broker has to waste resources to do the
> "proxy work"
> and you generally want a broker machine to be used only to deal with the
> local traffic.
>
> The load balancing mechanism of the brokers is not meant to deal with
> additional work due to proxying requests related to the topics for which
> the broker is not owner.
>
> A PH in the proxy is useful to add new protocols that are running in front
> of the whole cluster and not only of one single broker.
> This is a very different use case from having the PH in the broker.
>
> The work of the proxy is usually to forward requests to the internal
> services of the cluster, and in the case of new protocols in the proxy
> you need some logic to fill in the gaps in the original wire protocol.
>
> System architects expect a different kind of load on the proxy and other
> kinds of load on the brokers.
> For instance you usually can run very few proxies to cover a big cluster
> with many brokers.
> So adding a PH on all the brokers is sometimes overkill.
>

Why not use a mature "proxy" solution like Nginx, Envoy, etc.? If the
"proxy" is doing smart routing, that software provides a very efficient way
to do this job compared to a JVM-based solution.


>
>
> >
> > I can see how it will cause confusion for protocol developers.
> >
>
> Protocol developers are very advanced users that do need to understand
> clearly the internals of Pulsar.
> In fact this request of having PHs in the Proxy layer came from myself and
> from other colleagues of mine who are working heavily in implementing
> new protocol handlers in Pulsar.
>
> And we faced the limitation of the need to create a new proxy service for
> each new protocol, but all of these "proxy services" have in common
> most of the features of the Pulsar proxy.
> When we also came to deal with System Architects it was clear the
> requirement to have only one single "place" to put all of the interactions
> at "cluster level" with Pulsar.
>
> I think this is a good picture of what I mean:
> - PH in the Broker -> add protocols inside the Broker, work for owned
> topics
> - PH in the Proxy -> add protocols in front of the whole Cluster
>

Why NOT just add an ingress service for the broker StatefulSet?

The fundamental difference between messaging protocols is whether there is
a redirection protocol in them.

If there is a redirection protocol (like Pulsar and Kafka), you redirect
the requests. For such protocols, you need an additional "proxy" solution.
This can be addressed by using mature solutions like Envoy. For example,
in the Kafka world, there are already very mature solutions employed by
the Strimzi operator and the Banzai Cloud operator. I don't see the value
in reinventing another solution.

If there is no redirection protocol, a service for brokers would be
sufficient. But if you want to introduce a proxy, you can still use
solutions like Envoy rather than building another solution.
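
To make the distinction concrete, here is a minimal Python sketch of the two
cases. It is a toy model under stated assumptions: the OWNERS table and the
functions lookup, connect_by_name and connect_via_service are hypothetical
names invented for illustration, not Pulsar, Kafka or Envoy APIs.

```python
# Toy model only: every name here is hypothetical, not a Pulsar or Kafka API.
import itertools

# Hypothetical topic -> owning broker assignment.
OWNERS = {"orders": "broker-1", "payments": "broker-2"}
BROKERS = sorted(set(OWNERS.values()))


def lookup(topic):
    """Redirection protocol: any broker can answer with the owner's name."""
    return OWNERS[topic]


def connect_by_name(broker_name):
    """After a redirect the client must reach one specific broker, so the
    front end has to route by broker identity (for example SNI routing)."""
    if broker_name not in BROKERS:
        raise KeyError(f"unknown broker: {broker_name}")
    return broker_name


_round_robin = itertools.cycle(BROKERS)


def connect_via_service():
    """No redirection: any broker will do, so a plain load-balanced
    service in front of the brokers is enough."""
    return next(_round_robin)
```

In the first case the client calls lookup and then connect_by_name with the
result; in the second it only ever calls connect_via_service.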


>
>
> > Yunze brought a good idea on KoP.
>
>
> I also have good ideas and working solutions for a Pulsar-proxy-like KOP
> proxy.
> I will be happy to discuss this in a separate thread or at a separate table
> with Yunze.
>
> A smart KOP proxy can work if you run inside the Pulsar proxy process or
> you can copy/paste the Pulsar Proxy code and create another service.
>

I have provided my feedback on the KOP proxy. Please see my comment:
https://github.com/streamnative/kop/issues/717#issuecomment-915063387


>
>
> > But I don't think that's the right
> > direction. If you can give an example of the usage of a proxy handler and
> > how it is different from using a broker handler, that would help me
> > understand this PIP.
> >
>
> For some protocols you have to execute some non-trivial work for mapping
> the wire protocol and the concepts of the protocol to the Pulsar model.
> For instance some protocols do not have the concept of "lookup", and the
> proxy does the lookup and forwards the request to the internal broker.
>
> For some protocols you can just use the PulsarClient to connect to the
> internal brokers, you do not need and you do not want to access the
> ManagedLedgers:
> in this case adding the execution inside the broker is only complicating
> the overall design of the system and putting load on the brokers.
>
> There is a good amount of processing that should be executed on the proxy,
> and it is not good to run it on a broker.
> If you do not put the "custom code" in the Proxy and you can only write a
> Broker PH you end up adding it to the Broker.
>
> If you expose directly (with some LoadBalancer or whatever) your brokers in
> which you run the PH code that you would put in the proxy
> you end up putting on the broker some load that is not expected:
> - the broker will have to work even for topics for which it is not the
> owner
> - the broker will have to do things that cannot be dealt with correctly by the
> Pulsar load balancer (because it expects that the load is proportional to
> the owned bundles)
>

Why not build a filter in Envoy? Envoy is the de facto "proxy" for
Kubernetes.


>
>
> >
> > The reason why Pulsar proxy is built is to have a "smart" proxy that is
> > aware of Pulsar protocol. The Pulsar proxy can be replaced with other
> > mature proxy software with SNI routing or multiple advertised listeners
> > now. Hence I am afraid that we are taking the wrong direction here. Here
> > are various reasons.
> >
> > 1) The ProxyService is essentially a Pulsar admin client. Broker service
> > also provides a Pulsar admin client. I am not sure how Proxy PH will
> > simplify the protocol handler development. Please use an example to
> > demonstrate it.
> >
>
> In the cases I am highlighting, *the Broker is simply not the right place
> to run the code*.
>
> So the problem here is not to have PulsarAdmin in the Broker or in the
> Proxy.
> It is that if you want to write a smart proxy for another protocol:
> - you end up copy/pasting the Proxy code
> - you use the internal Pulsar classes to have a consistent behaviour with
> the Pulsar Proxy
> - you add more components to the "picture" of the Pulsar cluster
>
>
> > 2) The Authorization & Authentication services in ProxyService are only
> > used when proxies are configured to use zookeeper for broker discovery.
> > However, this option is not recommended when running Pulsar proxies in
> > Kubernetes. Instead, using a broker discovery service is recommended. In
> > order to make PH work, you are forcing the proxy to be tightly coupled
> > with ZooKeeper.
> >
>
> This is not needed for all of the Proxy PH handlers.
> But Authorization & Authentication are a core part of this story.
> If you implement your "smart proxy" somewhere else and not as a Plugin to
> the Pulsar Proxy (or Broker)
> you cannot leverage the same services, the same way.
> It leads to having more chances of having a behaviour different from
> standard Pulsar.
>
> PH developers are Pulsar experts, and you know that copy/pasting code from
> Pulsar leads to unpredictable behaviour
> when you run your plugin in another version of Pulsar.
> But if you use an API that is going to be maintained by Pulsar you are
> safer and you can think that your code is going to work.
>

The point here is that the proxy shouldn't care about Authorization and
Authentication.

The proxy should just forward the requests to the destination service.
Envoy and similar software already provide such capabilities.

Why do you need to re-invent the solution here?
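
As a rough illustration of what "just forward the requests" can mean, here
is a minimal asyncio sketch of a byte-level TCP forwarder that never looks
at credentials. It is a toy under stated assumptions, not how Envoy or the
Pulsar proxy is implemented; pipe, make_forwarder, serve and the host and
port arguments are hypothetical names chosen for this example.

```python
# Toy forwarder: hypothetical names, not Envoy or Pulsar proxy code.
import asyncio


async def pipe(reader, writer):
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while True:
            data = await reader.read(65536)
            if not data:
                break
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()


def make_forwarder(upstream_host, upstream_port):
    """Return a connection handler that relays bytes to one upstream."""
    async def handle(client_reader, client_writer):
        up_reader, up_writer = await asyncio.open_connection(
            upstream_host, upstream_port)
        # Relay both directions; the payload (including any auth data)
        # passes through untouched and is checked by the upstream only.
        await asyncio.gather(
            pipe(client_reader, up_writer),
            pipe(up_reader, client_writer),
        )
    return handle


async def serve(listen_port, upstream_host, upstream_port):
    """Start listening on localhost and forward every connection upstream."""
    return await asyncio.start_server(
        make_forwarder(upstream_host, upstream_port),
        "127.0.0.1", listen_port)
```

Authentication and authorization stay entirely with the destination
service in this model, which is the point being argued above.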


>
>
> >
> > 3) Configuring authentication and authorization in proxy is already
> > challenging. There are a few different combinations. A typical Pulsar setup
> > is to forward the authentication credentials to the brokers to authenticate
> > and authorize. If you don't do this correctly, it will introduce security
> > holes because a connection can potentially grab the superuser credential
> > configured in proxy and use superuser credentials to access brokers. From
> > this perspective, I think proxy protocol handler doesn't make things
> > simpler instead it makes things complicated when it comes to authentication
> > and authorization.
> >
>
> Yes, this is a very complex problem indeed.
>
> We can help developers by providing a standard framework to access these
> services.
>
> It is very important from my point of view, that we do not encourage
> developers to create
> their own versions of a Pulsar proxy.
>

I don't think people are creating new proxies. Kubernetes already has a
very successful toolchain for "proxies". It already supports a fair
number of protocols. We should just use it instead of creating Pulsar's own
"proxies".


>
> My recent experience is that we can add many new wire protocols to Pulsar
> and this will help a lot with the adoption of Pulsar.
>
> As we are doing in many other places on Pulsar we should provide tools to
> write extensions
> and do not let people be too creative.
>
>
> >
> > I would like to see these questions answered before moving to a vote.
> >
>
> I hope that we can reach consensus on the need for this API,
> because I see that there is a real need for making this happen.
>

I think the solution can be achieved with existing tools. I don't
see a strong reason for us to reinvent the wheel.


>
> It is the Pulsar momentum now, there are so many opportunities to reach out
> to users of other systems,
> let's not waste these opportunities.
>

Why not develop an Envoy filter for Pulsar and other messaging protocols?
This would help get Pulsar exposed to a broader ecosystem.


>
>
> Enrico
>
>
>
> >
> > - Sijie
> >
> >
> >
> >
> > On Wed, Sep 1, 2021 at 12:55 PM Enrico Olivelli <[email protected]>
> > wrote:
> >
> > > Any other comment?
> > >
> > > I would like to start a VOTE, but I feel we saw too few comments here
> > >
> > > Please take a look.
> > > I believe it will be a good fit for the 2.9.0 release, which is going to
> > > be released at the end of September
> > >
> > >
> > > Enrico
> > >
> > > On Tue, Aug 31, 2021, 18:14 Michael Marshall <[email protected]> wrote:
> > >
> > > > +1, just read through the PIP. Looks good to me.
> > > >
> > > > - Michael
> > > >
> > > > On Mon, Aug 30, 2021 at 3:47 AM Enrico Olivelli <[email protected]>
> > > > wrote:
> > > >
> > > > > Hello Pulsar fellows,
> > > > >
> > > > > I have prepared a PIP about adding support for Protocol Handlers
> > > > >
> > > > > This is the GDoc
> > > > >
> > > > > https://docs.google.com/document/d/1Hlc_BOpQTkWX8FgrvWSfk6h5xTQKMXnTcSuil0Nznrg/edit?usp=sharing
> > > > >
> > > > >
> > > > > This is the PR for the implementation
> > > > > https://github.com/apache/pulsar/pull/11838/files
> > > > >
> > > > > I am pretty sure that this PIP will make the life of developers of
> > > > > Protocol Handlers and of Administrators who deploy Protocol Handlers
> > > > > much nicer
> > > > >
> > > > > We are still working on the formal PIP process, at the moment I am
> > > > > sharing with you the document.
> > > > > My understanding is that after the discussion, I will start a VOTE
> > > > > thread, and if the VOTE passes we can move forward with reviewing the
> > > > > PR, and hopefully merge this feature for Pulsar 2.9.0
> > > > >
> > > > > Enrico
> > > > >
> > > >
> > >
> >
>
