Re: PIP-93 Pulsar Proxy Protocol Handlers

Enrico Olivelli Fri, 03 Sep 2021 05:07:44 -0700

Sijie,
Thanks for your questions; my answers are inline below.

On Thu, Sep 2, 2021 at 02:23 Sijie Guo <guosi...@gmail.com>
wrote:

> I would like to see a clarification of the difference between broker
> protocol handlers and proxy protocol handlers before moving this to a
> vote thread.
>

A PH in the broker is very useful because it can access the ManagedLedger
directly and implement high-performance adapters for other wire
protocols.
The biggest limitation is that it can efficiently access only the topics
owned by the local broker.
If you try to forward/proxy the request to another broker (you can do it,
and this was Matteo's suggestion at the latest community video meeting),
the downside is that the broker has to spend resources on the "proxy
work", while you generally want a broker machine to handle only its local
traffic.

The broker load balancing mechanism is not designed to account for the
additional work of proxying requests for topics that the broker does not
own.
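To make the trade-off concrete, here is a minimal sketch of the routing
decision a broker-side handler faces. It is a hypothetical model
(`BrokerSideRouter` and its counters are invented for illustration; this is
not Pulsar's `ProtocolHandler` API): requests for owned topics take the fast
local path, and everything else becomes forwarding work on the broker.

```java
import java.util.Set;

// Hypothetical model of a broker-side protocol handler (not Pulsar API).
// Requests for topics owned by this broker can be served from local storage;
// everything else has to be forwarded, which is the "proxy work" described above.
final class BrokerSideRouter {
    private final Set<String> ownedTopics;
    private long servedLocally = 0;
    private long forwarded = 0;

    BrokerSideRouter(Set<String> ownedTopics) {
        this.ownedTopics = ownedTopics;
    }

    /** Returns "local" or "forward" and updates the matching counter. */
    String handle(String topic) {
        if (ownedTopics.contains(topic)) {
            servedLocally++;   // fast path: direct, ManagedLedger-style access
            return "local";
        }
        forwarded++;           // extra work that is not tied to any owned bundle
        return "forward";
    }

    long servedLocally() { return servedLocally; }

    long forwarded() { return forwarded; }
}
```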

A PH in the proxy is useful for adding new protocols that run in front of
the whole cluster, not just a single broker.
This is a very different use case from having the PH in the broker.

The job of the proxy is usually to forward requests to the internal
services of the cluster; when you add new protocols to the proxy, you need
some logic to fill the gaps between the original wire protocol and Pulsar.

System architects expect one kind of load on the proxies and a different
kind of load on the brokers.
For instance, you can usually run very few proxies in front of a big
cluster with many brokers.
So adding a PH to every broker is sometimes overkill.

>
> I can see how it will cause confusion for protocol developers.
>

Protocol developers are very advanced users who need to understand the
internals of Pulsar clearly.
In fact, this request for PHs in the proxy layer came from me and from
colleagues of mine who are working heavily on implementing new protocol
handlers for Pulsar.

We ran into the limitation of having to create a new proxy service for
each new protocol, even though all of these "proxy services" share most of
the features of the Pulsar proxy.
When we started working with system architects, the requirement became
clear: there should be a single "place" for all interactions with Pulsar
at the "cluster level".

I think this is a good picture of what I mean:
- PH in the Broker -> add protocols inside the Broker, works for owned topics
- PH in the Proxy -> add protocols in front of the whole Cluster
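The two bullets above can be modelled, very roughly, as the set of topics
a handler can cover without pushing "proxy work" onto some broker. This is
a hypothetical sketch; `HandlerPlacement` and `Reach` are illustrative
names, not Pulsar types.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model (not Pulsar API) of where each kind of PH "lives".
enum HandlerPlacement { BROKER, PROXY }

final class Reach {
    /**
     * Topics a handler can serve without making any broker do proxy work.
     * ownership.get(i) is the set of topics owned by broker i;
     * localBroker is the index of the broker hosting a BROKER handler
     * (ignored for PROXY).
     */
    static Set<String> servableWithoutBrokerProxying(HandlerPlacement placement,
                                                     List<Set<String>> ownership,
                                                     int localBroker) {
        if (placement == HandlerPlacement.BROKER) {
            // A broker PH only has direct access to the bundles its broker owns.
            return new HashSet<>(ownership.get(localBroker));
        }
        // A proxy PH fronts the whole cluster: it forwards each request to the
        // owning broker, so every broker only ever handles its own topics.
        Set<String> all = new HashSet<>();
        for (Set<String> owned : ownership) {
            all.addAll(owned);
        }
        return all;
    }
}
```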

> Yunze brought up a good idea about KoP.

I also have good ideas and working solutions for a KoP proxy modelled on
the Pulsar proxy.
I would be happy to discuss this with Yunze in a separate thread or in a
separate meeting.

A smart KoP proxy can either run inside the Pulsar proxy process, or you
can copy/paste the Pulsar proxy code and create another service.

> But I don't think that's the right
> direction. If you can give an example of the usage of a proxy handler and
> how it is different from using a broker handler, that would help me
> understand this PIP.
>

For some protocols you have to do non-trivial work to map the wire
protocol and its concepts onto the Pulsar model.
For instance, some protocols do not have the concept of "lookup", so the
proxy performs the lookup and forwards the request to the internal broker.
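A minimal sketch of that lookup-and-forward step, assuming the proxy can
ask the cluster which broker owns a topic. The `lookup` function, the owner
cache and the broker URLs are hypothetical placeholders for whatever
discovery mechanism the proxy actually uses. Topic ownership can move
between brokers, which is why the cached owner can be invalidated.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical proxy-side helper for a protocol with no "lookup" concept:
// the client only names a topic, and the proxy resolves the owning broker
// before forwarding. Names are illustrative, not Pulsar APIs.
final class LookupForwarder {
    private final Function<String, String> lookup;   // topic -> broker URL
    private final Map<String, String> ownerCache = new HashMap<>();
    private int lookups = 0;

    LookupForwarder(Function<String, String> lookup) {
        this.lookup = lookup;
    }

    /** Returns the broker that the request for `topic` should be forwarded to. */
    String brokerFor(String topic) {
        return ownerCache.computeIfAbsent(topic, t -> {
            lookups++;            // only call the lookup service on a cache miss
            return lookup.apply(t);
        });
    }

    /** Drops the cached owner, e.g. after the topic moved to another broker. */
    void invalidate(String topic) {
        ownerCache.remove(topic);
    }

    int lookupCount() { return lookups; }
}
```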

For some protocols you can just use the PulsarClient to connect to the
internal brokers; you neither need nor want to access the ManagedLedgers.
In this case, running the code inside the broker only complicates the
overall design of the system and puts extra load on the brokers.
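As a hypothetical example of such a protocol, consider a made-up line
protocol where a client sends `PUB <topic> <payload>`. The adapter below
only translates commands into client calls. `Publisher` is an invented
stand-in for a Pulsar client producer, used so that the sketch runs
without a cluster; nothing in it needs ManagedLedger access.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a Pulsar client producer; in a real handler this
// role would be played by the Pulsar client API.
interface Publisher {
    void publish(String topic, byte[] payload);
}

// Hypothetical proxy-side adapter for the made-up "PUB <topic> <payload>"
// protocol. It only maps commands onto client calls, so it can run outside
// the brokers.
final class LineProtocolAdapter {
    private final Publisher publisher;

    LineProtocolAdapter(Publisher publisher) {
        this.publisher = publisher;
    }

    /** Returns "OK" on success, or an error reply for malformed commands. */
    String onLine(String line) {
        // Split into at most 3 parts so the payload may contain spaces.
        String[] parts = line.split(" ", 3);
        if (parts.length != 3 || !parts[0].equals("PUB")) {
            return "ERR unknown command";
        }
        publisher.publish(parts[1], parts[2].getBytes(StandardCharsets.UTF_8));
        return "OK";
    }
}
```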

There is a good amount of processing that should be executed on the proxy
rather than on a broker.
If you cannot put the "custom code" in the proxy and can only write a
broker PH, you end up adding it to the broker.

If you expose your brokers directly (with a LoadBalancer or something
similar) and run on them the PH code that you would otherwise put in the
proxy, you end up putting unexpected load on the brokers:
- the broker has to do work even for topics it does not own
- the broker has to do work that the Pulsar load balancer cannot account
for correctly (because it expects the load to be proportional to the
owned bundles)
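A deliberately simplified, hypothetical model of that mismatch: if the
load balancer's view of a broker is the sum of its owned bundles, any
traffic the broker proxies for topics it does not own is invisible to it.
The real Pulsar load manager is more sophisticated; `LoadView` and the
message rates only illustrate the assumption described above.

```java
import java.util.Map;

// Hypothetical, simplified model: load attributed only to owned bundles.
final class LoadView {
    /** What a bundle-based balancer "sees": the sum of owned bundle rates. */
    static double estimatedLoad(Map<String, Double> ownedBundleMsgRates) {
        double total = 0.0;
        for (double rate : ownedBundleMsgRates.values()) {
            total += rate;
        }
        return total;
    }

    /** What the broker actually handles when it also proxies foreign topics. */
    static double actualLoad(Map<String, Double> ownedBundleMsgRates,
                             double proxiedMsgRate) {
        return estimatedLoad(ownedBundleMsgRates) + proxiedMsgRate;
    }
}
```

The gap between the two values is exactly the proxied traffic, which the
balancer cannot move by reassigning bundles.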

>
> The reason why Pulsar proxy is built is to have a "smart" proxy that is
> aware of Pulsar protocol. The Pulsar proxy can be replaced with other
> mature proxy software with SNI routing or multiple advertised listeners
> now. Hence I am afraid that we are taking the wrong direction here. Here
> are various reasons.
>
> 1) The ProxyService is essentially a Pulsar admin client. Broker service
> also provides a Pulsar admin client. I am not sure how Proxy PH will
> simplify the protocol handler development. Please use an example to
> demonstrate it.
>

In the cases I am highlighting, *the Broker is simply not the right place
to run the code*.

So the problem here is not whether PulsarAdmin is available in the broker
or in the proxy.
The problem is that if you want to write a smart proxy for another
protocol:
- you end up copy/pasting the proxy code
- you use internal Pulsar classes to get behaviour consistent with the
Pulsar proxy
- you add more components to the "picture" of the Pulsar cluster

> 2) The Authorization & Authentication services in ProxyService are only
> used when proxies are configured to use zookeeper for broker discovery.
> However, this option is not recommended when running Pulsar proxies in
> Kubernetes. Instead, using a broker discovery service is recommended. In
> order to make PH work, you are forcing the proxy to be tied to
> ZooKeeper.
>

This is not needed by every proxy PH.
But authorization & authentication are a core part of this story.
If you implement your "smart proxy" somewhere else, and not as a plugin of
the Pulsar proxy (or broker), you cannot leverage the same services in the
same way.
That increases the chance of behaviour that differs from standard Pulsar.

PH developers are Pulsar experts, and they know that copy/pasting code
from Pulsar leads to unpredictable behaviour when the plugin runs on a
different version of Pulsar.
If you use an API that is maintained by Pulsar, you are safer and can
expect your code to keep working.
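As an illustration of the difference, here is a hypothetical,
dependency-free sketch of a host-owned plugin contract: the host starts
the plugins enabled in its configuration, and the plugin never copies host
code. The `protocolName()`/`accept()` pair loosely mirrors Pulsar's broker
`ProtocolHandler` interface; `ProxyPlugin` and `PluginRegistry` are
invented for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plugin contract owned and maintained by the host process.
interface ProxyPlugin {
    String protocolName();

    default boolean accept(String protocol) {
        return protocolName().equalsIgnoreCase(protocol);
    }

    // In a real host this would receive the shared services (auth, lookup...).
    void start();
}

final class PluginRegistry {
    /** Starts only the plugins enabled by a comma-separated config value. */
    static List<ProxyPlugin> startEnabled(String enabledProtocols,
                                          List<ProxyPlugin> available) {
        List<ProxyPlugin> started = new ArrayList<>();
        for (String raw : enabledProtocols.split(",")) {
            String protocol = raw.trim();
            if (protocol.isEmpty()) {
                continue;                 // tolerate trailing commas and spaces
            }
            for (ProxyPlugin plugin : available) {
                if (plugin.accept(protocol)) {
                    plugin.start();
                    started.add(plugin);
                    break;                // first matching plugin wins
                }
            }
        }
        return started;
    }
}
```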

>
> 3) Configuring authentication and authorization in proxy is already
> challenging. There are a few different combinations. A typical Pulsar setup
> is to forward the authentication credentials to the brokers to authenticate
> and authorize. If you don't do this correctly, it will introduce security
> holes because a connection can potentially grab the superuser credential
> configured in proxy and use superuser credentials to access brokers. From
> this perspective, I think proxy protocol handler doesn't make things
> simpler instead it makes things complicated when it comes to authentication
> and authorization.
>

Yes, this is a very complex problem indeed.

We can help developers by providing a standard framework to access these
services.
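For example, one service such a framework could offer is an authorization
check that closes the hole Sijie describes: when a request comes through a
proxy, authorize the original client principal, never the proxy's own
(possibly superuser) role. Pulsar brokers apply a similar idea through the
`proxyRoles` setting; the sketch below is a simplified, hypothetical model
of it, and `ProxyAwareAuthorizer` is not a Pulsar class.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical authorization helper a host could expose to proxy plugins.
final class ProxyAwareAuthorizer {
    private final Set<String> proxyRoles;               // roles used by proxies
    private final Map<String, Set<String>> topicAcls;   // topic -> allowed roles

    ProxyAwareAuthorizer(Set<String> proxyRoles, Map<String, Set<String>> topicAcls) {
        this.proxyRoles = proxyRoles;
        this.topicAcls = topicAcls;
    }

    boolean canProduce(String connectionRole, Optional<String> originalPrincipal,
                       String topic) {
        String effective = connectionRole;
        if (proxyRoles.contains(connectionRole)) {
            // A proxy must say on whose behalf it acts; reject instead of
            // silently falling back to the proxy's own credentials.
            if (!originalPrincipal.isPresent()) {
                return false;
            }
            effective = originalPrincipal.get();
        }
        return topicAcls.getOrDefault(topic, Set.of()).contains(effective);
    }
}
```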

From my point of view, it is very important that we do not encourage
developers to create their own versions of a Pulsar proxy.

My recent experience is that we can add many new wire protocols to Pulsar,
and this will help a lot with the adoption of Pulsar.

As we do in many other places in Pulsar, we should provide tools for
writing extensions and not let people be too creative.

>
> I would like to see these questions answered before moving to a vote.
>

I hope that we can reach consensus on the need for this API, because I see
a real need to make this happen.

This is Pulsar's moment: there are so many opportunities to reach out to
users of other systems, let's not waste them.

Enrico

>
> - Sijie
>
>
>
>
> On Wed, Sep 1, 2021 at 12:55 PM Enrico Olivelli <eolive...@gmail.com>
> wrote:
>
> > Any other comments?
> >
> > I would like to start a VOTE, but I feel we have seen too few comments
> > here.
> >
> > Please take a look.
> > I believe it will be a good fit for the 2.9.0 release, which is going to
> > be released at the end of September.
> >
> >
> > Enrico
> >
> > On Tue, Aug 31, 2021, 18:14 Michael Marshall <mikemars...@gmail.com>
> > wrote:
> >
> > > +1, just read through the PIP. Looks good to me.
> > >
> > > - Michael
> > >
> > > On Mon, Aug 30, 2021 at 3:47 AM Enrico Olivelli <eolive...@gmail.com>
> > > wrote:
> > >
> > > > Hello Pulsar fellows,
> > > >
> > > > I have prepared a PIP about adding support for Protocol Handlers in
> > > > the Pulsar Proxy.
> > > >
> > > > This is the GDoc:
> > > > https://docs.google.com/document/d/1Hlc_BOpQTkWX8FgrvWSfk6h5xTQKMXnTcSuil0Nznrg/edit?usp=sharing
> > > >
> > > >
> > > > This is the PR for the implementation
> > > > https://github.com/apache/pulsar/pull/11838/files
> > > >
> > > > I am pretty sure that this PIP will make life much nicer for
> > > > developers of Protocol Handlers and for administrators who deploy
> > > > Protocol Handlers.
> > > >
> > > > We are still working on the formal PIP process; for now I am sharing
> > > > the document with you.
> > > > My understanding is that after the discussion I will start a VOTE
> > > > thread, and if the VOTE passes we can move forward with reviewing the
> > > > PR and hopefully merge this feature for Pulsar 2.9.0.
> > > >
> > > > Enrico
> > > >
> > >
> >
>
