Re: PIP-93 Pulsar Proxy Protocol Handlers

Enrico Olivelli Tue, 07 Sep 2021 23:11:11 -0700

(ping)


On Fri, Sep 3, 2021 at 14:06 Enrico Olivelli <eolive...@gmail.com>
wrote:

> Sijie,
> Thanks for your questions, answers inline below.
>
> On Thu, Sep 2, 2021 at 02:23 Sijie Guo <guosi...@gmail.com>
> wrote:
>
>> I would like to see the clarification between the broker protocol handlers
>> and proxy protocol handlers before moving it to a vote thread.
>>
>
> A PH in the broker is very useful as it allows you to directly access the
> ManagedLedger and implement high performance adapters for
> other wire protocols.
> The main limitation is that you can efficiently access only the topics
> owned by the local broker.
> If you try to forward/proxy the request to another broker (you can do it,
> and this was Matteo's suggestion at the latest Video Community meeting)
> you have the downside that the broker has to waste resources to do the
> "proxy work"
> and you generally want a broker machine to be used only to deal with the
> local traffic.
>
> The load balancing mechanism of the brokers is not meant to deal with
> additional work due to proxying requests for topics that the broker
> does not own.
>
> A PH in the proxy is useful to add new protocols that are running in front
> of the whole cluster and not only of one single broker.
> This is a very different use case from having the PH in the broker.
>
> The work of the proxy is usually to forward requests to the internal
> services of the cluster, and in the case of new protocols in the proxy
> you need some logic to fill in the gaps in the original wire protocol.
>
> System architects expect a different kind of load on the proxy and other
> kinds of load on the brokers.
> For instance you usually can run very few proxies to cover a big cluster
> with many brokers.
> So adding a PH on all the brokers is sometimes overkill.
>
>
>>
>> I can see how it will cause confusion for protocol developers.
>>
>
> Protocol developers are very advanced users who do need to clearly
> understand the internals of Pulsar.
> In fact this request to have PHs in the Proxy layer came from me and
> from colleagues of mine who are working heavily on implementing
> new protocol handlers in Pulsar.
>
> We ran into the limitation of having to create a new proxy service for
> each new protocol, even though all of these "proxy services" share
> most of the features of the Pulsar proxy.
> When we also talked with system architects, the requirement was clear:
> have one single "place" for all of the "cluster level" interactions
> with Pulsar.
>
> I think this is a good picture of what I mean:
> - PH in the Broker -> add protocols inside the Broker, work for owned
> topics
> - PH in the Proxy -> add protocols in front of the whole Cluster
>
>
>> Yunze brought a good idea on KoP.
>
>
> I also have good ideas and working solutions for a KOP proxy that works
> like the Pulsar proxy.
> I will be happy to discuss this in a separate thread or at a separate
> table with Yunze.
>
> A smart KOP proxy can work if you run it inside the Pulsar proxy process;
> the alternative is to copy/paste the Pulsar Proxy code and create another
> service.
>
>
>> But I don't think that's the right
>> direction. If you can give an example of the usage of a proxy handler and
>> how it is different from using a broker handler, that would help me
>> understand this PIP.
>>
>
> For some protocols you have to do some non-trivial work to map
> the wire protocol and the concepts of the protocol to the Pulsar model.
> For instance some protocols do not have the concept of "lookup", so the
> proxy does the lookup and forwards the request to the internal broker.
>
> For some protocols you can just use the PulsarClient to connect to the
> internal brokers, you do not need and you do not want to access the
> ManagedLedgers:
> in this case running the code inside the broker only complicates
> the overall design of the system and puts load on the brokers.
>
> There is a good amount of processing that should be executed on the proxy,
> and it is not good to run it on a broker.
> If you cannot put the "custom code" in the Proxy and you can only write a
> Broker PH, you end up adding it to the Broker.
>
> If you directly expose (with some LoadBalancer or whatever) your brokers
> running the PH code that you would otherwise put in the proxy,
> you end up putting unexpected load on the broker:
> - the broker will have to work even for topics for which it is not the
> owner
> - the broker will have to do work that the Pulsar load balancer cannot
> handle correctly (because it expects the load to be proportional to
> the owned bundles)
>
>
>>
>> The reason why Pulsar proxy is built is to have a "smart" proxy that is
>> aware of Pulsar protocol. The Pulsar proxy can be replaced with other
>> mature proxy software with SNI routing or multiple advertised listeners
>> now. Hence I am afraid that we are taking the wrong direction here. Here
>> are various reasons.
>>
>> 1) The ProxyService is essentially a Pulsar admin client. Broker service
>> also provides a Pulsar admin client. I am not sure how Proxy PH will
>> simplify the protocol handler development. Please use an example to
>> demonstrate it.
>>
>
> In the cases I am highlighting, *the Broker is simply not the right place
> to run the code*.
>
> So the problem here is not whether PulsarAdmin is available in the Broker
> or in the Proxy.
> It is that if you want to write a smart proxy for another protocol:
> - you end up copy/pasting the Proxy code
> - you use the internal Pulsar classes to get behaviour consistent with
> the Pulsar Proxy
> - you add more components to the "picture" of the Pulsar cluster
>
>
>> 2) The Authorization & Authentication services in ProxyService are only
>> used when proxies are configured to use zookeeper for broker discovery.
>> However, this option is not recommended when running Pulsar proxies in
>> Kubernetes. Instead, using a broker discovery service is recommended. In
>> order to make PH work, you are forcing the proxy to be tightly coupled
>> to ZooKeeper.
>>
>
> This is not needed for all of the Proxy PHs.
> But Authorization & Authentication are a core part of this story.
> If you implement your "smart proxy" somewhere else and not as a plugin to
> the Pulsar Proxy (or Broker),
> you cannot leverage the same services in the same way.
> That increases the chance of behaviour that differs from
> standard Pulsar.
>
> PH developers are Pulsar experts, and you know that copy/pasting code from
> Pulsar leads to unpredictable behaviour
> when you run your plugin on another version of Pulsar.
> But if you use an API that is maintained by Pulsar you are
> safer and you can expect your code to keep working.
>
>
>>
>> 3) Configuring authentication and authorization in proxy is already
>> challenging. There are a few different combinations. A typical Pulsar
>> setup is to forward the authentication credentials to the brokers to
>> authenticate and authorize. If you don't do this correctly, it will
>> introduce security holes because a connection can potentially grab the
>> superuser credential configured in proxy and use superuser credentials
>> to access brokers. From this perspective, I think proxy protocol handler
>> doesn't make things simpler instead it makes things complicated when it
>> comes to authentication and authorization.
>>
>
> Yes, this is a very complex problem indeed.
>
> We can help developers by providing a standard framework to access these
> services.
>
> From my point of view, it is very important that we do not encourage
> developers to create
> their own versions of a Pulsar proxy.
>
> My recent experience is that we can add many new wire protocols to Pulsar
> and this will help a lot with the adoption of Pulsar.
>
> As we are doing in many other places in Pulsar, we should provide tools to
> write extensions
> and not let people be too creative.
>
>
>>
>> I would like to see these questions answered before moving to a vote.
>>
>
> I hope that we can reach consensus on the need for this API,
> because I see that there is a real need for making this happen.
>
> Pulsar has momentum now, and there are so many opportunities to reach
> out to users of other systems;
> let's not waste these opportunities.
>
>
> Enrico
>
>
>
>>
>> - Sijie
>>
>>
>>
>>
>> On Wed, Sep 1, 2021 at 12:55 PM Enrico Olivelli <eolive...@gmail.com>
>> wrote:
>>
>> > Any other comment?
>> >
>> > I would like to start a VOTE, but I feel we saw too few comments here
>> >
>> > Please take a look.
>> > I believe it will be a good fit for the 2.9.0 release, which is going
>> > to be released at the end of September
>> >
>> >
>> > Enrico
>> >
>> > On Tue, Aug 31, 2021 at 18:14 Michael Marshall <mikemars...@gmail.com>
>> > wrote:
>> >
>> > > +1, just read through the PIP. Looks good to me.
>> > >
>> > > - Michael
>> > >
>> > > On Mon, Aug 30, 2021 at 3:47 AM Enrico Olivelli <eolive...@gmail.com>
>> > > wrote:
>> > >
>> > > > Hello Pulsar fellows,
>> > > >
>> > > > I have prepared a PIP about adding support for Protocol Handlers
>> > > > in the Pulsar Proxy
>> > > >
>> > > > This is the GDoc
>> > > >
>> > > > https://docs.google.com/document/d/1Hlc_BOpQTkWX8FgrvWSfk6h5xTQKMXnTcSuil0Nznrg/edit?usp=sharing
>> > > >
>> > > > This is the PR for the implementation
>> > > > https://github.com/apache/pulsar/pull/11838/files
>> > > >
>> > > > I am pretty sure that this PIP will make life much nicer for
>> > > > developers of Protocol Handlers and for administrators who deploy
>> > > > Protocol Handlers
>> > > >
>> > > > We are still working on the formal PIP process, at the moment I am
>> > > > sharing with you the document.
>> > > > My understanding is that after the discussion, I will start a VOTE
>> > > > thread, and if the VOTE passes we can move forward with reviewing
>> > > > the PR, and hopefully merge this feature for Pulsar 2.9.0
>> > > >
>> > > > Enrico
>> > > >
>> > >
>> >
>>
>
