Re: PIP-93 Pulsar Proxy Protocol Handlers

Sijie Guo Thu, 16 Sep 2021 23:18:10 -0700

> I totally understand this point. I wasn't there when the proxy was born
> but currently my experience is that the Proxy is perceived as the primary
> endpoint in front of the Pulsar cluster, especially when you run in k8s.


The Pulsar Proxy was born because there was no great solution at that point.
However, the Kubernetes stack has evolved beyond what it was then, and so
has Pulsar.

For example,
https://github.com/apache/pulsar/wiki/PIP-60%3A-Support-Proxy-server-with-SNI-routing
was introduced so that other mature proxy software with SNI routing can be used.

Multiple broker listeners have been introduced to allow better integrations
with proxy and service mesh solutions. Hence I don't think "proxy" is the
primary endpoint in front of a Pulsar cluster anymore.

Hence I don't think proxy PH is the right solution for the problems you are
trying to solve. I would avoid introducing PH to proxy.

- Sijie

On Tue, Sep 14, 2021 at 8:02 AM Enrico Olivelli <eolive...@gmail.com> wrote:

> other comments ?
>
> Enrico
>
> On Thu, Sep 9, 2021 at 09:15 Enrico Olivelli <eolive...@gmail.com> wrote:
>
> > Joe,
> >
> > On Thu, Sep 9, 2021 at 04:31 Joe F <joefranc...@gmail.com> wrote:
> >
> >> Enrico, my initial comment when you brought up PH was in relation to the
> >> larger question about proxying, rather than looking at this in a limited
> >> fashion on how to make it easy to add new PH in the proxy.
> >>
> >> But specifically with this, here are my comments. Two very
> >> distinct abstractions are being mixed up here, and I'm not sure
> >> whether that is a good idea or not.
> >>
> >
> > One way of seeing this PIP is to simply complete the work initiated with
> > PIP-41 (Introduction of Broker PHs,
> > https://github.com/apache/pulsar/wiki/PIP-41%3A-Pluggable-Protocol-Handler).
> >
> >
> >
> >>
> >> The proxy was designed to move bits and bytes without interpretation, from
> >> one network to another. The issue with Pulsar is that it requires some
> >> interpretation of the data to find which server a client should connect
> >> to. Protocol translation crept into the proxy, just to be able to ask this
> >> question. Since auth is required to answer this question, auth also crept
> >> in. Essentially the proxy was built as a TCP proxy, not as a wire protocol
> >> translator. Some additional hacky things needed to be done to make it work
> >> as a TCP proxy, and in my opinion those things should die away to the
> >> fullest extent possible.
> >>
> >
> > I totally understand this point. I wasn't there when the proxy was born
> > but currently
> > my experience is that the Proxy is perceived as the primary endpoint in
> > front of the Pulsar cluster
> > especially when you run in k8s.
> >
> >
> >
> >>
> >> Because of all this, the current implementation is not ideal. Its usage
> >> is highly restricted in actual deployments, because of potential security
> >> risks if the proxy is misconfigured. One needs to be strict about setting
> >> up the proxy to meet security standards in highly regulated environments.
> >>
> >>
> >>
> >> >And we faced the limitation of the need to create a new proxy service for
> >> >each new protocol, but all of these "proxy services" have in common
> >> >most of the features of the Pulsar proxy.
> >> >When we also came to deal with System Architects it was clear the
> >> >requirement to have only one single "place" to put all of the interactions
> >> >at "cluster level" with Pulsar.
> >>
> >> Good idea, a single place seems right. Can the proxy answer the traffic
> >> routing question without interpreting the data? Essentially, move what is
> >> done within the proxy now to a well-known service within the cluster, and
> >> use that?
> >>
> >
> > In the use cases I know, simply routing PDUs to internal brokers is not
> > enough: you often need to add complex mapping logic from the External
> > Protocol Concepts to Pulsar concepts in the Proxy component.
> >
> > So you have two ways:
> > 1. create your own service and deploy it separately: this was the
> > beginning of my work, and some colleagues of mine did the same
> > 2. deploy your code inside the Pulsar Proxy, and leverage the current
> > packaging, configuration, tools, security APIs, helm chart...
> >
> > I started this discussion because I found option 1 very awkward for Proxy
> > Component developers, for System Administrators and for System Architects.
> >
> > Developers:
> > - you have to copy/paste some Pulsar Proxy code, import Proxy jars, use
> > internal Pulsar classes to implement Authentication, Authorization,
> > Service Discovery, Configuration...
> >
> > System Administrators:
> > - you have a new set of configuration files and tools to manage the
> > settings (and in k8s you have to modify the Helm Chart significantly)
> >
> > System Architects:
> > - you have multiple new components in the picture, to explain, to
> > justify...
> >
> > With this proposal:
> >
> > Developers:
> > - use a framework, do not reinvent the wheel, be able to ensure that you
> > are compatible with a given Pulsar version, ensure that the behaviour is
> > consistent with other Pulsar components (like using ProxyConfiguration, or
> > the same service lifecycle, same libs), and evolve more easily
> >
> > System Administrator:
> > - you use proxy.conf/broker.conf, you use Pulsar CLI tools, no need to
> > change the Helm Charts
> >
> > System Architects:
> > - nothing new on the table, all Pulsar docs apply, you have the Proxy
> > that deals with external clients, but it is able to speak Pulsar, Kafka,
> > RabbitMQ, MQTT, ActiveMQ
> >
> >
> >
> >
> >>
> >> >I think this is a good picture of what I mean:
> >> >- PH in the Broker -> add protocols inside the Broker, work for owned topics
> >> >- PH in the Proxy -> add protocols in front of the whole Cluster
> >> >There is a good amount of processing that should be executed on the proxy,
> >> >and it is not good to run it on a broker.
> >>
> >> Is a TCP proxy a good place to do wire protocol translation (computation)?
> >> Especially if that translation is a good amount of processing? If it's not
> >> good to run this much processing on the broker, then it's even worse to run
> >> it on a network proxy. I can foresee this as a path that will lead to
> >> cluster and load management creeping into the proxy, as soon as you move
> >> beyond what a single proxy can handle.
> >>
> >> But I think these issues (of n/w vs protocol translation) are moot when you
> >> look at the larger needs of a generic proxy that will support ingress,
> >> configurable protocol handlers, load balancing etc. for use with Pulsar. You
> >> can run a bunch of Pulsar's proxies today, and there is no means to manage
> >> them properly, e.g. load balance between them, manage them as a cluster,
> >> have affinity of proxies to topics/tenants, etc. This applies even before
> >> this PIP (and more so once you add more processing into the proxy).
> >>
> >> The Pulsar proxy, as it is, is not amenable to creating anything like a
> >> service mesh. It would demand a lot of work in the proxy. Hence my
> >> initial comment about the proxy eventually becoming a mudball, and why we
> >> should rethink this entire proxy.
> >>
> >> It is tempting to evolve the Pulsar proxy into a service that supports
> >> everything: ingress, transformation chains, cluster management, etc.
> >> This will eventually end up duplicating something which already exists
> >> elsewhere. My take is that this is better done by building on top of
> >> something like Envoy (or similar), which has built-in and mature features
> >> and is supported by a wide user base.
> >>
> >
> > Unfortunately general purpose proxies or proxies specific to some protocol
> > will not be able to do efficiently what we can do using Pulsar APIs,
> > because they cannot directly "map" External Concepts to the Pulsar model.
> >
> > I cannot imagine the cost of developing and maintaining a plugin for Envoy
> > that is able to deal with Pulsar concepts. For instance it is not written
> > in Java and you cannot use the Java bindings for Pulsar, which are feature
> > complete and always up-to-date with the latest features.
> > Also developers that work on PHs are specialized in Pulsar code and in
> > Java (at very high levels), and so for them it is harder to write super
> > efficient and high quality plugins using non-Java languages.
> >
> > So I see a huge value in adding this ability to the Pulsar Proxy.
> >
> > The only alternative to this PIP is to create a new framework for creating
> > such "Smart Proxies" in Java and using some official/maintained Pulsar API.
> >
> > So we will end up discussing the value of adding such a brand new module,
> > and how to deploy/manage it.
> >
> > It is a huge cost and it will take so much time:
> > - design,
> > - adding new concepts to the architecture,
> > - adding a new service (new management tools),
> > - lot of new code (probably cut/paste from Pulsar Proxy)
> > - helm chart
> > - new configuration files
> > - docs
> >
> > I believe that we should spend our time in adding more bindings/protocol
> > handlers instead of doing that.
> >
> > By the way I will be happy to drive this new effort if this is REALLY what
> > we want.
> >
> > So I am convinced that for the short/mid term this PIP is the best choice
> > to help Pulsar adoption.
> >
> > This PIP will unlock some great potential that otherwise will be
> > available only to users of custom tools, not officially maintained
> > inside the Pulsar project.
> > I will be very sad about the outcome
> >
> >
> >
> > Enrico
> >
> >
> >
> >>
> >> -j
> >>
> >> On Tue, Sep 7, 2021 at 11:11 PM Enrico Olivelli <eolive...@gmail.com>
> >> wrote:
> >>
> >> > (ping)
> >> >
> >> >
> >> > On Fri, Sep 3, 2021 at 14:06 Enrico Olivelli <eolive...@gmail.com> wrote:
> >> >
> >> > > Sijie,
> >> > > Thanks for your questions, answers inline below.
> >> > >
> >> > > On Thu, Sep 2, 2021 at 02:23 Sijie Guo <guosi...@gmail.com> wrote:
> >> > >
> >> > >> I would like to see the clarification between the broker protocol handlers
> >> > >> and proxy protocol handlers before moving it to a vote thread.
> >> > >>
> >> > >
> >> > > A PH in the broker is very useful as it allows you to directly access the
> >> > > ManagedLedger and implement high performance adapters for
> >> > > other wire protocols.
> >> > > The bigger limitation is that you can efficiently access only the topics
> >> > > owned by the local broker.
> >> > > If you try to forward/proxy the request to another broker (you can do it,
> >> > > and this was Matteo's suggestion at the latest Video Community meeting)
> >> > > you have the downside that the broker has to waste resources to do the
> >> > > "proxy work"
> >> > > and you generally want a broker machine to be used only to deal with the
> >> > > local traffic.
> >> > >
> >> > > The load balancing mechanism of the brokers is not meant to deal with
> >> > > additional work due to proxying requests related to topics that the
> >> > > broker does not own.
> >> > >
> >> > > A PH in the proxy is useful to add new protocols that run in front
> >> > > of the whole cluster and not only of one single broker.
> >> > > This is a very different use case from having the PH in the broker.
> >> > >
> >> > > The work of the proxy usually is to forward requests to the internal
> >> > > services of the cluster, and in case of new protocols in the proxy
> >> > > you need some logic to fill in the gaps in the original wire protocol.
> >> > >
> >> > > System architects expect a different kind of load on the proxy and other
> >> > > kinds of load on the brokers.
> >> > > For instance you usually can run very few proxies to cover a big cluster
> >> > > with many brokers.
> >> > > So adding a PH on all the brokers is sometimes overkill.
> >> > >
> >> > >
> >> > >>
> >> > >> I can see how it will cause confusion for protocol developers.
> >> > >>
> >> > >
> >> > > Protocol developers are very advanced users who do need to understand
> >> > > clearly the internals of Pulsar.
> >> > > In fact this request of having PHs in the Proxy layer came from me and
> >> > > from other colleagues of mine who are working heavily on implementing
> >> > > new protocol handlers in Pulsar.
> >> > >
> >> > > And we faced the limitation of the need to create a new proxy service for
> >> > > each new protocol, but all of these "proxy services" have in common
> >> > > most of the features of the Pulsar proxy.
> >> > > When we also came to deal with System Architects it was clear the
> >> > > requirement to have only one single "place" to put all of the interactions
> >> > > at "cluster level" with Pulsar.
> >> > >
> >> > > I think this is a good picture of what I mean:
> >> > > - PH in the Broker -> add protocols inside the Broker, work for owned topics
> >> > > - PH in the Proxy -> add protocols in front of the whole Cluster
> >> > >
> >> > >
> >> > >> Yunze brought a good idea on KoP.
> >> > >
> >> > >
> >> > > I also have good ideas and working solutions for a Pulsar-proxy-like KOP
> >> > > Proxy.
> >> > > I will be happy to discuss this in a separate thread or at a separate
> >> > > table with Yunze.
> >> > >
> >> > > A smart KOP proxy can work if you run it inside the Pulsar proxy process, or
> >> > > you can copy/paste the Pulsar Proxy code and create another service.
> >> > >
> >> > >
> >> > >> But I don't think that's the right
> >> > >> direction. If you can give an example of the usage of a proxy handler and
> >> > >> how it is different from using a broker handler, that would help me
> >> > >> understand this PIP.
> >> > >>
> >> > >
> >> > > For some protocols you have to execute some non-trivial work for mapping
> >> > > the wire protocol and the concepts of the protocol to the Pulsar model.
> >> > > For instance some protocols do not have the concept of "lookup", and the
> >> > > proxy does the lookup and forwards the request to the internal broker.
> >> > >
> >> > > For some protocols you can just use the PulsarClient to connect to the
> >> > > internal brokers, you do not need and you do not want to access the
> >> > > ManagedLedgers:
> >> > > in this case adding the execution inside the broker only complicates
> >> > > the overall design of the system and puts load on the brokers.
> >> > >
> >> > > There is a good amount of processing that should be executed on the proxy,
> >> > > and it is not good to run it on a broker.
> >> > > If you do not put the "custom code" in the Proxy and you can only write a
> >> > > Broker PH, you end up adding it to the Broker.
> >> > >
> >> > > If you expose directly (with some LoadBalancer or whatever) your brokers
> >> > > in which you run the PH code that you would put in the proxy
> >> > > you end up putting on the broker some load that is not expected:
> >> > > - the broker will have to work even for topics for which it is not the
> >> > > owner
> >> > > - the broker will have to do things that cannot be handled correctly by the
> >> > > Pulsar load balancer (because it expects that the load is proportional to
> >> > > the owned bundles)
> >> > >
> >> > >
> >> > >>
> >> > >> The reason why Pulsar proxy was built is to have a "smart" proxy that is
> >> > >> aware of Pulsar protocol. The Pulsar proxy can be replaced with other
> >> > >> mature proxy software with SNI routing or multiple advertised listeners
> >> > >> now. Hence I am afraid that we are taking the wrong direction here. Here
> >> > >> are various reasons.
> >> > >>
> >> > >> 1) The ProxyService is essentially a Pulsar admin client. Broker service
> >> > >> also provides a Pulsar admin client. I am not sure how Proxy PH will
> >> > >> simplify the protocol handler development. Please use an example to
> >> > >> demonstrate it.
> >> > >>
> >> > >
> >> > > In the cases I am highlighting, *the Broker is simply not the right place
> >> > > to run the code*.
> >> > >
> >> > > So the problem here is not whether to have PulsarAdmin in the Broker or in
> >> > > the Proxy.
> >> > > It is that if you want to write a smart proxy for another protocol:
> >> > > - you end up copy/pasting the Proxy code
> >> > > - you use the internal Pulsar classes to have a consistent behaviour with
> >> > > the Pulsar Proxy
> >> > > - you add more components to the "picture" of the Pulsar cluster
> >> > >
> >> > >
> >> > >> 2) The Authorization & Authentication services in ProxyService are only
> >> > >> used when proxies are configured to use zookeeper for broker discovery.
> >> > >> However, this option is not recommended when running Pulsar proxies in
> >> > >> Kubernetes. Instead, using a broker discovery service is recommended. In
> >> > >> order to make PH work, you are forcing the proxy to be tied to
> >> > >> zookeeper.
> >> > >>
> >> > >
> >> > > This is not needed for all of the Proxy PHs.
> >> > > But Authorization & Authentication are a core part of this story.
> >> > > If you implement your "smart proxy" somewhere else and not as a Plugin to
> >> > > the Pulsar Proxy (or Broker)
> >> > > you cannot leverage the same services, the same way.
> >> > > This increases the chances of behaviour that differs from
> >> > > standard Pulsar.
> >> > >
> >> > > PH developers are Pulsar experts, and you know that copy/pasting code from
> >> > > Pulsar leads to unpredictable behaviour
> >> > > when you run your plugin in another version of Pulsar.
> >> > > But if you use an API that is going to be maintained by Pulsar you are
> >> > > safer and you can expect that your code is going to work.
> >> > >
> >> > >
> >> > >>
> >> > >> 3) Configuring authentication and authorization in proxy is already
> >> > >> challenging. There are a few different combinations. A typical Pulsar setup
> >> > >> is to forward the authentication credentials to the brokers to authenticate
> >> > >> and authorize. If you don't do this correctly, it will introduce security
> >> > >> holes because a connection can potentially grab the superuser credential
> >> > >> configured in proxy and use superuser credentials to access brokers. From
> >> > >> this perspective, I think proxy protocol handler doesn't make things
> >> > >> simpler; instead it makes things complicated when it comes to
> >> > >> authentication and authorization.
> >> > >>
> >> > >
> >> > > Yes, this is a very complex problem indeed.
> >> > >
> >> > > We can help developers by providing a standard framework to access these
> >> > > services.
> >> > >
> >> > > It is very important from my point of view, that we do not encourage
> >> > > developers to create
> >> > > their own versions of a Pulsar proxy.
> >> > >
> >> > > My recent experience is that we can add many new wire protocols to Pulsar
> >> > > and this will help a lot with the adoption of Pulsar.
> >> > >
> >> > > As we are doing in many other places in Pulsar we should provide tools to
> >> > > write extensions
> >> > > and not let people be too creative.
> >> > >
> >> > >
> >> > >>
> >> > >> I would like to see these questions answered before moving to a vote.
> >> > >>
> >> > >
> >> > > I hope that we can reach consensus on the need for this API,
> >> > > because I see that there is a real need for making this happen.
> >> > >
> >> > > Pulsar has momentum now, there are so many opportunities to reach
> >> > > out to users of other systems,
> >> > > let's not waste these opportunities.
> >> > >
> >> > >
> >> > > Enrico
> >> > >
> >> > >
> >> > >
> >> > >>
> >> > >> - Sijie
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >> On Wed, Sep 1, 2021 at 12:55 PM Enrico Olivelli <eolive...@gmail.com>
> >> > >> wrote:
> >> > >>
> >> > >> > Any other comment?
> >> > >> >
> >> > >> > I would like to start a VOTE, but I feel we saw too few comments here
> >> > >> >
> >> > >> > Please take a look.
> >> > >> > I believe it will be a good fit for the 2.9.0 release, which is going to be
> >> > >> > released at the end of September
> >> > >> >
> >> > >> >
> >> > >> > Enrico
> >> > >> >
> >> > >> > On Tue, Aug 31, 2021 at 18:14 Michael Marshall <mikemars...@gmail.com> wrote:
> >> > >> >
> >> > >> > > +1, just read through the PIP. Looks good to me.
> >> > >> > >
> >> > >> > > - Michael
> >> > >> > >
> >> > >> > > On Mon, Aug 30, 2021 at 3:47 AM Enrico Olivelli <eolive...@gmail.com>
> >> > >> > > wrote:
> >> > >> > >
> >> > >> > > > Hello Pulsar fellows,
> >> > >> > > >
> >> > >> > > > I have prepared a PIP about adding support for Protocol Handlers
> >> > >> > > >
> >> > >> > > > This is the GDoc
> >> > >> > > >
> >> > >> > > > https://docs.google.com/document/d/1Hlc_BOpQTkWX8FgrvWSfk6h5xTQKMXnTcSuil0Nznrg/edit?usp=sharing
> >> > >> > > >
> >> > >> > > >
> >> > >> > > > This is the PR for the implementation
> >> > >> > > > https://github.com/apache/pulsar/pull/11838/files
> >> > >> > > >
> >> > >> > > > I am pretty sure that this PIP will make the life of developers of Protocol
> >> > >> > > > Handlers and of Administrators who deploy Protocol Handlers much nicer
> >> > >> > > >
> >> > >> > > > We are still working on the formal PIP process, at the moment I am sharing
> >> > >> > > > with you the document.
> >> > >> > > > My understanding is that after the discussion, I will start a VOTE thread,
> >> > >> > > > and if the VOTE passes we can move forward with reviewing the PR, and
> >> > >> > > > hopefully merge this feature for Pulsar 2.9.0
> >> > >> > > >
> >> > >> > > > Enrico
> >> > >> > > >
> >> > >> > >
> >> > >> >
> >> > >>
> >> > >
> >> >
> >>
> >
>
