Joe - very comprehensive write-up!

> So my vote is not to allow this (and any other server side logic
> implementations) into the base dispatcher, but permit these kinds of
> changes as configurable dispatchers. I hope I have explained the reasons
> for that vote clearly.

+1. We can do it the same way we did for protocol handlers.

Thanks,
Sijie

On Mon, Nov 16, 2020 at 10:40 AM Joe F <j...@apache.org> wrote:

> We have previously discussed server-side logic on the community list. I
> would like to set aside the specific proposal in this PIP and address what
> it implicitly changes in the core Pulsar design. I want an explicit
> discussion on that topic: what is the path for server-side business logic
> in Pulsar?
>
> Pulsar has been designed to do a few things very well. It is designed to
> be run as a hosted service, meaning it can be scaled horizontally by adding
> storage or compute hardware as traffic or tenants on the service grow. It
> is optimized for data streaming at high throughput and scale, and it does
> multi-tenancy extremely well. Part of that design is that there is no
> business logic in the data flow path. Since business logic lives outside
> the core data flow path in Pulsar, the core is optimized for data flow: do
> plain byte movement (no ser/de, no byte copy, no computations) and do it
> extremely well. Other systems, like Kafka and Kinesis, have taken the same
> approach: no server-side business logic.
>
> This particular PIP may or may not be expensive on the server. The next
> PIP could be, and once this concept is allowed, there is no rationale for
> refusing any other kind of business logic in the broker.
>
> Selective consumers are an anti-pattern for data flow systems. There are
> systems that support implementing business logic in the data flow path,
> and they don't scale. Take AMQ as an example: AMQ allows JMS/SQL-92
> expressions to be evaluated server side. Once the door to this
> anti-pattern is opened, there is no rhyme or reason to deny anything, up
> to and including full-blown SQL query evaluation in the dispatch path.
>
> So why not allow that? Why not allow full-blown expression evaluation in
> the data flow path?
>
> Unfortunately, there is no way to answer this without bringing up the
> conflict of interest between small users and large-scale users running
> multi-tenant hosted Pulsar at huge traffic volumes.
>
> For low-scale, single-tenant (or few-tenant) installations, efficiency of
> flow, latency, and throughput are not the driving concerns. In a small
> cluster, the cost and scale implications of executing server-side business
> logic are minimal in absolute terms.
>
> For large-scale users (like me) this is a no-go. There are many problems
> with it that make it very difficult to run a hosted platform with
> predictable SLAs once users can introduce business logic into the broker.
> These are on top of the performance and cost implications.
>
> First, broker throughput and performance become unpredictable. The
> current Pulsar load model (which the load manager uses for load balancing)
> becomes unusable. Worse, no pre-computed model can be used in the load
> manager at all. Since producers and consumers arbitrarily decide what the
> business logic is, and the computation can change based on the data, the
> model itself becomes dynamic, and the load manager has to rebuild it any
> time a user updates the business logic. That is a tall order, worth years
> of work to implement.
>
> Second, this introduces the noisy-neighbor problem. Two tenants will
> happily run on the same broker until one of them decides to change the
> logic on a subscription, and suddenly the quality of service for the other
> tenant degrades because the broker is impacted. The cluster operator now
> has to get involved out of the blue because one tenant made a change.
> Basically, any tenant can disrupt the system by triggering additional
> business logic in the server, or by producing specific data patterns that
> make the business logic expensive on the server.
>
> Third, this makes capacity provisioning impossible. Today Pulsar users can
> be provisioned on flow: bandwidth in/out and messages in/out. With
> server-side business logic, there is an unpredictable overhead that needs
> to be accounted for in the capacity calculation.
>
> We who run Pulsar as a hosted service do not want any of our tenants to
> introduce server-side logic into the service, because doing it well
> requires a load balancer that can continuously and dynamically adjust its
> load model and capacity model (perhaps based on ML over the traffic). The
> scope of building such a system would convert Pulsar from a data streaming
> project into a load balancer/resource manager project. The only viable
> alternative would be to give each tenant their own dedicated servers, at
> which point all claims to multi-tenancy in Pulsar should be dropped.
>
>
> So large multi-tenant clusters will have big problems if business logic
> is added to the broker.
>
> But this problem (Pulsar users attempting to add server-side logic to
> Pulsar) is not going to go away. There will always be yet another new user
> who asks to add 'one more simple implementation' of server-side business
> logic to the broker.
>
> My suggestion here is simple: make the dispatcher a configurable module.
> Let users who want server-side logic put their own computational logic in
> custom dispatchers and use them as they need. Allow users to implement
> custom dispatchers as loadable modules. Users can then implement whatever
> logic they need without depending on Pulsar, and the code and module will
> remain in user land rather than Pulsar land. No one will be required to
> contribute their dispatchers to Pulsar, but if specific dispatchers turn
> out to be widely useful, they can be contributed back to Pulsar (as with
> connectors).
>
> If this seems suspiciously similar to functions, then yes, it is.
> Functions were meant to fulfill this need without messing with the
> dispatcher. Functions were meant to run business logic outside the hosted
> service, so that the service itself is not impacted by arbitrary users
> injecting business logic into the platform.
>
> But if functions are not acceptable, and users still want to modify the
> dispatcher, what I am proposing is a way to let users do that without
> breaking the design goals of Pulsar. That avoids impacting the core data
> flow path for large-system, hosted-service, and multi-tenant use cases.
>
> So my vote is not to allow this (and any other server side logic
> implementations) into the base dispatcher, but permit these kinds of
> changes as configurable dispatchers. I hope I have explained the reasons
> for that vote clearly.
>
>
> Joe
>
>
> On Mon, Nov 16, 2020 at 10:03 AM Sijie Guo <guosi...@gmail.com> wrote:
>
> > Andre,
> >
> > I left comments on the pull request, but I will copy them here as
> > well.
> >
> > I have a couple of comments and one suggestion.
> >
> > 1. What are the performance and GC implications of this change? I think
> > most of the questions on this pull request are about the performance and
> > GC implications. It would be good to share your benchmarking/testing
> > methodology and the benchmark results with the community.
> >
> > 2. How are you going to handle topics with end-to-end encryption enabled?
> >
> > 3. How do you handle acknowledgment for the messages that have been
> > filtered out and never sent to the consumers? I don't see it discussed
> > in the PIP, especially how it relates to the different subscription
> > types.
> >
> > One suggestion: if this PIP is approved, my recommendation is to use the
> > NAR classloader to load the class. You can check how Pulsar uses the NAR
> > classloader for other interfaces.
> >
> > Thanks,
> > Sijie
> >
> > On Mon, Nov 16, 2020 at 2:53 AM Kramer, Andre <andre.kra...@softwareag.com>
> > wrote:
> >
> > > Sure, please feel free to copy the doc to wiki pages. It's mainly text,
> > > so it can be converted easily.
> > >
> > > Cheers,
> > > Andre
> > >
> > > -----Original Message-----
> > > From: Sijie Guo <guosi...@gmail.com>
> > > Sent: 13 November 2020 19:08
> > > To: Dev <dev@pulsar.apache.org>
> > > Subject: Re: Proposal for Consumer Filtering in Pulsar brokers
> > >
> > > Andre,
> > >
> > > Is it possible to put it in a Google Doc (or a similar collaboration
> > > tool) that allows other people to comment? It would also make it easier
> > > for the committers to copy the PIP to the Pulsar wiki pages.
> > >
> > > Thanks,
> > > Sijie
> > >
> > > On Fri, Nov 13, 2020 at 2:44 AM Kramer, Andre <andre.kra...@softwareag.com>
> > > wrote:
> > >
> > > > Hi Sijie,
> > > >
> > > > I have added a PIP-style document to the pull request:
> > > > https://github.com/andrekramer1/pulsar/blob/consumer-filter2-7-0/PIP-XX%20-%20Consumer-filtering.pdf
> > > > Hopefully that can be used to start the discussion.
> > > >
> > > > Regards,
> > > > Andre
> > > >
> > > > -----Original Message-----
> > > > From: Sijie Guo <guosi...@gmail.com>
> > > > Sent: 12 November 2020 18:32
> > > > To: Dev <dev@pulsar.apache.org>
> > > > Subject: Re: Proposal for Consumer Filtering in Pulsar brokers
> > > >
> > > > Hi Andre,
> > > >
> > > > I didn't see the attached write-up. Can you write a PIP for this
> > > > feature? Given that it is a big feature, it would be good to discuss it
> > > > through a PIP.
> > > >
> > > > - Sijie
> > > >
> > > > On Thu, Nov 12, 2020 at 6:17 AM Kramer, Andre <andre.kra...@softwareag.com>
> > > > wrote:
> > > >
> > > > > Hello everyone,
> > > > >
> > > > >
> > > > >
> > > > > We at Software AG have prototyped adding filtering on consumer
> > > > > subscriptions in the Pulsar broker and are submitting our changes
> > > > > for consideration under the Apache 2.0 license. Please see pull
> > > > > request Consumer Filtering #8544
> > > > > (https://github.com/apache/pulsar/pull/8544) and the attached
> > > > > write-up. Comments welcome!
> > > > >
> > > > >
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Andre
> > > > >
> > > > >
> > > > >
> > > > > andre.kra...@softwareag.com
> > > > > This communication contains information which is confidential and
> > > > > may also be privileged. It is for the exclusive use of the intended
> > > > > recipient(s). If you are not the intended recipient(s), please note
> > > > > that any distribution, copying, or use of this communication or the
> > > > > information in it, is strictly prohibited. If you have received this
> > > > > communication in error please notify us by e-mail and then delete
> > > > > the e-mail and any copies of it.
> > > > > Software AG (UK) Limited Registered in England & Wales 1310740 -
> > > > > http://www.softwareag.com/uk
> > > > >
> > > >
> > >
> >
>
