We have discussed server-side logic on the community list before. I would like to set aside the specific proposal in this PIP and address what it implicitly changes in core Pulsar design. I want to have an explicit discussion on that topic: what is the path for server-side business logic in Pulsar?
Pulsar has been designed to do a few things very well. It is designed to run as a hosted service, meaning it can be scaled horizontally by adding storage or compute hardware as traffic or tenants on the service grow. It is optimized for data streaming at throughput and scale, and does multi-tenancy extremely well. Part of that design is that there is no business logic in the data flow path. Because business logic lives outside the core data flow path, the core is optimized for data flow: do plain byte movement (no ser/de, no byte copy, no computations) and do it extremely well. Other systems, like Kafka and Kinesis, have taken the same approach and said no to server-side business logic.
This particular PIP may or may not be expensive on the server. The next PIP could be, and once this concept is allowed there is no rationale left for refusing any other kind of business logic in the broker. Selective consumers are an anti-pattern for data flow systems. There are systems out there that support business logic in the data flow path, and they don't scale. Take AMQ as an example: it allows JMS/SQL-92 expressions on the server side. Once the door to this anti-pattern is opened, there is no rhyme or reason to deny anything, up to and including full-blown SQL query evaluation in the dispatch path.
So why not allow that? Why not allow full-blown expression evaluation in the data flow path? Unfortunately there is no way to answer this without bringing up the conflict of interest between small users and large-scale users running multi-tenant hosted Pulsar at huge traffic volumes. For low-scale installations with a single tenant (or a few), efficiency of flow, latency and throughput are not the driving concerns. In a small cluster, the cost and scale implications of executing server-side business logic are minimal in absolute terms. For large-scale users (like me) this is a no-go.
Once users can introduce business logic into the broker, there are many problems that make it very difficult to run a hosted platform with predictable SLAs. These come on top of the performance and cost implications.
First, broker throughput and performance become unpredictable. The current Pulsar load model, which the load manager uses for load balancing, becomes unusable. Not only that, there will be no pre-computed model that the load manager can use. Since producers and consumers decide arbitrarily what the business logic is, and the computation can change based on the data, the model itself becomes dynamic and the load manager has to rebuild it any time a user updates the business logic. That is a tall order, worth years of work to implement.
Second, this introduces the noisy neighbor issue. Two tenants will happily run on the same broker until one of them decides to change the logic on a subscription, and suddenly the quality for the other tenant is degraded because the broker is impacted. The system operator of the cluster now has to get involved out of the blue because one tenant made a change. Basically, any tenant can disrupt the system by triggering additional business logic in the server, or through specific data patterns that make the business logic expensive on the server.
Third, this makes provisioning capacity impossible. Today Pulsar users can be provisioned on flow: bandwidth in/out and messages in/out. With server-side business logic, there is some random overhead that needs to be accounted for in the capacity calculation.
We, who run Pulsar as a hosted service, do not want any of our tenants to introduce server-side logic into the service, because doing it well requires a load balancer that can continuously and dynamically adjust its load model and capacity model (maybe based on ML on the traffic). Building such a system would turn Pulsar from a data streaming project into a load balancer/resource manager project.
The only viable solution would be to give each tenant their own dedicated servers, at which point all claims to multi-tenancy in Pulsar should be dropped. So large multi-tenant clusters will have big problems with the addition of business logic into the broker.
But this problem, Pulsar users attempting to add server-side logic into Pulsar, is not going to go away. There will always be yet another new user who asks to add 'one more simple implementation' of server-side business logic to the broker.
My suggestion here is simple. Make the dispatcher a configurable module. Let users who want server-side logic implement their own computational logic in custom dispatchers, packaged as loadable modules, and use them to their needs. Users can then implement whatever logic they need without depending on Pulsar, and the code and module will remain in user-land rather than Pulsar-land. No one will be required to contribute their dispatchers to Pulsar, but if specific dispatchers turn out to have widespread use, they can be contributed back into Pulsar (like connectors).
If this seems suspiciously similar to functions, then yes, it is. Functions were meant to fulfill this need without messing with the dispatcher. They were meant to run business logic outside the hosted service, so that the service itself is not impacted by random users injecting business logic into the platform. But if functions are not acceptable and users still want to mess with the dispatcher, what I am proposing is a way to let them do that without breaking the design goals of Pulsar. That will avoid impacting the core data flow path for large-system, hosted-service and multi-tenant use cases.
So my vote is not to allow this (or any other server-side logic implementation) into the base dispatcher, but to permit these kinds of changes as configurable dispatchers. I hope I have explained the reasons for that vote clearly.
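To make the configurable-dispatcher suggestion above a little more concrete, here is a minimal sketch in Java of how a loadable dispatcher could be wired in. Everything in it is hypothetical: `PluggableDispatcher`, `PassThroughDispatcher`, `PrefixFilterDispatcher`, `DispatcherLoader` and the `prefix` subscription property are illustrative names, not existing Pulsar interfaces or settings, and plain reflection stands in for whatever class loading mechanism (for example the NAR classloader suggested in the quoted review) a real implementation would use.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical plug-in contract; NOT an existing Pulsar interface. */
interface PluggableDispatcher {
    /** Returns the entries that should be delivered for one subscription. */
    List<byte[]> selectForDelivery(List<byte[]> entries, Map<String, String> subscriptionProps);
}

/** Default behaviour: plain byte movement, payloads are never inspected. */
final class PassThroughDispatcher implements PluggableDispatcher {
    @Override
    public List<byte[]> selectForDelivery(List<byte[]> entries, Map<String, String> props) {
        return entries;
    }
}

/** User-land example: keeps only entries whose payload starts with a configured prefix. */
final class PrefixFilterDispatcher implements PluggableDispatcher {
    @Override
    public List<byte[]> selectForDelivery(List<byte[]> entries, Map<String, String> props) {
        // "prefix" is an assumed property name, used only for this illustration.
        byte[] prefix = props.getOrDefault("prefix", "").getBytes(StandardCharsets.UTF_8);
        List<byte[]> selected = new ArrayList<>();
        for (byte[] entry : entries) {
            if (startsWith(entry, prefix)) {
                selected.add(entry);
            }
        }
        return selected;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}

/** Instantiates the dispatcher class named in (hypothetical) broker configuration. */
final class DispatcherLoader {
    static PluggableDispatcher load(String className) throws ReflectiveOperationException {
        Class<?> cls = Class.forName(className);
        // asSubclass rejects classes that do not implement the plug-in contract.
        return cls.asSubclass(PluggableDispatcher.class).getDeclaredConstructor().newInstance();
    }
}

class DispatcherSketch {
    public static void main(String[] args) throws Exception {
        List<byte[]> entries = List.of(
                "a:1".getBytes(StandardCharsets.UTF_8),
                "b:2".getBytes(StandardCharsets.UTF_8),
                "a:3".getBytes(StandardCharsets.UTF_8));

        // In a real deployment the class name would come from configuration.
        PluggableDispatcher plain = DispatcherLoader.load(PassThroughDispatcher.class.getName());
        PluggableDispatcher custom = DispatcherLoader.load(PrefixFilterDispatcher.class.getName());

        System.out.println("pass-through delivered "
                + plain.selectForDelivery(entries, Map.of()).size() + " entries");
        System.out.println("prefix filter delivered "
                + custom.selectForDelivery(entries, Map.of("prefix", "a:")).size() + " entries");
    }
}
```

In this sketch the default dispatcher stays a pure pass-through, so the cost of any filtering is confined to deployments that explicitly configure a custom class.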
Joe
On Mon, Nov 16, 2020 at 10:03 AM Sijie Guo <guosi...@gmail.com> wrote:
> Andre,
>
> I left a comment on the pull request. But I will just copy them here as
> well.
>
> I have a couple of comments and one suggestion.
>
> 1. What is the performance & GC implication with this change? I think most
> of the questions on this pull request is about the performance & GC
> implication. It would be good to show your benchmarking/testing methodology
> and the benchmark results to the community.
>
> 2. How are you going to handle topics with end-to-end encryption enabled?
>
> 3. How do you handle acknowledgment for the messages that have been
> filtered out and never sent to the consumers? I don't see it is discussed
> in the PIP. Especially, how is it related to different subscription types?
>
> One suggestion - If this PIP is approved, my recommendation is to use the
> NAR classloader to load the class. You can check how Pulsar uses NAR
> classloader for other interfaces.
>
> Thanks,
> Sijie
>
> On Mon, Nov 16, 2020 at 2:53 AM Kramer, Andre <andre.kra...@softwareag.com>
> wrote:
>
> > Sure, please feel free to copy the doc to wiki pages. It's mainly text so
> > can be converted easily.
> >
> > Cheers,
> > Andre
> >
> > -----Original Message-----
> > From: Sijie Guo <guosi...@gmail.com>
> > Sent: 13 November 2020 19:08
> > To: Dev <dev@pulsar.apache.org>
> > Subject: Re: Proposal for Consumer Filtering in Pulsar brokers
> >
> > Andre,
> >
> > Is it possible to put it in a Google Doc (or similar collaboration tool)
> > that allows other people to make comments? Also, it would be easier for the
> > committers to copy the PIP to Pulsar wiki pages.
> >
> > Thanks,
> > Sijie
> >
> > On Fri, Nov 13, 2020 at 2:44 AM Kramer, Andre <andre.kra...@softwareag.com>
> > wrote:
> >
> > > Hi Sijie,
> > >
> > > I had added a PIP style document to the pull request:
> > > https://github.com/andrekramer1/pulsar/blob/consumer-filter2-7-0/PIP-XX%20-%20Consumer-filtering.pdf
> > > Hopefully that could be used to start the discussion?
> > >
> > > Regards,
> > > Andre
> > >
> > > -----Original Message-----
> > > From: Sijie Guo <guosi...@gmail.com>
> > > Sent: 12 November 2020 18:32
> > > To: Dev <dev@pulsar.apache.org>
> > > Subject: Re: Proposal for Consumer Filtering in Pulsar brokers
> > >
> > > Hi Andre,
> > >
> > > I didn't see the attached writeup. Can you write a PIP for this feature?
> > > Given it is a big feature, it would be good to discuss it through a PIP.
> > >
> > > - Sijie
> > >
> > > On Thu, Nov 12, 2020 at 6:17 AM Kramer, Andre <andre.kra...@softwareag.com>
> > > wrote:
> > >
> > > > Hello everyone,
> > > >
> > > > We at Software AG have prototyped adding filtering on Consumer
> > > > subscriptions in the Pulsar broker and are submitting our changes
> > > > for consideration under Apache 2.0 license. Please see pull request
> > > > [Consumer Filtering #8544 https://github.com/apache/pulsar/pull/8544]
> > > > and attached write up. Comments welcome!
> > > >
> > > > Thanks,
> > > >
> > > > Andre
> > > >
> > > > andre.kra...@softwareag.com
> > > > This communication contains information which is confidential and
> > > > may also be privileged. It is for the exclusive use of the intended
> > > > recipient(s). If you are not the intended recipient(s), please note
> > > > that any distribution, copying, or use of this communication or the
> > > > information in it, is strictly prohibited.
> > > > If you have received this communication in error please notify us
> > > > by e-mail and then delete the e-mail and any copies of it.
> > > > Software AG (UK) Limited Registered in England & Wales 1310740 -
> > > > http://www.softwareag.com/uk