Joe - Very comprehensive writeup!

> So my vote is not to allow this (and any other server side logic
> implementations) into the base dispatcher, but permit these kinds of
> changes as configurable dispatchers. I hope I have explained the reasons
> for that vote clearly.

+1. We can do it the same way we did for protocol handlers.

Thanks,
Sijie

On Mon, Nov 16, 2020 at 10:40 AM Joe F <j...@apache.org> wrote:

> We have had discussions in the community list on server side logic
> previously. I would like to keep the specific proposal in this PIP aside,
> and address what this PIP is implicitly changing in core Pulsar design. I
> want to have an explicit discussion on that topic: what is the path for
> server-side business logic in Pulsar?
>
> Pulsar has been designed to do a few things very well. It is designed to
> be run as a hosted service, meaning it can be scaled horizontally by adding
> storage or compute hardware, as traffic or tenants on the service grow. It
> is optimized for data streaming at throughput and scale, and does
> multi-tenancy extremely well. Part of that design is that there is no
> business logic in the data flow path. Since business logic lives
> outside of the core data flow path in Pulsar, the core is optimized for
> data flow. Do plain byte movement - no ser/de, no byte copy, no
> computations - and do it extremely well. Other systems, like Kafka and
> Kinesis, have taken the same approach: no to server side business logic.
>
> This particular PIP may be expensive on the server, or not. The next PIP
> could be, and there is no rationale to stop adding any kind of business
> logic into the broker, once this concept is allowed.
>
> Selective consumers are an anti-pattern for data flow systems. There are
> systems out there that support implementation of business logic in the data
> flow path, and they don't scale. Take the example of AMQ. AMQ allows
> JMS/SQL-92 expressions server side. Once the door to this anti-pattern is
> opened, there is no rhyme or reason to deny anything, up to and including a
> full-blown SQL query evaluation in the dispatch path.
>
> So why not allow that? Why not allow a full-blown expression evaluation in
> the data flow path?
>
> Unfortunately there is no way to answer this without bringing up the
> conflict of interest between small users and large scale users running
> multi-tenant hosted Pulsar at huge traffic volumes.
>
> For low scale, single (or few) tenant installations, efficiency of flow,
> latency and throughput are not the driving concern. In a small cluster,
> the cost and scale implications of executing server side business logic
> are minimal in absolute terms.
>
> For large scale users (like me) this is a no go. There are many problems
> with this that make it very difficult to run a hosted platform with
> predictable SLAs once users can introduce business logic into the broker.
> These are on top of the performance and cost implications.
>
> First, broker throughput and performance become unpredictable. The
> current Pulsar load model (and it is used in the load manager for load
> balancing) becomes unusable. Not only that, there will be no pre-computed
> model that can be used in the load manager. Since the producer and
> consumer randomly decide what the business logic is, and the computation
> can change based on the data, the model itself becomes dynamic and the
> load manager has to rebuild the model any time a user updates the business
> logic. That is a tall order, worth years of work to implement.
>
> Second, this introduces the noisy neighbor issue. Two tenants will happily
> run on the same broker, till one of them decides to change the logic on the
> subscription, and suddenly the quality for the other tenant is degraded
> because the broker is impacted. The system operator of the cluster now has
> to get involved out of the blue, because one tenant made a change.
> Basically any tenant can disrupt the system by triggering additional
> business logic in the server, or by specific data patterns that can make
> the business logic expensive on the server.
>
> Third, this makes provisioning capacity impossible.
> Today Pulsar users can
> be provisioned on flow - bw in/out, msgs in/out. With server side business
> logic, there is some random overhead that needs to be accounted for in the
> capacity calculation.
>
> We, who run Pulsar as a hosted service, do not want any of our tenants to
> introduce server side logic into the service, because doing it well
> requires a load balancer that can continuously and dynamically adjust its
> load model and capacity model (based on ML on the traffic maybe). The
> scope of building such a system will convert Pulsar from a data streaming
> project to a load balancer/resource manager project. The only viable
> solution will be to give each tenant their own dedicated servers - at which
> point all claims to multi-tenancy in Pulsar should be dropped.
>
> So large multi-tenant clusters will have big problems with the addition of
> business logic into the broker.
>
> But this problem - Pulsar users attempting to add server side logic into
> Pulsar - is not going to go away. There will always be yet another new user
> who will ask for adding 'one more simple implementation' of server side
> business logic into the broker.
>
> My suggestion here is simple. Make the dispatcher a configurable module.
> Let users who want to do server side logic configure their own
> computational logic in custom dispatchers and use it to their needs.
> Allow users to implement custom dispatchers as a loadable module. Users
> can then implement whatever logic they need to, without depending on
> Pulsar, and the code and module will remain in user-land rather than Pulsar
> land. No one will be required to contribute their dispatchers to Pulsar,
> but if there are specific dispatchers which can have widespread use, they
> can contribute them back into Pulsar (like connectors).
>
> If this seems suspiciously similar to functions, then yes, it is. Functions
> were meant to fulfill this need, but without messing with the dispatcher.
> Functions were meant to do business logic outside the hosted service, so
> that the service itself is not impacted by random users injecting business
> logic into the platform.
>
> But if functions are not acceptable, and users still want to mess with the
> dispatcher, what I am proposing is a way to let users do that without
> breaking the design goals of Pulsar. That will avoid impacting the core
> data flow path for large system / hosted service / multi-tenant use cases.
>
> So my vote is not to allow this (and any other server side logic
> implementations) into the base dispatcher, but permit these kinds of
> changes as configurable dispatchers. I hope I have explained the reasons
> for that vote clearly.
>
> Joe
>
> On Mon, Nov 16, 2020 at 10:03 AM Sijie Guo <guosi...@gmail.com> wrote:
>
> > Andre,
> >
> > I left a comment on the pull request. But I will just copy it here as
> > well.
> >
> > I have a couple of comments and one suggestion.
> >
> > 1. What is the performance & GC implication of this change? I think most
> > of the questions on this pull request are about the performance & GC
> > implication. It would be good to show your benchmarking/testing
> > methodology and the benchmark results to the community.
> >
> > 2. How are you going to handle topics with end-to-end encryption enabled?
> >
> > 3. How do you handle acknowledgment for the messages that have been
> > filtered out and never sent to the consumers? I don't see it discussed
> > in the PIP. Especially, how is it related to different subscription
> > types?
> >
> > One suggestion - If this PIP is approved, my recommendation is to use the
> > NAR classloader to load the class. You can check how Pulsar uses the NAR
> > classloader for other interfaces.
> >
> > Thanks,
> > Sijie
> >
> > On Mon, Nov 16, 2020 at 2:53 AM Kramer, Andre
> > <andre.kra...@softwareag.com> wrote:
> >
> > > Sure, please feel free to copy the doc to wiki pages.
> > > It's mainly text so can be converted easily.
> > >
> > > Cheers,
> > > Andre
> > >
> > > -----Original Message-----
> > > From: Sijie Guo <guosi...@gmail.com>
> > > Sent: 13 November 2020 19:08
> > > To: Dev <dev@pulsar.apache.org>
> > > Subject: Re: Proposal for Consumer Filtering in Pulsar brokers
> > >
> > > Andre,
> > >
> > > Is it possible to put it in a Google Doc (or similar collaboration tool)
> > > that allows other people to make comments? Also, it would be easier for
> > > the committers to copy the PIP to Pulsar wiki pages.
> > >
> > > Thanks,
> > > Sijie
> > >
> > > On Fri, Nov 13, 2020 at 2:44 AM Kramer, Andre
> > > <andre.kra...@softwareag.com> wrote:
> > >
> > > > Hi Sijie,
> > > >
> > > > I had added a PIP style document to the pull request:
> > > > https://github.com/andrekramer1/pulsar/blob/consumer-filter2-7-0/PIP-XX%20-%20Consumer-filtering.pdf
> > > > Hopefully that could be used to start the discussion?
> > > >
> > > > Regards,
> > > > Andre
> > > >
> > > > -----Original Message-----
> > > > From: Sijie Guo <guosi...@gmail.com>
> > > > Sent: 12 November 2020 18:32
> > > > To: Dev <dev@pulsar.apache.org>
> > > > Subject: Re: Proposal for Consumer Filtering in Pulsar brokers
> > > >
> > > > Hi Andre,
> > > >
> > > > I didn't see the attached writeup. Can you write a PIP for this
> > > > feature? Given it is a big feature, it would be good to discuss it
> > > > through a PIP.
> > > >
> > > > - Sijie
> > > >
> > > > On Thu, Nov 12, 2020 at 6:17 AM Kramer, Andre
> > > > <andre.kra...@softwareag.com> wrote:
> > > >
> > > > > Hello everyone,
> > > > >
> > > > > We at Software AG have prototyped adding filtering on Consumer
> > > > > subscriptions in the Pulsar broker and are submitting our changes
> > > > > for consideration under Apache 2.0 license.
> > > > > Please see pull request
> > > > > [Consumer Filtering #8544
> > > > > https://github.com/apache/pulsar/pull/8544]
> > > > > and attached write up. Comments welcome!
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Andre
> > > > >
> > > > > andre.kra...@softwareag.com
> > > > > This communication contains information which is confidential and
> > > > > may also be privileged. It is for the exclusive use of the intended
> > > > > recipient(s). If you are not the intended recipient(s), please note
> > > > > that any distribution, copying, or use of this communication or the
> > > > > information in it, is strictly prohibited. If you have received this
> > > > > communication in error please notify us by e-mail and then delete
> > > > > the e-mail and any copies of it.
> > > > > Software AG (UK) Limited Registered in England & Wales 1310740 -
> > > > > http://www.softwareag.com/uk
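
Joe's suggestion in the thread above (keep the base dispatcher free of business logic and let users plug in their own dispatcher as a configurable, loadable module) can be sketched roughly as follows. This is only an illustration of the idea, not Pulsar code: Pulsar's broker is written in Java, Python is used here purely for brevity, and every name in the sketch (`Dispatcher`, `BaseDispatcher`, `PrefixFilterDispatcher`, `register`, `load_dispatcher`, the `dispatcher` config key) is hypothetical. A simple in-process registry stands in for whatever plugin loading mechanism would actually be used.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Type


class Dispatcher(ABC):
    """Hypothetical contract: decide which entries reach a consumer."""

    @abstractmethod
    def dispatch(self, entries: Iterable[bytes]) -> List[bytes]:
        """Return the entries to deliver, in their original order."""


# Name -> dispatcher class. Stands in for a real plugin loader (assumption).
_REGISTRY: Dict[str, Type[Dispatcher]] = {}


def register(name: str):
    """Class decorator that makes a dispatcher selectable by name."""
    def wrap(cls: Type[Dispatcher]) -> Type[Dispatcher]:
        _REGISTRY[name] = cls
        return cls
    return wrap


@register("base")
class BaseDispatcher(Dispatcher):
    """Default: forward every entry untouched, without inspecting payloads."""

    def dispatch(self, entries: Iterable[bytes]) -> List[bytes]:
        return list(entries)


@register("prefix-filter")
class PrefixFilterDispatcher(Dispatcher):
    """Example user-land dispatcher: server-side filtering on a byte prefix."""

    PREFIX = b"keep:"

    def dispatch(self, entries: Iterable[bytes]) -> List[bytes]:
        return [e for e in entries if e.startswith(self.PREFIX)]


def load_dispatcher(config: Dict[str, str]) -> Dispatcher:
    """Pick the dispatcher named in the config, falling back to the base one."""
    name = config.get("dispatcher", "base")
    if name not in _REGISTRY:
        raise ValueError(f"unknown dispatcher: {name!r}")
    return _REGISTRY[name]()
```

With an empty configuration, `load_dispatcher({})` returns the pass-through dispatcher, so operators who never opt in keep the plain byte-movement path. Filtering only runs when a deployment explicitly sets `{"dispatcher": "prefix-filter"}`, which is the separation between the hosted-service core and user-land logic that the thread argues for.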