On 2023/06/21 07:21:31 Asaf Mesika wrote:
> Lari, would it be possible to explain in more detail the paint points
> you're describing?
 
Well the point of the pluggable Function runtime types is to support other 
technologies. Let's forget the reactive messaging solution for a moment.
With a pluggable solution, I mean having a solution in place where you could 
possibly add .nar files to some directory and add support for new runtime types 
by implementing some plugin specification. The current solution doesn't contain 
this property.
A pluggable solution would make it easier for contributing new runtime types.
Let's say if we would want to add support for these technologies:
* functions written in Node.js / JavaScript
* functions using WebAssembly (WASM), for example implemented in Rust that also 
compiles to WASM.

> You say processing messages individually is slow; hence, processing them in
> batches is better. I guess it's especially useful if you need to group a
> batch based on a key. What I don't understand is how the framework today
> limits you from using something like a reactive client which does the
> batching inside.

I didn't say anything about batches. It's about pipelining. That means that you 
have multiple messages "in flight". That is different than batching. The most 
well known example of pipelining is HTTP pipelining [1].
Pulsar Functions already supports async functions which are functions that have 
a method that returns a CompetableFuture type. To limit the amount of messages 
"in flight", the worker config includes a setting "maxPendingAsyncRequests" [2] 
which defaults to 1000. It is odd that the setting is at worker config level 
and not at the function level.
Reactive Streams is not about batching. One of the clear benefits over plain 
async programming is that there's a well defined way for handling backpressure. 
For any high scale system handling backpressure (== flow control) is one of the 
core concerns.

In this case, if there was a pluggable Pulsar Functions runtime, it would be 
possible to add a runtime type optimized for Reactive Pulsar. That could also 
enable using Spring Pulsar in Reactive mode with the rest of Reactive Spring. 

The current .nar plugin packaging is a mess. If you take a look of what goes 
inside a .nar file, it is a mess. There are classes that shouldn't be there. 
The .nar plugin creation is a very slow and inefficient. I can provide details 
if you are interested to know. 

With pluggable Pulsar Functions runtime, it would also be possible to create a 
cleaner packaging for JVM functions. Packaging for different ecosystems like 
Quarkus and Spring Boot could be optimized for those ecosystems and not the 
other way around where Pulsar's outdated .nar packaging is dictating the 
options.

In addition, the Pulsar Functions have a missing piece in how functions are 
mapped to instances. It's not very efficient to even run each and every 
function as a separate deployable entity. The cost of each independent JVMs is 
high. It would be also better to have a model where where could be a group of 
functions that are provided by one instance and always run together. Having 
this option could bring down the cost and also improve the developer 
experience. The framework shouldn't require the developer that each individual 
function is deployed in a separate .jar file which gets run in a separate JVM. 

So you asked if there is pain with Pulsar Functions. There definitely is. 
Instead of causing more fragmentation in the ecosystem with multiple pluggable 
infrastructure layers, we should make the core upstream offering better. 

I'd also like to see a deployment option for Pulsar Functions where you could 
choose to not deploy Pulsar Functions with pulsar-admin and instead package the 
functions in an application that you deploy in Kubernetes with helm or whatever 
way you choose to do that.
This could also be taken into account when designing the pluggable Pulsar 
Functions runtime. 

StreamNative's Function Mesh [3] takes a different approach to Pulsar Function 
life cycle management. That might be a good fit in many cases. 
However, we should have a way where Pulsar Functions could be deployed without 
any central management solution, as ordinary applications. 

Perhaps everyone is happy with the current way Pulsar Functions are. If 
everyone is already satisfied, things won't improve. Do we want to make Pulsar 
more popular and easier for our users? Do we care about supporting node.js / 
Javascript / Typescript or new languages like Rust? If we do, we better start 
thinking of adding that support. I would like to propose that we make adding 
new runtime types easy by making it "pluggable". That could mean multiple 
things and that's why we are having this discussion. I hope others could also 
chime in.

-Lari

[1] https://en.wikipedia.org/wiki/HTTP_pipelining
[2] 
https://github.com/apache/pulsar/blob/f7c0b3c49c9ad8c28d0b00aa30d727850eb8bc04/pulsar-functions/runtime/src/main/java/org/apache/pulsar/functions/worker/WorkerConfig.java#L725-L729
[3] https://github.com/streamnative/function-mesh

Reply via email to