Re: [DISCUSS] Pluggable Pulsar Functions runtime to support new runtimes

Lari Hotari Wed, 21 Jun 2023 05:33:51 -0700

On 2023/06/21 07:21:31 Asaf Mesika wrote:
> Lari, would it be possible to explain in more detail the pain points
> you're describing?

Well, the point of the pluggable Function runtime types is to support other
technologies. Let's forget the reactive messaging solution for a moment.
By a pluggable solution, I mean one where you could add .nar files to some
directory and add support for new runtime types by implementing some plugin
specification. The current solution doesn't have this property.
A pluggable solution would make it easier to contribute new runtime types.
Say we wanted to add support for these technologies:
* functions written in Node.js / JavaScript
* functions using WebAssembly (WASM), for example implemented in Rust and
compiled to WASM.
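To make "some plugin specification" more concrete, here is a minimal sketch of what such an SPI could look like. Everything in it (`FunctionRuntimePlugin`, `RuntimeInstance`, `FunctionRuntimeRegistry` and the runtime type names) is hypothetical and does not exist in Pulsar today. The only real API it relies on is the JDK's `java.util.ServiceLoader`, assumed here as the discovery mechanism for plugin jars.

```java
import java.util.Map;
import java.util.Optional;
import java.util.ServiceLoader;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical SPI: a runtime plugin knows how to start functions of one runtime type.
interface FunctionRuntimePlugin {
    // Hypothetical runtime type identifier, e.g. "nodejs" or "wasm".
    String runtimeType();

    // Hypothetical hook: start one instance of a function from its package.
    RuntimeInstance start(String functionPackagePath);
}

// Hypothetical handle to a running function instance.
interface RuntimeInstance extends AutoCloseable {
    boolean isRunning();

    @Override
    void close();
}

// Hypothetical registry the worker would consult when a function declares its runtime type.
final class FunctionRuntimeRegistry {
    private final Map<String, FunctionRuntimePlugin> plugins = new ConcurrentHashMap<>();

    // Discover plugins through java.util.ServiceLoader (META-INF/services entries in plugin jars).
    void discover(ClassLoader classLoader) {
        for (FunctionRuntimePlugin plugin : ServiceLoader.load(FunctionRuntimePlugin.class, classLoader)) {
            register(plugin);
        }
    }

    // Register a plugin explicitly; two plugins may not claim the same runtime type.
    void register(FunctionRuntimePlugin plugin) {
        FunctionRuntimePlugin previous = plugins.putIfAbsent(plugin.runtimeType(), plugin);
        if (previous != null) {
            throw new IllegalStateException("Duplicate runtime type: " + plugin.runtimeType());
        }
    }

    Optional<FunctionRuntimePlugin> lookup(String runtimeType) {
        return Optional.ofNullable(plugins.get(runtimeType));
    }
}

class RuntimeRegistryDemo {
    public static void main(String[] args) {
        FunctionRuntimeRegistry registry = new FunctionRuntimeRegistry();
        registry.discover(Thread.currentThread().getContextClassLoader());
        System.out.println("wasm runtime available: " + registry.lookup("wasm").isPresent());
    }
}
```

In this sketch a WASM or Node.js runtime would ship as a jar containing its own `FunctionRuntimePlugin` implementation plus a `META-INF/services` entry. Isolating each plugin in its own class loader is deliberately left out.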

> You say processing messages individually is slow; hence, processing them in
> batches is better. I guess it's especially useful if you need to group a
> batch based on a key. What I don't understand is how the framework today
> limits you from using something like a reactive client which does the
> batching inside.

I didn't say anything about batches. It's about pipelining, which means having
multiple messages "in flight" at the same time. That is different from
batching. The best-known example of pipelining is HTTP pipelining [1].
Pulsar Functions already supports async functions, which are functions whose
method returns a CompletableFuture. To limit the number of messages "in
flight", the worker config includes a setting "maxPendingAsyncRequests" [2],
which defaults to 1000. It is odd that this setting is at the worker config
level and not at the function level.
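A rough illustration of pipelining versus processing one message at a time: the sketch below keeps up to a fixed number of async calls in flight and makes the caller wait once that limit is reached, which is roughly the role `maxPendingAsyncRequests` plays. `BoundedPipeline` is a hypothetical name and this is not how the Pulsar Functions instance implements the limit; it only uses `CompletableFuture`, `Semaphore` and executors from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Keeps at most maxPendingAsyncRequests async calls in flight (illustrative only).
final class BoundedPipeline<I, O> {
    private final Semaphore permits;
    private final Function<I, CompletableFuture<O>> asyncFunction;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger maxObservedInFlight = new AtomicInteger();

    BoundedPipeline(int maxPendingAsyncRequests, Function<I, CompletableFuture<O>> asyncFunction) {
        if (maxPendingAsyncRequests <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.permits = new Semaphore(maxPendingAsyncRequests);
        this.asyncFunction = asyncFunction;
    }

    // Starts processing without waiting for earlier messages to finish,
    // but blocks the caller while the in-flight limit is reached.
    CompletableFuture<O> submit(I input) {
        permits.acquireUninterruptibly();
        maxObservedInFlight.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
        CompletableFuture<O> pending;
        try {
            pending = asyncFunction.apply(input);
        } catch (RuntimeException e) {
            release();
            throw e;
        }
        // Free the slot when the async call completes, successfully or not.
        return pending.whenComplete((value, error) -> release());
    }

    int maxObservedInFlight() {
        return maxObservedInFlight.get();
    }

    private void release() {
        inFlight.decrementAndGet();
        permits.release();
    }
}

class BoundedPipelineDemo {
    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(8);
        BoundedPipeline<Integer, Integer> pipeline = new BoundedPipeline<>(4,
                x -> CompletableFuture.supplyAsync(() -> x * 2, executor));
        List<CompletableFuture<Integer>> results = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            results.add(pipeline.submit(i)); // up to 4 messages in flight at once
        }
        results.forEach(CompletableFuture::join);
        System.out.println("max in flight: " + pipeline.maxObservedInFlight());
        executor.shutdown();
    }
}
```

With a limit of 1 this degenerates into processing one message at a time. A per-function limit would simply mean one such bound per function instead of a single value taken from the worker config.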
Reactive Streams is not about batching either. One of its clear benefits over
plain async programming is that it has a well-defined way of handling
backpressure. For any high-scale system, handling backpressure (i.e., flow
control) is one of the core concerns.
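For comparison, a minimal sketch of Reactive Streams backpressure using the JDK's `java.util.concurrent.Flow` interfaces (plain JDK, not Reactive Pulsar). The subscriber only ever signals demand for a fixed-size chunk of items, so the publisher cannot push more than the consumer asked for. `ChunkedSubscriber` and the chunk size of 16 are illustrative assumptions.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

// Signals demand in fixed-size chunks; the publisher may never send more than requested.
final class ChunkedSubscriber<T> implements Flow.Subscriber<T> {
    private final int chunkSize;
    private final List<T> received = new CopyOnWriteArrayList<>();
    private final CountDownLatch done = new CountDownLatch(1);
    private Flow.Subscription subscription;
    private int remainingInChunk;

    ChunkedSubscriber(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
    }

    @Override
    public void onSubscribe(Flow.Subscription subscription) {
        this.subscription = subscription;
        remainingInChunk = chunkSize;
        subscription.request(chunkSize); // initial demand
    }

    @Override
    public void onNext(T item) {
        received.add(item); // real processing would happen here
        if (--remainingInChunk == 0) {
            remainingInChunk = chunkSize;
            subscription.request(chunkSize); // ask for more only after a chunk is done
        }
    }

    @Override
    public void onError(Throwable throwable) {
        done.countDown();
    }

    @Override
    public void onComplete() {
        done.countDown();
    }

    List<T> received() {
        return received;
    }

    boolean awaitCompletion(long seconds) {
        try {
            return done.await(seconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}

class BackpressureDemo {
    public static void main(String[] args) {
        ChunkedSubscriber<Integer> subscriber = new ChunkedSubscriber<>(16);
        try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(subscriber);
            for (int i = 0; i < 1_000; i++) {
                publisher.submit(i); // blocks while the subscriber's buffer is full
            }
        }
        System.out.println(subscriber.awaitCompletion(10) + " " + subscriber.received().size());
    }
}
```

In this JDK implementation, `SubmissionPublisher.submit` blocks while a subscriber's buffer is full, so the consumer's demand also paces the producer.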

In this case, if there were a pluggable Pulsar Functions runtime, it would be
possible to add a runtime type optimized for Reactive Pulsar. That could also
enable using Spring Pulsar in Reactive mode together with the rest of Reactive
Spring.

The current .nar plugin packaging is a mess. If you look at what goes inside a
.nar file, you'll find classes that shouldn't be there. Creating .nar plugins
is also very slow and inefficient. I can provide details if you are
interested.

With a pluggable Pulsar Functions runtime, it would also be possible to create
cleaner packaging for JVM functions. Packaging for ecosystems like Quarkus and
Spring Boot could be optimized for those ecosystems, instead of the other way
around, where Pulsar's outdated .nar packaging dictates the options.

In addition, Pulsar Functions is missing a piece in how functions are mapped
to instances. It isn't efficient to run each and every function as a separate
deployable entity, because the cost of each independent JVM is high. It would
be better to also have a model where a group of functions is provided by one
instance and always runs together. This option could bring down the cost and
also improve the developer experience. The framework shouldn't require the
developer to deploy each individual function in a separate .jar file that runs
in a separate JVM.
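A minimal sketch of the "group of functions in one instance" idea: several functions are registered together, share one JVM, and each message is dispatched to a function based on its input topic. `FunctionGroup` and the topic names are hypothetical; Pulsar function instances are not structured this way today.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical group of functions deployed and run together in one instance.
final class FunctionGroup {
    private final Map<String, Function<String, String>> functionsByInputTopic = new LinkedHashMap<>();

    // Bind a function to an input topic; each topic is handled by exactly one function.
    FunctionGroup add(String inputTopic, Function<String, String> function) {
        if (functionsByInputTopic.putIfAbsent(inputTopic, function) != null) {
            throw new IllegalArgumentException("Topic already bound: " + inputTopic);
        }
        return this;
    }

    // Route a message to the function bound to its topic; empty if none is bound.
    Optional<String> dispatch(String inputTopic, String payload) {
        Function<String, String> function = functionsByInputTopic.get(inputTopic);
        return function == null ? Optional.empty() : Optional.ofNullable(function.apply(payload));
    }

    int size() {
        return functionsByInputTopic.size();
    }
}

class FunctionGroupDemo {
    public static void main(String[] args) {
        FunctionGroup group = new FunctionGroup()
                .add("names", String::toUpperCase)
                .add("words", s -> String.valueOf(s.length()));
        System.out.println(group.dispatch("names", "pulsar").orElse("<no function>"));
    }
}
```

Deploying such a group as one unit would mean a single JVM for all of its functions instead of one JVM per function.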

So you asked if there is pain with Pulsar Functions. There definitely is.
Instead of causing more fragmentation in the ecosystem with multiple pluggable
infrastructure layers, we should make the core upstream offering better.

I'd also like to see a deployment option for Pulsar Functions where you could
choose not to deploy functions with pulsar-admin and instead package them in
an application that you deploy to Kubernetes with Helm or any other tool you
choose.
This could also be taken into account when designing the pluggable Pulsar
Functions runtime.

StreamNative's Function Mesh [3] takes a different approach to Pulsar Functions
lifecycle management. That might be a good fit in many cases.
However, we should have a way where Pulsar Functions could be deployed without
any central management solution, as ordinary applications.

Perhaps everyone is happy with Pulsar Functions as they are. If everyone is
already satisfied, things won't improve. Do we want to make Pulsar more popular
and easier for our users? Do we care about supporting Node.js / JavaScript /
TypeScript or newer languages like Rust? If we do, we had better start thinking
about adding that support. I would like to propose that we make adding new
runtime types easy by making it "pluggable". That could mean multiple things,
and that's why we are having this discussion. I hope others will also chime
in.

-Lari

[1] https://en.wikipedia.org/wiki/HTTP_pipelining
[2] https://github.com/apache/pulsar/blob/f7c0b3c49c9ad8c28d0b00aa30d727850eb8bc04/pulsar-functions/runtime/src/main/java/org/apache/pulsar/functions/worker/WorkerConfig.java#L725-L729
[3] https://github.com/streamnative/function-mesh
