Hello!

I'm trying to understand the internal mechanism Flink StateFun uses to
dispatch functions to a Flink cluster. In particular, I was trying to find
a good example demonstrating StateFun's "Logical Co-location, Physical
Separation" property (as described in [1]).

My understanding based on the docs is that there are three modes for
deploying functions to a Flink cluster, ranging from remote functions to
embedded functions (with increasing data locality and efficiency, and
decreasing deployment flexibility). So in the remote-function scenario,
functions are deployed separately from the Flink processes, with their
state and messaging still managed by the StateFun runtime, and an
invocation should be executable in any Flink process as if the function
were a stateless application. I have tried a couple of examples from the
statefun repository, but judging by the allocation results, each subtask
of the job seems to be statically bound to a task slot in the Flink
cluster (I'm assuming examples such as the DataStream one use embedded
functions instead?).
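
For reference, my mental model of the remote setup is a binding like the
module.yaml sketch below (StateFun 2.2 remote-module syntax, if I read the
docs correctly; the function type, endpoint, and state name are just
placeholders I made up), where the function runs behind an HTTP endpoint
outside the Flink cluster and the Flink job only keeps the state and
routes messages:

    version: "2.0"

    module:
      meta:
        type: remote
      spec:
        functions:
          - function:
              meta:
                # kind "http" marks this as a remote function reached over HTTP
                kind: http
                # hypothetical function type: <namespace>/<name>
                type: example/greeter
              spec:
                # the function process is physically separate; only this
                # endpoint ties it back to the Flink cluster
                endpoint: http://functions:8000/statefun
                states:
                  - seen_count
                maxNumBatchRequests: 500

If my understanding is right, nothing in such a spec pins the function to
a particular task slot, so the dispatching operator inside Flink should be
free to call the endpoint from any parallel subtask.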

I also came across a tutorial [2] demonstrating the use of remote
functions. Its README states [3] that "Since the functions are stateless,
and running in their own container, we can redeploy and rescale it
independently of the rest of the infrastructure." This seems to indicate
that scaling is performed manually by the user, and that the functions
could occupy arbitrary resources (e.g., task slots) in the Flink cluster
on demand. However, I wasn't sure how to explicitly specify the
parallelism of each function dynamically.
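
For example, I believe I can rescale the function containers themselves
without touching Flink at all, with something like the following (assuming
a docker-compose setup with a service named "functions", which I think is
roughly what the workshop uses):

    # add more stateless function replicas behind the same endpoint
    docker-compose up -d --scale functions=3

But as far as I can tell this only changes the number of replicas behind
the HTTP endpoint, not the parallelism of the StateFun operators inside
the Flink job, which seems to come from the usual Flink parallelism
settings.
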
Is there a good example that visualizes StateFun's "physical separation"
behavior by forcing the same function to be invoked on different task
slots / machines (either on demand or automatically)?

Any help would be appreciated!

Thanks!

Le


[1] https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.2/concepts/distributed_architecture.html#remote-functions
[2] https://github.com/ververica/flink-statefun-workshop
[3] https://github.com/ververica/flink-statefun-workshop#restart-the-functions
