Hello! I'm trying to understand the internal mechanism that Flink Stateful Functions (StateFun) uses to dispatch functions across a Flink cluster. In particular, I'm looking for a good example that demonstrates StateFun's "Logical Co-location, Physical Separation" property (as described in [1]).
My understanding from the docs is that there are three modes for deploying StateFun functions to a Flink cluster, ranging from remote functions (least state locality) to embedded functions (most efficient). In the remote-function scenario, a function is deployed with its state and messages managed separately from the Flink processes, so the function should be executable in any Flink process, as if it were a stateless application.

I have tried a couple of the StateFun examples, but judging by the allocation results, the subtasks of the job seem to be bound statically to individual task slots in the Flink cluster (I'm assuming that examples such as the DataStream one use embedded functions instead?).

I also came across a tutorial [2] demonstrating the usage of remote functions. Its README states [3] that "Since the functions are stateless, and running in their own container, we can redeploy and rescale it independently of the rest of the infrastructure." This seems to indicate that the user scales the functions manually, and that they can occupy arbitrary resources (e.g., task slots) in the Flink cluster on demand. However, I wasn't sure how to explicitly specify the amount of parallelism for each function dynamically.

Is there a good example that visualizes StateFun's "physical separation" behavior by forcing the same function to be invoked on different task slots / machines (either on demand or automatically)? Any help will be appreciated! Thanks!

Le

[1] https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.2/concepts/distributed_architecture.html#remote-functions
[2] https://github.com/ververica/flink-statefun-workshop
[3] https://github.com/ververica/flink-statefun-workshop#restart-the-functions
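For reference, here is a minimal sketch of how I currently understand a remote function being bound via `module.yaml`, based on my reading of the 2.2 docs — the function type, state name, and endpoint URL are hypothetical placeholders, not from any particular example:

```yaml
version: "2.0"

module:
  meta:
    type: remote
  spec:
    functions:
      - function:
          meta:
            # HTTP-based remote function, invoked over the StateFun protocol
            kind: http
            # hypothetical namespace/name for illustration
            type: example/greeter
          spec:
            # the function runs in its own container/service,
            # separate from the Flink TaskManagers
            endpoint: http://greeter-service:8000/statefun
            states:
              - invoke_count
            maxNumBatchRequests: 500
```

My (possibly wrong) reading of this is that only the function *service* behind the endpoint is rescaled independently (e.g., by adding container replicas), while the Flink-side dispatchers stay pinned to their task slots — which is exactly the part I'd like to see visualized.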