Thanks for the great feedback.

> Just as Xintong said, fine grained resource management has not been
> introduced to flink. And i think it is the elegant solution for
> your scenario. Task managers with different resource specification will be
> allocated and started by Yarn/k8s resource manager according to your
> operator resource request. So your resource-intensive tasks will be
> deployed to the task manager with more resources.

Is this feature theoretical or is it on the Blink branch already? Is there
a Jira ticket for this?

> If integrated with k8s natively, Flink's resource manager will call
> kubernetes API to allocate/release new pods and adjust the resource usage
> on demand, and then support scale-up and scale down.
>
> BTW, FLINK-9953 does not contains the ability of auto scale. It just
> allocates the resource according to the flink job and will not
> automatically scale up/down based on the back logs or metrics. The auto
> scale of flink cluster is a more general topic and should be
> designed independently without resource management system(Yarn/k8s/mesos).

Ok, so if I'm understanding you correctly, there are two opportunities to
adjust/scale resources:

1) at the time that the job is assigned to a task manager (currently
   supported by Yarn/Mesos; the k8s case is covered by FLINK-9953).
2) after a job has already been running on a task manager (not yet
   supported by any resource manager due to lack of support within Flink
   in general).

And IIUC "fine grained resource management" is an improvement on top of
both of these, which allows Flink to use heterogeneous task manager
resources, each customized based on the specifications of the operator.

I'm still unclear what is meant by "It just allocates the resource
according to the flink job". What information does the job provide that is
currently taken into consideration when allocating resources? IIUC, the
answer is in terms of the *number* of slots, since each slot will have the
same resource profile.

I feel like the docs do not cover the Resource Manager story very well.
It's not mentioned at all in the page on Distributed Runtime Environment
[1] or Jobs and Scheduling [2].

thanks,
chad

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.8/concepts/runtime.html#task-slots-and-resources
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.8/internals/job_scheduling.html