To the best of my knowledge, there are currently two options for deploying Flink on Kubernetes: (1) active K8s integration with a separate JobManager per job, and (2) reactive container mode with automatic rescaling based on some metrics. Could you please give me some hints on the questions below?
A - Are both integrations already included in recent Flink releases? Is there any documentation on them?
B - In all cases, is it necessary to kill and restart the job? That is a concern for some critical use cases. Can a rolling upgrade be used to achieve zero downtime while rescaling/upgrading?
C - In such a rescaling mechanism, does Kubernetes/Flink identify which stream operator is the source of the load/utilization and rescale it individually, or is rescaling done at the granularity of the whole job?
D - For stateful operators/jobs, how are state repartitioning and assignment to new instances performed? Is this repartitioning/reassignment time-consuming, especially for large state?

Thank you.