Hi all, I want to run a continuous stream processing job using Beam on a Flink runner within Kubernetes. I've been following this tutorial (https://python.plainenglish.io/apache-beam-flink-cluster-kubernetes-python-a1965f37b7cb), but I'm not sure what the author means by the "flink master container". I don't understand how I'm supposed to submit my Python code to the cluster when that code is packaged inside a container image itself.
The Kubernetes Flink cluster architecture looks like this:

* a single JobManager, which exposes the Flink web UI via a Service and Ingress
* multiple Task Managers, each running 2 containers:
  * a Flink task manager
  * a Beam worker pool, which exposes port 50000

The Python code in the example tutorial has Beam configuration which looks like this:

    options = PipelineOptions([
        "--runner=FlinkRunner",
        "--flink_version=1.10",
        "--flink_master=localhost:8081",
        "--environment_type=EXTERNAL",
        "--environment_config=localhost:50000"
    ])

It's clear that when you run this locally as per the tutorial, it talks to the Beam worker pool to launch the application. However, if I have a Docker image containing my application code and I want to start that application within Kubernetes, where do I deploy this image in my Kubernetes cluster? Is it as a container within each Task Manager pod (and therefore using localhost:50000 to communicate with the Beam worker pool)? Or do I create a single pod containing my application code and point that pod at port 50000 of my Task Managers - and if so, is the fact that I have multiple Task Managers a problem?

Any pointers to documentation or examples would be really helpful.

Many thanks,
John
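P.S. In case it helps to see the configuration in context, here is a minimal, self-contained sketch of the kind of pipeline script I'm trying to submit. The pipeline options are the ones from the tutorial above; the transforms and element values are just placeholders standing in for my real streaming logic:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Same options as in the tutorial: submit to the Flink JobManager and
    # have the SDK harness run in the external Beam worker pool on port 50000.
    options = PipelineOptions([
        "--runner=FlinkRunner",
        "--flink_version=1.10",
        "--flink_master=localhost:8081",
        "--environment_type=EXTERNAL",
        "--environment_config=localhost:50000",
    ])

    with beam.Pipeline(options=options) as p:
        (
            p
            # Placeholder bounded source; my real job reads from a streaming source.
            | "Create" >> beam.Create(["hello", "beam", "on", "flink"])
            | "Upper" >> beam.Map(str.upper)
            | "Print" >> beam.Map(print)
        )

My question is essentially: where does the image containing this script (and its dependencies) belong in the cluster layout described above, and what should flink_master and environment_config point at from there?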