Hi all,

I want to run a continuous stream processing job using Beam on a Flink
runner within Kubernetes. I've been following this tutorial
(https://python.plainenglish.io/apache-beam-flink-cluster-kubernetes-python-a1965f37b7cb)
but I'm not sure what the author is referring to when he talks about the
"flink master container". I also don't understand how I'm supposed to submit
my Python code to the cluster when that code is packaged inside a container
image.

The Kubernetes Flink cluster architecture looks like this:


  *   a single JobManager, which exposes the Flink web UI via a Service and Ingress

  *   multiple Task Managers, each running 2 containers (rough manifest sketch below):
     *   the Flink task manager
     *   the Beam worker pool, which exposes port 50000
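
For reference, my Task Manager deployment looks roughly like the sketch below
(image tags, labels and the Flink configuration are approximations rather than
my exact manifests; the worker pool sidecar follows the pattern from the Beam
Flink runner docs):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: flink-taskmanager
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: flink
          component: taskmanager
      template:
        metadata:
          labels:
            app: flink
            component: taskmanager
        spec:
          containers:
          # the Flink task manager itself (Flink config / JobManager address omitted)
          - name: taskmanager
            image: flink:1.10
            args: ["taskmanager"]
          # Beam Python SDK worker pool sidecar, listening on port 50000
          - name: beam-worker-pool
            image: apache/beam_python3.7_sdk:2.19.0
            args: ["--worker_pool"]
            ports:
            - containerPort: 50000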

The Python code in the example tutorial configures the Beam pipeline options
like this:

    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions([
        "--runner=FlinkRunner",
        "--flink_version=1.10",
        "--flink_master=localhost:8081",
        # EXTERNAL means Beam expects an already-running SDK worker pool
        # at the address given in environment_config
        "--environment_type=EXTERNAL",
        "--environment_config=localhost:50000"
    ])


It's clear that when you run this locally as per the tutorial, the pipeline
talks to the Beam worker pool on port 50000 to launch the application.
However, if my application code is packaged in its own Docker image and I want
to start it within Kubernetes, where does that image get deployed in the
cluster? Does it run as an extra container within each Task Manager pod (and
therefore use localhost:50000 to talk to the Beam worker pool)? Or do I create
a single pod containing my application code and point it at port 50000 of my
Task Managers - and if so, is the fact that I have multiple Task Managers a
problem?
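
If the second approach is the right one, I imagine the submission workload
would look something like the sketch below - the image name, script name and
JobManager Service name are just placeholders, and I'm not sure whether the
environment_config address is even meaningful from a separate pod:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: beam-pipeline-submit
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          # placeholder image containing my pipeline code plus the Beam Python SDK
          - name: submit
            image: registry.example.com/my-beam-app:latest
            # pipeline.py would presumably set --flink_master to the JobManager
            # Service (e.g. flink-jobmanager:8081) rather than localhost:8081,
            # but it's unclear what --environment_config should point at from here
            command: ["python", "pipeline.py"]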

Any pointers to documentation or examples would be really helpful.

Many thanks,

John
