Hi Edward and Eron,
You're right that there is currently no JobClusterEntrypoint
implementation for Kubernetes. What this entrypoint looks like
mostly depends on how the job is stored and retrieved. Multiple
approaches are conceivable:
- The entrypoint connects to an external system from which
it fetches the JobGraph
- The entrypoint contains the serialized JobGraph, similar to
how the YarnJobClusterEntrypoint works; this would mean having
a separate image per job
- The entrypoint executes a user jar that generates the
JobGraph, similar to what happens on the client when you
submit a job
I'm not a Kubernetes expert, so I don't know which of these
would be the most idiomatic approach. But once we have figured
that out, it should not be too difficult to write the
Kubernetes JobClusterEntrypoint.
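
To make the second option a bit more concrete, here is a
minimal sketch of what the retrieval side could look like,
assuming the serialized JobGraph is baked into the image or
mounted into the pod at a fixed path. The class name and the
path are placeholders of mine, not existing Flink code:

import java.io.FileInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;

import org.apache.flink.runtime.jobgraph.JobGraph;

/**
 * Hypothetical helper for a Kubernetes job cluster entrypoint:
 * reads a JobGraph that was serialized ahead of time and
 * shipped with the image or mounted via a volume. JobGraph is
 * Serializable, so plain Java deserialization is enough here.
 */
public class MountedJobGraphRetriever {

    // Placeholder path; in a pod this would typically be a
    // mounted volume or part of the image.
    private static final String JOB_GRAPH_PATH = "/opt/flink/job.graph";

    public JobGraph retrieveJobGraph() throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new FileInputStream(JOB_GRAPH_PATH))) {
            return (JobGraph) in.readObject();
        }
    }
}

The actual entrypoint would then hand this JobGraph over to
the runtime, analogous to what the YarnJobClusterEntrypoint
does today.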
If we say that Kubernetes is responsible for assigning new
resources, then we need a special KubernetesResourceManager
which automatically assigns all registered slots to the single
JobMaster. This JobMaster would then accept all slots and
scale the job to the number of slots it is offered. That way
we could easily let K8s control the resources.
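
To illustrate what letting K8s control the resources could
mean in practice: the operator (or an autoscaler) would simply
change the replica count of the TaskManager deployment, the
new TaskManagers would register their slots, and the job would
scale up. Here is a sketch using the fabric8 kubernetes-client;
the deployment name and namespace are made up and none of this
exists in Flink today:

import io.fabric8.kubernetes.client.DefaultKubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClient;

public class TaskManagerScaler {

    public static void main(String[] args) {
        int desiredTaskManagers = Integer.parseInt(args[0]);

        // Scale the (hypothetical) TaskManager deployment; the
        // proposed KubernetesResourceManager would then see the
        // additional TaskManagers register and forward their
        // slots to the single JobMaster.
        try (KubernetesClient client = new DefaultKubernetesClient()) {
            client.apps()
                .deployments()
                .inNamespace("default")
                .withName("flink-taskmanager")
                .scale(desiredTaskManagers);
        }
    }
}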
If there is a way to communicate with K8s from within Flink,
then we could also implement a mode similar to Flink's YARN
integration. The KubernetesResourceManager would then ask for
new pods to be started whenever the JobMaster needs more
slots.
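
Purely as an illustration of that mode, creating a new
TaskManager pod via the fabric8 kubernetes-client could look
like this (image, labels and namespace are made up):

import io.fabric8.kubernetes.api.model.Pod;
import io.fabric8.kubernetes.api.model.PodBuilder;
import io.fabric8.kubernetes.client.DefaultKubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClient;

/**
 * Illustrative only: how a KubernetesResourceManager could
 * request an additional TaskManager pod when the JobMaster
 * asks for more slots.
 */
public class TaskManagerPodLauncher {

    public void startNewTaskManagerPod() {
        Pod taskManagerPod = new PodBuilder()
            .withNewMetadata()
                .withGenerateName("flink-taskmanager-")
                .addToLabels("app", "flink")
                .addToLabels("component", "taskmanager")
            .endMetadata()
            .withNewSpec()
                .addNewContainer()
                    .withName("taskmanager")
                    // Image tag is a placeholder for whatever
                    // Flink image the cluster uses.
                    .withImage("flink:latest")
                    .withArgs("taskmanager")
                .endContainer()
            .endSpec()
            .build();

        try (KubernetesClient client = new DefaultKubernetesClient()) {
            client.pods().inNamespace("default").create(taskManagerPod);
        }
    }
}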
Unfortunately, the per-job mode on K8s won't make it into
Flink 1.5. But I'm confident that the community will address
this in Flink 1.6.
Cheers,
Till