Hello,
Sorry for the delay.
I agree; I think that works for most workflows. The only caveat would be
CUDA-based ML workflows: you can't bundle CUDA into a dependency bundle.
Overall, it works in application mode. It would just be awesome to use
Session clusters for Batch / ephemeral test streaming
Hi Ryan,
PyFlink supports configuring Python dependencies per job, so as I
understand it, "dynamically provide dependencies in Python" should
already be supported. Besides, it also supports
specifying Python dependencies which are located in distributed file
systems. It would be
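
As a rough illustration of the per-job dependency point above, here is a minimal sketch that assembles a `flink run` command line using the CLI's Python options (`--python`, `--pyRequirements`, `--pyArchives`, `--pyFiles`). The helper name `build_flink_run_args` and all file names and paths are hypothetical, and whether a remote path such as `hdfs:///...` is accepted depends on the Flink version in use, so treat this as an assumption rather than a confirmed setup.

```python
from typing import List, Optional, Sequence


def build_flink_run_args(
    job_script: str,
    requirements: Optional[str] = None,
    archives: Sequence[str] = (),
    py_files: Sequence[str] = (),
) -> List[str]:
    """Assemble a `flink run` argument list with per-job Python dependencies.

    Hypothetical helper for illustration only; it builds the argument
    list and does not invoke Flink.
    """
    args = ["flink", "run", "--python", job_script]
    if requirements:
        # requirements.txt listing the third-party packages for this job only
        args += ["--pyRequirements", requirements]
    if archives:
        # Archives (for example a packed virtual environment), comma-separated
        args += ["--pyArchives", ",".join(archives)]
    if py_files:
        # Extra Python files or packages added to the job's PYTHONPATH
        args += ["--pyFiles", ",".join(py_files)]
    return args


if __name__ == "__main__":
    # Two jobs with different dependency sets (all paths are made up).
    print(build_flink_run_args("train_job.py", requirements="train_requirements.txt"))
    print(build_flink_run_args("score_job.py", archives=["hdfs:///deps/score_venv.zip"]))
```

Each job carries its own dependency set in its arguments, so two jobs with different requirements could in principle be submitted without rebuilding an image. This does not address system-level libraries such as CUDA.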
Hi Ryan.
Thanks for your input. I think it's better to load user Python
dependencies dynamically rather than use different images, because images
are not flexible and are hard to test:
* we need to build an image and push the image to docker hub for testing...
* it takes a lot of tim
Hi Shengkai,
re: (1)
That is how we currently handle image management.
re: (2)
The current proposed use case is that MLEs provide different PyFlink jobs
which can have different dependencies/version requirements and these
packages can be quite large (GBs).
In the Java world, you'd provide a diffe
Hi Ryan.
Thanks for your input. I am not a k8s expert, but I know that Flink k8s
deployments support creating Flink TaskManagers from a specified pod
template[1], which allows specifying the image. @Junrui may provide more
detailed information about this topic.
If different taskmanager has different wo
Hi Shengkai,
We currently use application mode. It is an option and may be the
recommendation.
Specifically for Batch jobs, we have Machine Learning pipelines that are
ephemeral; however, they contain very different dependencies depending on
the workload.
From my perspective, Batch jobs work
Hi.
Why do the taskmanagers need different images? Do you mean that different
operators require different resources?
As far as I know, the JM supports managing taskmanagers with different
profiles. For example, a cluster may consist of two taskmanagers with the
following profiles:
* TM1 contains 4 slots, every s
Hello,
We are looking into running batch jobs on Flink clusters. Intuitively,
Session Clusters seem like an excellent deployment mode.
However, the challenge is that batch jobs may have different image
requirements, especially for ML workloads.
Currently, task managers must be homogeneous, meani