Hi Shengkai,

re: (1)
That is how we currently handle image management.

re: (2)
The current proposed use case is that MLEs provide different PyFlink jobs
which can have different dependencies/version requirements and these
packages can be quite large (GBs).
In the Java world, you'd provide a different uber jar with the dependencies
and that should work. In Python, as far as I know, you can't provide the
same bundled dependencies.
This means that we need to preload the image with all of the dependencies
but those dependencies would be static based on the pre-defined image. And
different workloads on this session cluster may require different
dependencies / versions.

Maybe it is simpler to provide a way to dynamically provide dependencies in
Python - similar to Java?

(I haven't used the jar submission in Java)
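
To make the idea of per-job Python dependencies concrete, here is a rough sketch of a submission helper. It assumes the Python options documented for the `flink run` CLI (`-py`, `-pyreq`, `-pyarch`, `-pyexec`); the helper name, script name, and archive paths are hypothetical, and whether this approach copes with multi-GB environments on a shared session cluster is untested here.

```python
from typing import List, Optional


def build_pyflink_run_command(
    entry_script: str,
    requirements_file: Optional[str] = None,
    archives: Optional[List[str]] = None,
    python_executable: Optional[str] = None,
) -> List[str]:
    """Assemble a `flink run` argument list for one PyFlink job.

    Hypothetical helper: it only builds the argument list and does not
    invoke Flink or validate that the files exist.
    """
    cmd = ["flink", "run", "-py", entry_script]
    if requirements_file:
        # -pyreq: requirements.txt listing third-party packages for this job
        cmd += ["-pyreq", requirements_file]
    if archives:
        # -pyarch: comma-separated archives (e.g. a packed virtualenv)
        cmd += ["-pyarch", ",".join(archives)]
    if python_executable:
        # -pyexec: interpreter to use, typically a path inside an archive
        cmd += ["-pyexec", python_executable]
    return cmd


if __name__ == "__main__":
    # Hypothetical job that ships its own packed virtualenv.
    print(" ".join(build_pyflink_run_command(
        "train_job.py",
        archives=["venv_a.zip"],
        python_executable="venv_a.zip/venv/bin/python",
    )))
```

Each job would then carry its own environment at submission time instead of relying on what is baked into the TaskManager image.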

Thanks,
Ryan van Huuksloot
Sr. Production Engineer | Streaming Platform


On Tue, Dec 3, 2024 at 9:11 PM Shengkai Fang <fskm...@gmail.com> wrote:

> Hi Ryan.
>
> Thanks for your input. I am not a k8s expert, but I know that Flink's k8s
> deployment supports starting Flink TaskManagers from a specified pod
> template[1], which lets you specify the image. @Junrui may be able to
> provide more detailed information on this topic.
>
> If different TaskManagers have different workloads, it means the slots in
> different TaskManagers have different profiles. Otherwise, the scheduler
> doesn't know the difference among the slots and may choose the wrong slot to
> run the task. I am just curious what the difference is between the ETL job
> and the ML job.
>
> Best,
> Shengkai
>
> [1]
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#pod-template
>
> On Tue, Dec 3, 2024 at 22:11, Ryan van Huuksloot
> <ryan.vanhuuksl...@shopify.com.invalid> wrote:
>
> > Hi Shengkai,
> >
> > We currently use application mode. It is an option and may be the
> > recommendation.
> >
> > Specifically for batch jobs, we have machine learning pipelines that are
> > ephemeral; however, they have very different dependencies depending on
> > the workload.
> > From my perspective, batch jobs work well on session clusters. However,
> > due to the differing images, you cannot run different workloads on the
> > same session cluster, which makes the session cluster essentially useless.
> >
> > Ryan van Huuksloot
> > Sr. Production Engineer | Streaming Platform
> >
> >
> > On Tue, Dec 3, 2024 at 1:20 AM Shengkai Fang <fskm...@gmail.com> wrote:
> >
> > > Hi.
> > >
> > > Why do you need a different image for each TaskManager? Do you mean
> > > that different operators require different resources?
> > >
> > > As far as I know, the JM supports managing TaskManagers with different
> > > profiles. For example, a cluster may consist of two TaskManagers with
> > > the following profiles:
> > > * TM1 contains 4 slots; every slot has 2 cores and 4 GB of memory
> > > * TM2 contains 4 slots; every slot has 1 core and 2 GB of memory
> > >
> > > > the scheduler would need some level of job isolation
> > >
> > > You can use application mode to run the job. In application mode, the
> > > cluster is dedicated to the job.
> > >
> > > Best,
> > > Shengkai
> > >
> >
>
