Hi Ryan.

Thanks for your input. I think it's better to load users' Python
dependencies dynamically rather than use different images, because images
are inflexible and hard to test:
* we need to build an image and push it to Docker Hub for every test.
* building images takes a lot of time.
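A rough sketch of what per-job dependencies could look like on a shared session cluster, assuming the cluster was started as a Kubernetes session and each job is submitted with the Python options of `flink run` (`--pyRequirements`, `--pyArchives`, `--pyExecutable`). The helper `build_flink_run_args` only assembles the command line for one job; the helper itself, the cluster id, the script name, and the `venv.zip` layout are hypothetical placeholders.

```python
from typing import List, Optional


def build_flink_run_args(
    cluster_id: str,
    entry_script: str,
    requirements_file: Optional[str] = None,
    archives: Optional[List[str]] = None,
    python_exec: Optional[str] = None,
) -> List[str]:
    """Assemble a `flink run` command for one PyFlink job on a session cluster.

    Hypothetical helper: each job passes its own Python dependencies, so the
    shared image only needs a base Python/PyFlink setup.
    """
    args = [
        "flink", "run",
        "--target", "kubernetes-session",
        f"-Dkubernetes.cluster-id={cluster_id}",
        "--python", entry_script,
    ]
    if requirements_file:
        # Packages listed here are pip-installed for this job's Python workers.
        args += ["--pyRequirements", requirements_file]
    if archives:
        # Archives (e.g. a zipped virtualenv) are extracted into the Python
        # worker's working directory; multiple archives are comma-separated.
        args += ["--pyArchives", ",".join(archives)]
    if python_exec:
        # Interpreter for the Python workers, e.g. one inside a shipped archive.
        args += ["--pyExecutable", python_exec]
    return args


if __name__ == "__main__":
    # Hypothetical job: the names and paths are placeholders.
    cmd = build_flink_run_args(
        cluster_id="ml-batch-session",
        entry_script="train_job.py",
        archives=["venv.zip"],
        python_exec="venv.zip/venv/bin/python",
    )
    print(" ".join(cmd))
```

One trade-off to keep in mind: `--pyRequirements` installs packages when the job's Python workers start, so for GB-sized dependencies a pre-built archive shipped via `--pyArchives` may be cheaper than repeated installs. Either way, the cost moves from image builds to per-job startup.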

Best,
Shengkai


On Thu, Dec 5, 2024 at 12:46 PM Ryan van Huuksloot <ryan.vanhuuksl...@shopify.com.invalid> wrote:

> Hi Shengkai,
>
> re: (1)
> That is how we currently handle image management.
>
> re: (2)
> The current proposed use case is that MLEs provide different PyFlink jobs
> which can have different dependencies/version requirements and these
> packages can be quite large (GBs).
> In the Java world, you'd provide a different uber jar with the dependencies
> and that should work. In Python, as far as I know, you can't provide the
> same bundled dependencies.
> This means that we need to preload the image with all of the dependencies
> but those dependencies would be static based on the pre-defined image. And
> different workloads on this session cluster may require different
> dependencies / versions.
>
> Maybe it would be simpler to offer a way to supply dependencies dynamically
> in Python, similar to Java?
>
> (I haven't used jar submission in Java myself.)
>
> Thanks,
> Ryan van Huuksloot
> Sr. Production Engineer | Streaming Platform
>
>
> On Tue, Dec 3, 2024 at 9:11 PM Shengkai Fang <fskm...@gmail.com> wrote:
>
> > Hi Ryan.
> >
> > Thanks for your input. I am not a k8s expert, but I know that Flink's native
> > Kubernetes deployment supports starting TaskManagers from a specified pod
> > template[1], and the pod template can specify the image. @Junrui may be
> > able to provide more details on this topic.
> >
> > If different TaskManagers have different workloads, the slots on those
> > TaskManagers must have different profiles. Otherwise, the scheduler doesn't
> > know the difference between slots and may choose the wrong slot to run the
> > task. I am just curious: what's the difference between the ETL job and the
> > ML job?
> >
> > Best,
> > Shengkai
> >
> > [1]
> > https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#pod-template
> >
> > On Tue, Dec 3, 2024 at 10:11 PM Ryan van Huuksloot <ryan.vanhuuksl...@shopify.com.invalid> wrote:
> >
> > > Hi Shengkai,
> > >
> > > Today we use application mode. It is an option and may be the
> > > recommended one.
> > >
> > > Specifically for batch jobs, we have machine learning pipelines that are
> > > ephemeral, but their dependencies differ greatly depending on the
> > > workload.
> > > From my perspective, batch jobs work well on session clusters. However,
> > > because the images differ, you cannot run different workloads on the same
> > > session cluster, which makes the session cluster essentially useless.
> > >
> > > Ryan van Huuksloot
> > > Sr. Production Engineer | Streaming Platform
> > >
> > >
> > >
> > > On Tue, Dec 3, 2024 at 1:20 AM Shengkai Fang <fskm...@gmail.com> wrote:
> > >
> > > > Hi.
> > > >
> > > > Why do TaskManagers need different images? Do you mean that different
> > > > operators require different resources?
> > > >
> > > > As far as I know, the JM supports managing TaskManagers with different
> > > > profiles. For example, a cluster may consist of two TaskManagers with
> > > > the following profiles:
> > > > * TM1 contains 4 slots; every slot has 2 cores and 4GB of memory
> > > > * TM2 contains 4 slots; every slot has 1 core and 2GB of memory
> > > >
> > > > > the scheduler would need some level of job isolation
> > > >
> > > > You can use application mode to run the job. In application mode, the
> > > > cluster is dedicated to the job.
> > > >
> > > > Best,
> > > > Shengkai
> > > >
> > >
> >
>
