Hi Ryan,

Thanks for your input. I think it's better to load users' Python dependencies dynamically rather than use different images, because images are inflexible and hard to test:
* we need to build an image and push it to Docker Hub before every test;
* building images takes a lot of time.
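A minimal sketch of what "load dependencies dynamically" could look like per job, so the session cluster image can stay generic. It assumes Flink's `python.requirements` (`requirements.txt#cache_dir`), `python.archives` (comma-separated `archive#target_dir`), and `python.executable` configuration options; PyFlink also exposes `set_python_requirements` and `add_python_archive` on the execution environment for the same purpose. The `JobPythonDeps` class, the `to_flink_config` helper, and all paths are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class JobPythonDeps:
    """Hypothetical per-job record of the Python dependencies a PyFlink job needs."""

    requirements_file: Optional[str] = None  # e.g. a requirements.txt shipped with the job
    requirements_cache_dir: Optional[str] = None  # optional dir of pre-downloaded packages
    archives: Dict[str, str] = field(default_factory=dict)  # archive URI -> target dir
    python_executable: Optional[str] = None  # interpreter, e.g. inside an archived venv


def _check(value: str) -> str:
    # '#' separates a path from its optional companion value and ',' separates
    # multiple archives, so neither character may appear inside a path here.
    if "#" in value or "," in value:
        raise ValueError(f"path must not contain '#' or ',': {value!r}")
    return value


def to_flink_config(deps: JobPythonDeps) -> Dict[str, str]:
    """Translate one job's dependencies into Flink `python.*` configuration entries."""
    conf: Dict[str, str] = {}
    if deps.requirements_file:
        value = _check(deps.requirements_file)
        if deps.requirements_cache_dir:
            value += "#" + _check(deps.requirements_cache_dir)
        conf["python.requirements"] = value
    elif deps.requirements_cache_dir:
        raise ValueError("requirements_cache_dir needs requirements_file")
    if deps.archives:
        conf["python.archives"] = ",".join(
            f"{_check(uri)}#{_check(target)}" for uri, target in deps.archives.items()
        )
    if deps.python_executable:
        conf["python.executable"] = deps.python_executable
    return conf


if __name__ == "__main__":
    # Two jobs on the same session cluster, each bringing its own dependencies.
    etl = JobPythonDeps(requirements_file="file:///jobs/etl/requirements.txt")
    ml = JobPythonDeps(
        archives={"file:///jobs/ml/venv.zip": "venv"},  # assumed: zip root holds bin/python
        python_executable="venv/bin/python",
    )
    print(to_flink_config(etl))
    print(to_flink_config(ml))
```

For the multi-GB packages mentioned in this thread, a prebuilt virtualenv archive (the `ml` case) may be a better fit than a requirements file, which, as far as I understand, is installed with pip on the workers when the job starts; either option would need testing at that size.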
Best,
Shengkai

On Thu, Dec 5, 2024 at 12:46, Ryan van Huuksloot <ryan.vanhuuksl...@shopify.com.invalid> wrote:
> Hi Shengkai,
>
> re: (1)
> That is how we currently handle image management.
>
> re: (2)
> The currently proposed use case is that MLEs provide different PyFlink jobs
> which can have different dependency/version requirements, and these
> packages can be quite large (GBs).
> In the Java world, you'd provide a different uber jar with the dependencies
> and that should work. In Python, as far as I know, you can't bundle
> dependencies the same way.
> This means that we need to preload the image with all of the dependencies,
> but those dependencies would be static, fixed by the pre-defined image. And
> different workloads on this session cluster may require different
> dependencies / versions.
>
> Maybe it would be simpler to provide a way to supply dependencies dynamically
> in Python, similar to Java?
>
> (I haven't used jar submission in Java.)
>
> Thanks,
> Ryan van Huuksloot
> Sr. Production Engineer | Streaming Platform
>
> On Tue, Dec 3, 2024 at 9:11 PM Shengkai Fang <fskm...@gmail.com> wrote:
>
> > Hi Ryan.
> >
> > Thanks for your input. I am not a k8s expert, but I know that Flink's native
> > Kubernetes deployment can launch TaskManagers from a specified pod
> > template[1], which can specify the image. @Junrui may be able to provide
> > more detailed information about this topic.
> >
> > If different TaskManagers handle different workloads, the slots in those
> > TaskManagers must have different profiles. Otherwise, the scheduler can't
> > tell the slots apart and may choose the wrong slot to run the task. I am
> > just curious: what's the difference between the ETL job and the ML job?
> >
> > Best,
> > Shengkai
> >
> > [1] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#pod-template
> >
> > On Tue, Dec 3, 2024 at 22:11, Ryan van Huuksloot <ryan.vanhuuksl...@shopify.com.invalid> wrote:
> >
> > > Hi Shengkai,
> > >
> > > We currently use application mode. It is an option and may be the
> > > recommendation.
> > >
> > > Specifically for batch jobs, we have machine learning pipelines that are
> > > ephemeral; however, they have very different dependencies depending on
> > > the workload.
> > > From my perspective, batch jobs work well on session clusters. However,
> > > due to the differing images, you cannot run different workloads on the
> > > same session cluster, which makes the session cluster essentially useless.
> > >
> > > Ryan van Huuksloot
> > >
> > > On Tue, Dec 3, 2024 at 1:20 AM Shengkai Fang <fskm...@gmail.com> wrote:
> > >
> > > > Hi.
> > > >
> > > > Why do the TaskManagers need different images? Do you mean different
> > > > operators require different resources?
> > > >
> > > > As far as I know, the JobManager can manage TaskManagers with different
> > > > profiles. For example, a cluster may consist of two TaskManagers with
> > > > the following profiles:
> > > > * TM1 contains 4 slots; each slot has 2 cores and 4 GB of memory
> > > > * TM2 contains 4 slots; each slot has 1 core and 2 GB of memory
> > > >
> > > > > the scheduler would need some level of job isolation
> > > >
> > > > You can use application mode to run the job. In application mode, the
> > > > cluster is dedicated to the job.
> > > >
> > > > Best,
> > > > Shengkai
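The point about slot profiles and the scheduler can be illustrated with a toy model. This is not Flink's actual slot allocation logic: a request carries only CPU cores and memory, so the matcher can tell the TM1 and TM2 profiles apart but has no way to know which image or Python packages a TaskManager has. `SlotProfile`, `TaskManager`, and `pick_slot` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SlotProfile:
    """Resources of one slot (or of one request): CPU cores and memory in GB."""

    cores: float
    memory_gb: float

    def covers(self, request: "SlotProfile") -> bool:
        return self.cores >= request.cores and self.memory_gb >= request.memory_gb


@dataclass
class TaskManager:
    name: str
    slot_profile: SlotProfile  # every slot on this TaskManager has the same profile
    num_slots: int
    used: int = 0

    def total(self) -> SlotProfile:
        # Total resources = per-slot profile x number of slots.
        return SlotProfile(self.slot_profile.cores * self.num_slots,
                           self.slot_profile.memory_gb * self.num_slots)


def pick_slot(tms: List[TaskManager], request: SlotProfile) -> Optional[str]:
    """Toy matcher (hypothetical, not Flink's scheduler): take the smallest free
    slot whose profile covers the request. Only cores and memory are compared;
    nothing here knows which image or Python packages a TaskManager has."""
    free = [tm for tm in tms if tm.used < tm.num_slots and tm.slot_profile.covers(request)]
    if not free:
        return None
    best = min(free, key=lambda tm: (tm.slot_profile.cores, tm.slot_profile.memory_gb))
    best.used += 1
    return best.name


if __name__ == "__main__":
    # The two profiles from the thread: TM1 = 4 x (2 cores, 4 GB), TM2 = 4 x (1 core, 2 GB).
    tm1 = TaskManager("TM1", SlotProfile(2, 4), 4)
    tm2 = TaskManager("TM2", SlotProfile(1, 2), 4)
    print(tm1.total(), tm2.total())  # 8 cores/16 GB and 4 cores/8 GB in total
    print(pick_slot([tm1, tm2], SlotProfile(1, 2)))  # TM2: smallest slot that fits
    print(pick_slot([tm1, tm2], SlotProfile(2, 4)))  # TM1: only TM1 slots are big enough
    print(pick_slot([tm1, tm2], SlotProfile(4, 8)))  # None: no single slot is that large
```

If two TaskManagers had identical profiles but different images, this matcher would treat their slots as interchangeable, which is the risk of placing a task on the "wrong" slot raised above.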