Thanks for the FLIP, Yangze. GPU resource management support is a must-have for machine learning use cases. In fact, it is one of the most frequently asked questions from users who are interested in using Flink for ML.
Some quick comments/questions on the wiki:

1. The WebUI / REST API should probably also be mentioned in the public interface section.
2. Is the data structure that holds GPU info also a public API?

Thanks,

Jiangjie (Becket) Qin

On Tue, Mar 3, 2020 at 10:15 AM Xintong Song <tonysong...@gmail.com> wrote:

> Thanks for drafting the FLIP and kicking off the discussion, Yangze.
>
> Big +1 for this feature. Supporting the use of GPUs in Flink is significant,
> especially for ML scenarios.
> I've reviewed the FLIP wiki doc and it looks good to me. I think it's a
> very good first step for Flink's GPU support.
>
> Thank you~
>
> Xintong Song
>
> On Mon, Mar 2, 2020 at 12:06 PM Yangze Guo <karma...@gmail.com> wrote:
>
> > Hi everyone,
> >
> > We would like to start a discussion thread on "FLIP-108: Add GPU
> > support in Flink"[1].
> >
> > This FLIP mainly discusses the following issues:
> >
> > - Enable users to configure how many GPUs a task executor has, and
> > forward such requirements to the external resource managers (for
> > Kubernetes/Yarn/Mesos setups).
> > - Provide information about the available GPU resources to operators.
> >
> > Key changes proposed in the FLIP are as follows:
> >
> > - Forward GPU resource requirements to Yarn/Kubernetes.
> > - Introduce GPUManager as one of the task manager services to discover
> > and expose GPU resource information to the context of functions.
> > - Introduce a default script for GPU discovery, in which we provide
> > a privilege mode to help users achieve worker-level isolation in
> > standalone mode.
> >
> > Please find more details in the FLIP wiki document [1]. Looking forward to
> > your feedback.
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> >
> > Best,
> > Yangze Guo