Great points, Sean. Here's what I'd like to suggest to move forward: split the SPIP.
If we want to propose upfront homogeneous allocation (aka spark.task.gpus), this should be an SPIP on its own. For instance, I really agree with Sean (as I did in the discuss thread) that we can't simply non-goal Mesos; we have enough maintenance issues as it is. And IIRC there was a PR proposed for K8S; I'd like to see that discussion brought here as well. IMO upfront allocation is less useful, and specifically too expensive for large jobs.

If we want per-stage resource requests, this should be a full SPIP with a lot more details to be hashed out. Our work with Horovod brings a few specific and critical requirements on how this should work with distributed DL, and I would like to see those addressed.

In any case, I'd like to see more consensus before moving forward; until then I'm going to -1 this.

________________________________
From: Sean Owen <sro...@gmail.com>
Sent: Sunday, March 3, 2019 8:15 AM
To: Felix Cheung
Cc: Xingbo Jiang; Yinan Li; dev; Weichen Xu; Marco Gaido
Subject: Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

I'm for this in general, at least a +0. I do think this has to have a story for
what to do with the existing Mesos GPU support, which sounds entirely like the
spark.task.gpus config here. Maybe it's just a synonym? That kind of thing.

Requesting different types of GPUs might be a bridge too far, but that's a P2
detail that can be hashed out later. (For example, if a V100 is available and a
K80 was requested, do you use it or fail? Is the right level of resource control
GPU RAM and cores?)

The per-stage resource requirements sound like the biggest change; can you even
change the CPU cores requested per pandas UDF? And what about memory then? We'll
see how that shakes out. That's the only thing I'm kind of unsure about in this
proposal.

On Sat, Mar 2, 2019 at 9:35 PM Felix Cheung <felixcheun...@hotmail.com> wrote:
>
> I'm very hesitant about this.
>
> I don't want to vote -1, because I personally think it's important to do, but
> I'd like to see more discussion points addressed rather than voting purely on
> the spirit of it.
>
> First, the SPIP doesn't match the SPIP format that was proposed and agreed on.
> (Maybe this is a minor point, and perhaps we should also vote to update the
> SPIP format.)
>
> Second, there are multiple PDFs/Google docs and JIRAs, and I think, for
> example, the design sketch does not cover the same points as the updated SPIP
> doc. It would help to align them before moving forward.
>
> Third, the proposal touches on some fairly core and sensitive components,
> like the scheduler, and I think more discussion is necessary. We have a few
> comments there and in the JIRA.
>
>
>
> ________________________________
> From: Marco Gaido <marcogaid...@gmail.com>
> Sent: Saturday, March 2, 2019 4:18 AM
> To: Weichen Xu
> Cc: Yinan Li; Tom Graves; dev; Xingbo Jiang
> Subject: Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling
>
> +1, a critical feature for AI/DL!
>
> On Sat, Mar 2, 2019 at 05:14 Weichen Xu
> <weichen...@databricks.com> wrote:
>>
>> +1, nice feature!
>>
>> On Sat, Mar 2, 2019 at 6:11 AM Yinan Li <liyinan...@gmail.com> wrote:
>>>
>>> +1
>>>
>>> On Fri, Mar 1, 2019 at 12:37 PM Tom Graves <tgraves...@yahoo.com.invalid>
>>> wrote:
>>>>
>>>> +1 for the SPIP.
>>>>
>>>> Tom
>>>>
>>>> On Friday, March 1, 2019, 8:14:43 AM CST, Xingbo Jiang
>>>> <jiangxb1...@gmail.com> wrote:
>>>>
>>>>
>>>> Hi all,
>>>>
>>>> I want to call for a vote on SPARK-24615.
>>>> It improves Spark by making it aware of GPUs exposed by cluster managers,
>>>> so that Spark can properly match GPU resources with user task requests.
>>>> The proposal and production doc were made available on dev@ to collect
>>>> input. You can also find a design sketch at SPARK-27005.
>>>>
>>>> The vote will be up for the next 72 hours. Please reply with your vote:
>>>>
>>>> +1: Yeah, let's go forward and implement the SPIP.
>>>> +0: Don't really care.
>>>> -1: I don't think this is a good idea because of the following technical
>>>> reasons.
>>>>
>>>> Thank you!
>>>>
>>>> Xingbo
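[Editor's note: purely as an illustration of the two allocation models being debated above, here is a minimal Scala sketch. It assumes the config key name spark.task.gpus exactly as used in this thread (it is not an existing Spark setting at the time of this vote), and the per-stage call shown in the trailing comment is a hypothetical API shape, not anything specified verbatim in the SPIP docs.]

```scala
import org.apache.spark.sql.SparkSession

// Sketch of "upfront homogeneous allocation": one GPU setting for the whole
// application. Intended to be launched with spark-submit, which supplies
// --master and the actual cluster manager.
object UpfrontGpuAllocationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("upfront-gpu-allocation-sketch")
      // Every task in the app requests 1 GPU, whether the stage is ETL or
      // model training -- this is what makes upfront allocation expensive
      // for large mixed jobs. The key is a placeholder taken from the thread.
      .config("spark.task.gpus", "1")
      .getOrCreate()

    // A per-stage resource request, by contrast, would attach the GPU ask to
    // one specific stage only (e.g. the barrier stage running Horovod). No
    // such API exists at the time of this vote; the line below is a purely
    // hypothetical shape, kept as a comment on purpose:
    // rdd.withResources(gpusPerTask = 4).barrier().mapPartitions(train)

    spark.stop()
  }
}
```

The point of the contrast is the one Felix and Sean raise: the upfront config applies to every task in the application, while a per-stage request would confine the GPU ask to the training stage alone, which is the part of the proposal that touches the scheduler most deeply.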