There is an earlier thread related to this: https://groups.google.com/forum/#!searchin/slurm-devel/gres$20gpu$20oversubscribe%7Csort:date/slurm-devel/WPmkNPedKeM/r7EDvX7jujgJ
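
Doug's question in the quoted thread (can several GPU processes run at the same time from the command line, without Slurm?) can be checked with a small script. This is only a minimal sketch: the helper name `run_concurrently` and the placeholder workload are my own assumptions, and the placeholder just prints the variable instead of touching the GPU, so swap in a real CUDA program before drawing any conclusion.

```python
import os
import subprocess
import sys


def run_concurrently(command, copies=2, gpu_index="0"):
    """Start `copies` instances of `command` at once, all pinned to one GPU.

    Hypothetical helper for illustration only.
    Returns a list of (exit_code, stdout) tuples, one per instance.
    """
    env = dict(os.environ)
    # CUDA_VISIBLE_DEVICES restricts which GPUs the CUDA runtime exposes
    # to the process; here every instance sees the same single device.
    env["CUDA_VISIBLE_DEVICES"] = gpu_index

    # Launch every instance before waiting on any, so they overlap in time.
    procs = [
        subprocess.Popen(command, env=env, stdout=subprocess.PIPE, text=True)
        for _ in range(copies)
    ]
    results = []
    for proc in procs:
        out, _ = proc.communicate()
        results.append((proc.returncode, out))
    return results


if __name__ == "__main__":
    # Placeholder workload (assumption): replace with a real CUDA program,
    # for example ["./my_cuda_app"].
    placeholder = [
        sys.executable,
        "-c",
        "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])",
    ]
    for code, out in run_concurrently(placeholder):
        print(code, out.strip())
```

If every instance of the real workload exits with code 0 while the others are still running, the GPU itself accepts concurrent processes, and what remains is purely a question of how Slurm schedules the gres.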

On Sat, Oct 21, 2017 at 10:58 PM, Chaofeng Zhang <zhang...@lenovo.com> wrote:

> CUDA supports it. The GPU is in shared mode by default, so more than one
> process can run on it.
>
> *From:* Doug Meyer [mailto:dameye...@gmail.com]
> *Sent:* Saturday, October 21, 2017 9:50 PM
> *To:* slurm-dev <slurm-dev@schedmd.com>
> *Subject:* [slurm-dev] Re: How can I run multi job on one gpu
>
> Hi,
>
> I believe you have a CUDA challenge first. Can you run multiple GPU jobs
> from the command line without slurm? GPU sharing between multiple
> independent tasks has been tough.
>
> Thank you,
>
> Doug
>
> On Fri, Oct 20, 2017 at 12:34 AM, Chaofeng Zhang <zhang...@lenovo.com>
> wrote:
>
> *First, the GPU is already set to shared mode.*
>
> *I can run a job that uses the GPU with the following Slurm configuration.
> With one job using 1 GPU, I can see CUDA_VISIBLE_DEVICES in the job env. If
> I then submit another job that uses the same GPU, that job stays pending.
> How do I configure Slurm so that I can run multiple jobs on the same GPU?*
>
> *I noticed that :no_consume can be added to the Gres. With it I can run
> multiple jobs, but CUDA_VISIBLE_DEVICES is no longer set in the job env.*
>
> *Slurm.conf*
>
> *NodeName=node1 Gres=gpu:1 CPUs=4 State=UNKNOWN*
>
> Thanks.
>
> Jeff (ChaoFeng Zhang, 张超锋) PMP® zhang...@lenovo.com
> HPC&AI | Cloud Software Architect (+86) - 18116117420
> Software solution development (+8621) - 20590223
> Shanghai, China