Wow. I did not catch that version issue. I saw that there were issues with the newest Slurm and how CUDA 10+ installs, so I avoided it even though we have CUDA 8. I did have Slurm 19 downloaded, so I think I ran into an issue with that and went back to 18. Now that I have more experience setting it up, I'll wipe the 18 install and start over. Fingers crossed for success!
Thanks for your help!

--
Lisa Weihl
Systems Administrator, Computer Science
Bowling Green State University
Tel: (419) 372-0116 | Fax: (419) 372-8061
lwe...@bgsu.edu
www.bgsu.edu

-----Original Message-----
From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of slurm-users-requ...@lists.schedmd.com
Sent: Thursday, April 16, 2020 6:39 PM
To: slurm-users@lists.schedmd.com
Subject: [EXTERNAL] slurm-users Digest, Vol 30, Issue 32

Today's Topics:

   1. CentOS 7 CUDA 8.0 can't find plugin cons_tres (Lisa Kay Weihl)
   2. Re: [EXTERNAL] CentOS 7 CUDA 8.0 can't find plugin cons_tres (Sean Crosby)

----------------------------------------------------------------------

Message: 1
Date: Thu, 16 Apr 2020 19:00:03 +0000
From: Lisa Kay Weihl <lwe...@bgsu.edu>
To: "slurm-users@lists.schedmd.com" <slurm-users@lists.schedmd.com>
Subject: [slurm-users] CentOS 7 CUDA 8.0 can't find plugin cons_tres
Message-ID: <dm5pr05mb29056be0862db04aa8960355b0...@dm5pr05mb2905.namprd05.prod.outlook.com>

I have a standalone server with 4 GeForce RTX 2080 Ti. The purpose is to serve as a computer server for data science jobs. My department chair wants a job scheduler on it. I have installed SLURM (18.08.9).
That works just fine in a basic configuration. When I add GresTypes=gpu and then add Gres=gpu:4 to the end of the node description:

NodeName=cs-datasci CPUs=24 RealMemory=385405 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:4

and then try to restart slurmd, I get an error that it cannot find the plugin:

slurmd: error: Couldn't find the specified plugin name for select/cons_tres looking at all files
slurmd: error: cannot find select plugin for select/cons_tres
slurmd: fatal: Can't find plugin for select/cons_tres

The system was prebuilt by AdvancedHPC with CentOS 7 and CUDA 8.0.

I usually keep notes when I'm installing things, but in this case I wasn't jotting things down as I went. I think I started with the instructions on this page: https://slurm.schedmd.com/quickstart_admin.html and went with the usual ./configure, make, make install.

I have a feeling something did not work and I switched to the rpm packages based on some other web pages I saw, because if I do a yum list installed | grep slurm I see a lot of packages. The problem is I was interrupted with other tasks and my memory was somewhat rusty when I came back to this.

When I went looking for this error I saw there were some issues with the newest SLURM and CUDA 10.2, but I didn't think that should be an issue because I was at CUDA 8.0. Just in case, I backed down to SLURM 18.

I'm willing to start all over if anyone thinks cleaning up and rebuilding will help. I do see libraries in /etc/lib64/slurm but I also see 2 files in /usr/local/lib/slurm/src, so I'm not sure if that's left over from trying to install from source.
All the daemons are in /usr/sbin and user commands in /usr/bin.

I'm a newbie at this and very frustrated. Can anyone help?

***************************************************************
Lisa Weihl
Systems Administrator
Computer Science, Bowling Green State University
Tel: (419) 372-0116 | Fax: (419) 372-8061
lwe...@bgsu.edu
http://www.bgsu.edu/

------------------------------

Message: 2
Date: Fri, 17 Apr 2020 08:38:27 +1000
From: Sean Crosby <scro...@unimelb.edu.au>
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: Re: [slurm-users] [EXTERNAL] CentOS 7 CUDA 8.0 can't find plugin cons_tres
Message-ID: <CAFstPEBO5+MthqskkP8dbo6Vvy8=f8yrczbxanwzmz1qdx3...@mail.gmail.com>

Hi Lisa,

cons_tres is part of Slurm 19.05 and higher. As you are using Slurm 18.08, it won't be there. The select plugin for 18.08 is cons_res.

Is there a reason why you're using an old Slurm?

Sean

--
Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead
Research Computing Services | Business Services
The University of Melbourne, Victoria 3010 Australia

End of slurm-users Digest, Vol 30, Issue 32
*******************************************
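For anyone hitting the same error, the version mismatch Sean describes can be checked before restarting slurmd. The following is only a rough sketch, not an official Slurm tool. It assumes the slurm.conf in question sets SelectType=select/cons_tres (the error message suggests this, but the original post does not show that line), and it takes the 19.05 cutoff from Sean's reply. The function names and the sample config text are made up for illustration.

```python
import re
from typing import Optional, Tuple

# Assumption taken from the reply above: cons_tres ships with Slurm 19.05
# and higher, so older releases (such as 18.08) cannot load it.
CONS_TRES_MIN_VERSION = (19, 5)


def parse_version(version: str) -> Tuple[int, int]:
    """Turn a version string such as '18.08.9' into (18, 8)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def find_select_type(conf_text: str) -> Optional[str]:
    """Return the SelectType value found in slurm.conf text, or None."""
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # ignore comments
        match = re.match(r"SelectType\s*=\s*(\S+)", line, re.IGNORECASE)
        if match:
            return match.group(1)
    return None


def select_plugin_available(conf_text: str, slurm_version: str) -> bool:
    """False when the config asks for cons_tres on a pre-19.05 Slurm."""
    if find_select_type(conf_text) != "select/cons_tres":
        return True  # this sketch only knows about the cons_tres cutoff
    return parse_version(slurm_version) >= CONS_TRES_MIN_VERSION


# Hypothetical config excerpt; only the NodeName line comes from the post.
SAMPLE_CONF = """
GresTypes=gpu
SelectType=select/cons_tres   # assumed, not shown in the original post
NodeName=cs-datasci CPUs=24 RealMemory=385405 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2 State=UNKNOWN Gres=gpu:4
"""

if __name__ == "__main__":
    for version in ("18.08.9", "19.05.0"):
        ok = select_plugin_available(SAMPLE_CONF, version)
        print(version, "ok" if ok else "select/cons_tres not available")
```

On 18.08 the two ways out are the ones already named in the thread: set SelectType to select/cons_res, or move to 19.05 or later and keep cons_tres.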