Dear Xaver,
Could you clarify the function of what you call "master"?
If it's the Slurm controller, i.e. running slurmctld: Why do you need
slurmd running on it as well?
Best,
Stephan
On 24.06.24 13:54, Xaver Stiensmeier via slurm-users wrote:
Dear Slurm users,
in our project we exclude th
Markus, thanks for the heads-up.
I intend to either reserve specific nodes with GPUs or use features.
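In case it helps, a minimal sketch of the feature-based variant (node,
GPU type and feature names are made up):

  # slurm.conf: tag the GPU nodes with a feature describing the GPU type
  NodeName=gpu-node[01-02] Gres=gpu:rtx_3090:4 Feature=rtx_3090
  # submission: ask for a node carrying that feature plus one GPU
  sbatch --constraint=rtx_3090 --gres=gpu:1 job.sh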
Best,
Stephan
On 13.09.23 09:08, Markus Kötter wrote:
Hi,
Currently reservations do not work for GRES.
https://bugs.schedmd.com/show_bug.cgi?id=5771
23.11 might change this.
Kind regards
Thanks Chris, this completes what I was looking for.
I should have taken a closer look at the scontrol man page.
Best,
Stephan
On 13.09.23 02:24, Chris Samuel wrote:
On 12/9/23 9:22 am, Stephan Roth wrote:
Thanks Noam, this looks promising!
I would suggest that, as well as the "magnetic" flag, they specify the
name of the reservation. The reservation will only "attract" jobs that
meet the access control requirements.
(from https://slurm.schedmd.com/reservations.html)
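A minimal sketch of how such a magnetic reservation could be created
(reservation, account and node names as well as the duration are
placeholders; Flags=MAGNETIC needs Slurm 20.02 or newer):

  # reserve whole GPU nodes for the account, since reserving bare GRES
  # doesn't work yet (see bug 5771); jobs of that account are then
  # "attracted" without naming the reservation explicitly
  scontrol create reservation ReservationName=gpu_guarantee \
      Accounts=gpu_acct Nodes=gpu-node[01-02] \
      StartTime=now Duration=30-00:00:00 Flags=MAGNETIC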
On Sep 12, 2023, at 10:14 AM, Stephan Roth wrote:
Dear Slurm users,
I'm looking to fulfill the requirement of guaranteeing availability of
GPU resources to a Slurm account, while allowing this account to use
other available GPU resources as well.
The guaranteed GPU resources should be of at least 1 type, optionally up
to 3 types, as in:
Gr
ware of any job crashes. Your
mileage may vary depending on job types!
Question: Does anyone have bad experiences with upgrading slurmd while
the cluster is running production?
/Ole
--
ETH Zurich
Stephan Roth
Systems Administrator
IT Support Group (ISG)
D-ITET
ETF D 104
Sternwartstrasse 7
8092 Zurich
Hi Byron,
If you have the means to set up a test environment to try the upgrade
first, I recommend doing it.
The upgrade from 19.05 to 20.11 worked for two clusters I maintain with
a similar NFS-based setup, except we keep the Slurm configuration
separate from the Slurm software accessible
On 17.05.22 17:17, Timo Rothenpieler wrote:
On 17.05.2022 15:58, Brian Andrus wrote:
You are starting to understand a major issue with most containers.
I suggest you check out Singularity, which was built from the ground
up to address most issues. And it can run other container types (e.g. Docker).
--
ETH Zurich
Stephan Roth
Systems Administrator
IT Support Group (ISG)
D-ITET
ETF D 104
Sternwartstrasse 7
8092 Zurich
Phone +41 44 632 30 59
stephan.r...@ee.ethz.ch
www.isg.ee.ethz.ch
Hi Diego,
I don't know about MPICH, but in case you haven't done this already, you
might check on the Slurm side whether everything is ready:
Did you make sure your Slurm was built with PMI support (as in
`configure ... --with-pmix=/path/to/pmix`)?
Do you see MPI types:
srun --mpi=list
Does a tes
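For instance (the test program name is a placeholder):

  # list the MPI plugin types Slurm was built with; pmix should show up
  srun --mpi=list
  # then try a small test run with it
  srun --mpi=pmix -N 2 -n 2 ./mpi_hello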
On 02.02.22 18:32, Michael Di Domenico wrote:
On Mon, Jan 31, 2022 at 3:57 PM Stephan Roth wrote:
The problem is to identify the cards physically from the information we
have, like what's reported with nvidia-smi or available in
/proc/driver/nvidia/gpus/*/information
The serial number
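For reference, the serial number and bus ID can be queried together
with nvidia-smi (a sketch; consumer cards often report the serial as
[N/A]):

  nvidia-smi --query-gpu=index,name,serial,uuid,pci.bus_id --format=csv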
Not a solution, but some ideas & experiences concerning the same topic:
A few of our older GPUs used to show the error message "has fallen off
the bus" which was only resolved by a full power cycle as well.
Something changed; nowadays the error message is "GPU lost" and a
normal reboot resolves it.
, we want to use EGL backend for accessing OpenGL without the
need for Xorg. This approach requires access to devices
/dev/dri/card* and /dev/dri/renderD* . Is there a way to give
access to these devices along with /dev/nvidia* which we use for
CUDA? Ideally as a single generic resource that would g
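One direction that might work is to bundle the device files into a
single GRES (untested sketch; it assumes a Slurm version whose
gres.conf supports MultipleFiles, and the device paths are examples):

  # gres.conf: hand out the NVIDIA device together with its DRI nodes,
  # so a plain --gres=gpu:1 grants access to all three device files
  Name=gpu MultipleFiles=/dev/nvidia0,/dev/dri/card0,/dev/dri/renderD128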
On 03.06.21 07:11, Ahmad Khalifa wrote:
How to send a job to a particular GPU card using its ID (0, 1, 2, etc.)?
Why do you need to access a GPU based on its ID?
If it's to select a certain GPU type, there are other methods you can use.
You could create partitions for the same GPU types or add features to
the nodes, for example:
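A rough sketch of both (partition and GPU type names are made up):

  # submit to a partition that only contains the desired GPU type
  sbatch --partition=rtx2080ti --gres=gpu:1 job.sh
  # or request the type directly, or via a node feature
  sbatch --gres=gpu:rtx2080ti:1 job.sh
  sbatch --constraint=rtx2080ti --gres=gpu:1 job.sh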
>> TaskPlugin=task/cgroup
>> ProctrackType=proctrack/cgroup
>>
>> ## Nodes list
>> ## use native GPUs
>> NodeName=nodeGPU01 SocketsPerBoard=8 CoresPerSocket=16 ThreadsPerCore=2 RealMemory=1024000 State=UNKNOWN Gres=gpu:8 Feature=
> ...partition and will resume the job. I am not deleting the partition here.
>
> Regards
> Navin.
---
Stephan Roth | ISG.EE D-ITET ETH Zurich | http://www.isg.ee.ethz.ch
+4144 632 30 59 | ETF D 104 | Sternwartstrasse 7 | 8092 Zurich
---
Hi all,
Does anyone have ideas or suggestions on how to automatically cancel
jobs which don't utilize the GPUs allocated to them?
The Slurm version in use is 19.05.
I'm thinking about collecting GPU utilization per process on all nodes
with NVML/nvidia-smi and updating a mean value of the collected
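A very rough, untested sketch of the per-node collection step (paths
and the idle policy are placeholders; it assumes proctrack/cgroup so
that scontrol listpids can map PIDs to jobs):

  # one sample of per-process GPU utilization: gpu index, pid, sm %
  nvidia-smi pmon -c 1 -s u | awk '!/^#/ {print $1, $2, $4}' > /tmp/gpu_sample
  # map local PIDs to Slurm job IDs
  scontrol listpids | awk 'NR>1 {print $1, $2}' > /tmp/pid_to_job
  # join the two, keep a running mean per job, and scancel <jobid>
  # once the mean stays at zero for long enough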
In regard to Kota's initial question
... "Is there any way (commands, configurations, etc...) to see the
allocated GPU indices for completed jobs?" ...
I was in need of the same kind of information and found the following:
If
- ConstrainDevices is on
- SlurmdDebug is set to at least "debug"
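then the slurmd log on the node records which GPU devices each job step
was granted. Roughly (the log path is an example and the exact wording
of the log lines varies between Slurm versions):

  # filter the node's slurmd log by the job ID after the job finished
  grep 1234567 /var/log/slurm/slurmd.log | grep -i nvidia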
Dear all,
Does anybody know of a way to detect whether a job is submitted with
srun, preferably in job_submit.lua?
The goal is to allow interactive jobs only on specific partitions.
Any recommendation or best practice on how to handle interactive jobs is
welcome.
Thank you,
Stephan
Best,
Stephan
-------
Stephan Roth | ISG.EE D-ITET ETH Zurich | http://www.isg.ee.ethz.ch
+4144 632 30 59 | ETF D 104 | Sternwartstrasse 7 | 8092 Zurich
---
On 23.04.20 1
I just checked the .deb package that I build from source and there is
nothing in it that has nv or cuda in its name.
Are you sure that Slurm distributes NVIDIA binaries?
-----Original Message-----
From: slurm-users On Behalf Of Stephan Roth
Sent: Friday, February 7, 2020 2:23 AM
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] How to use Autodetect=nvml in
On 05.02.20 21:06, Dean Schulze wrote:
> I need to dynamically configure gpus on my nodes. The gres.conf doc
> says to use
>
> Autodetect=nvml
That's all you need in gres.conf provided you don't configure any
Gres=... entries for your nodes in your slurm.conf.
If you do, make sure the string matches
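To illustrate the matching (node name, GPU type and count are made up):

  # gres.conf
  AutoDetect=nvml
  # slurm.conf: the Type string should match what NVML detects
  GresTypes=gpu
  NodeName=node01 Gres=gpu:geforce_rtx_2080_ti:4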