[slurm-users] How to apply for multiple GPU cards from different worker nodes?

2019-04-15 Thread Ran Du
Dear all, Does anyone know how to set #SBATCH options to get multiple GPU cards from different worker nodes? One of our users would like to request 16 NVIDIA V100 cards for his job, but there are only 8 GPU cards on each worker node. I have tried the following #SBATCH options: #SBA
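(The options themselves are truncated above; judging from Antony's reply below, the request appears to have been of this shape. This is a reconstruction, not the verbatim script:

    #SBATCH --nodes=2
    #SBATCH --gres=gpu:16   # 16 GPUs *per node*, which no 8-GPU node can satisfy

Since GRES counts apply per node, such a request cannot be scheduled on nodes with only 8 GPUs each.)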

Re: [slurm-users] How to apply for multiple GPU cards from different worker nodes?

2019-04-15 Thread Antony Cleave
Ask for 8 GPUs on 2 nodes instead. In your script, just change the 16 to 8 and it should do what you want. You are currently asking for 2 nodes with 16 GPUs each, since GRES resources are counted per node. Antony On Mon, 15 Apr 2019, 09:08 Ran Du, wrote: > Dear all, > > Does anyone know how to set #SB
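A minimal sketch of the corrected batch header (node and GPU counts are from the thread; the task layout and program name are placeholders):

    #!/bin/bash
    #SBATCH --nodes=2            # two worker nodes
    #SBATCH --gres=gpu:8         # 8 GPUs on *each* node, 16 in total
    #SBATCH --ntasks-per-node=8  # hypothetical: one task per GPU

    srun ./gpu_application       # placeholder for the user's real program

Because --gres is applied per node, 2 nodes x gpu:8 yields the 16 V100s the user is after.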

Re: [slurm-users] How to apply for multiple GPU cards from different worker nodes?

2019-04-15 Thread Ran Du
Dear Antony, Thanks a lot for your reply. I submitted a job following your advice, and there are no more sbatch errors. But because our cluster is under maintenance, I have to wait until tomorrow to see whether the GPU cards are allocated correctly. I will let you know as soon as the job is submitted

Re: [slurm-users] disable-bindings disables counting of gres resources

2019-04-15 Thread Peter Steinbach
Hi Chris, thanks for following up on this thread. > First of all, you will want to use cgroups to ensure that processes that do not request GPUs cannot access them. We had a feeling that cgroups might be more optimal. Could you point us to documentation that suggests cgroups to be a requirement?

Re: [slurm-users] disable-bindings disables counting of gres resources

2019-04-15 Thread Christopher Samuel
On 4/15/19 8:15 AM, Peter Steinbach wrote: > We had a feeling that cgroups might be more optimal. Could you point us to documentation that suggests cgroups to be a requirement? Oh, it's not a requirement, just that without it there's nothing to stop a process using GPUs outside of its allocation.
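For reference, device confinement is only a few lines of Slurm configuration; a sketch, to be checked against the local install and Slurm version:

    # slurm.conf
    ProctrackType=proctrack/cgroup
    TaskPlugin=task/cgroup

    # cgroup.conf
    ConstrainDevices=yes   # jobs may only open the device files in their allocation
    ConstrainCores=yes     # optional: confine CPU cores as well

With ConstrainDevices=yes, a job that requested no GPU GRES cannot open /dev/nvidia* at all, which closes exactly the gap Chris describes.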

Re: [slurm-users] disable-bindings disables counting of gres resources

2019-04-15 Thread Peter Steinbach
Hi Chris, thanks for the detailed feedback. This is Slurm 18.08.5, see also https://github.com/psteinb/docker-centos7-slurm/blob/7bdb89161febacfd2dbbcb3c5684336fb73d7608/Dockerfile#L9 Best, Peter

[slurm-users] Scontrol update: invalid user id

2019-04-15 Thread Shihanjian Wang
Hi, We are doing a senior project involving the creation of a Pi Cluster. We are using 7 Raspberry Pi B+'s in this cluster. When we use sinfo to look at the status of the nodes, they appear as drained. We also encountered a problem while trying to update the state of the nodes: when we try to update them, scontrol returns an "invalid user id" error.
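(The command itself is cut off above; from the replies, it was evidently an attempt to clear the drained state, along these lines, with a hypothetical node name:

    scontrol update NodeName=pi01 State=RESUME   # rejected with "invalid user id" when run as an ordinary user

)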

Re: [slurm-users] Scontrol update: invalid user id

2019-04-15 Thread Andy Riebs
The "invalid user id" message suggests that you need to be running as root (or possibly as the slurm user?) to update the node state. Run "slurmd -Dvv" as root on one of the compute nodes and it will show you what it thinks is the socket/core/thread configuration.

Re: [slurm-users] Scontrol update: invalid user id

2019-04-15 Thread Colas Rivière
In addition, you can check why the nodes were set to drain with `scontrol show node | grep Reason`. The same information should also appear in the Slurm controller logs (e.g. /var/log/slurm/slurmctld.log). Colas On 2019-04-15 18:03, Andy Riebs wrote: > The "invalid user id" message suggests that
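For example:

    scontrol show node | grep Reason              # drain reason for each node
    grep -i drain /var/log/slurm/slurmctld.log    # matching controller log entries

A Reason such as "Low socket*core*thread count" (a common one when node definitions are written by hand) would point back at the CPU fields in slurm.conf.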

Re: [slurm-users] Scontrol update: invalid user id

2019-04-15 Thread Christopher Samuel
On 4/15/19 3:03 PM, Andy Riebs wrote: > Run "slurmd -Dvv" as root on one of the compute nodes and it will show you what it thinks is the socket/core/thread configuration. In fact: slurmd -C will tell you what it discovers in a way that you can use in the configuration file. All the best, Chris
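On a Raspberry Pi node the output might look like the following (values here are illustrative, not measured); the NodeName line can be pasted into slurm.conf nearly verbatim:

    $ slurmd -C
    NodeName=pi01 CPUs=4 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=926
    UpTime=0-02:14:35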