[slurm-users] sbatch script won't accept --gres that requires more than 1 gpu

2020-02-03 Thread Dean Schulze
When I run an sbatch script with the line #SBATCH --gres=gpu:gp100:1 it runs. When I change it to #SBATCH --gres=gpu:gp100:3 it fails with "Requested node configuration is not available". But I have a node with 4 gp100s available. Here's my slurm.conf: NodeName=liqidos-dean-node1 CPUs=2 Boa…
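That error usually means the node's Gres declaration in slurm.conf (and the matching gres.conf on the node) doesn't advertise as many GPUs of that type as the job requests. A minimal sketch of what a 4-GPU gp100 node declaration could look like — the node name is from the post, but the CPU count, device paths, and other values here are assumptions:

```
# slurm.conf (sketch; values other than the node name are assumed)
GresTypes=gpu
NodeName=liqidos-dean-node1 CPUs=2 Gres=gpu:gp100:4 State=UNKNOWN

# gres.conf on the node (device paths assumed)
Name=gpu Type=gp100 File=/dev/nvidia[0-3]
```

If slurm.conf only declares gpu:gp100:1 (or gres.conf only lists one device file), a request for 3 GPUs will be rejected even though the hardware is physically present; `scontrol show node liqidos-dean-node1` shows what the controller actually believes is configured.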

[slurm-users] Slurm version 20.02.0pre1 is now available

2020-02-03 Thread Tim Wickberg
We are pleased to announce the availability of Slurm release preview version 20.02.0pre1. This is the first preview of the upcoming 20.02 release series, and represents the end of development for the release cycle. The first release candidate - 20.02.0rc1 - is expected out next week, and will…

Re: [slurm-users] SLURM starts new job before CG finishes

2020-02-03 Thread Erwin, James
Hello, Thank you for your reply, Lyn. I found a temporary workaround (the epilog touches a file in /tmp/, and the prolog waits until the epilog finishes and removes the file). I was looking at CompleteWait before I tried these workarounds, but as it is written in the docs, I do not understand how…
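The flag-file workaround described above could be sketched roughly as follows — all names here (the flag path and function names) are hypothetical, and a real prolog/epilog would be separate scripts configured via Prolog= and Epilog= in slurm.conf:

```shell
# Hedged sketch of the described workaround (flag path and names hypothetical).
FLAG="/tmp/slurm_epilog_${SLURM_JOB_ID:-test}.flag"

epilog_cleanup() {
  touch "$FLAG"        # signal: epilog cleanup in progress
  # ... site-specific cleanup work goes here ...
  rm -f "$FLAG"        # signal: node is free for the next job
}

prolog_wait() {
  # Block the next job's prolog while an epilog is still running.
  while [ -e "$FLAG" ]; do sleep 1; done
}
```

This effectively serializes job startup behind epilog completion on the node, which is the behavior CompleteWait is meant to approximate at the scheduler level.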

[slurm-users] Getting task distribution from environment

2020-02-03 Thread Alexander Grund
Hello, I need to get (at least) a list with the number of tasks on each hostname for the current job step, from within each task. My current approach is to use SLURM_STEP_NODELIST and SLURM_STEP_NUM_TASKS and expand them without changing the order. Then I match them 1:1 to get something lik…
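For the matching step, Slurm also exports SLURM_TASKS_PER_NODE in a compact form such as "2(x3),1" (2 tasks on each of the first 3 nodes, then 1 task on the next), and the nodelist can be expanded with `scontrol show hostnames "$SLURM_STEP_NODELIST"`. A sketch of expanding the compact task-count spec, with the function name being my own:

```shell
# Expand a SLURM_TASKS_PER_NODE-style spec, e.g. "2(x3),1" -> "2 2 2 1".
expand_tasks_per_node() {
  local out=() parts p
  IFS=',' read -ra parts <<< "$1"
  for p in "${parts[@]}"; do
    if [[ $p =~ ^([0-9]+)\(x([0-9]+)\)$ ]]; then
      # "N(xM)" means the count N repeated M times
      local i
      for ((i = 0; i < BASH_REMATCH[2]; i++)); do
        out+=("${BASH_REMATCH[1]}")
      done
    else
      out+=("$p")
    fi
  done
  echo "${out[@]}"
}
```

Pairing this expanded list positionally with the expanded hostnames gives the per-host task counts, since Slurm emits both lists in the same node order.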