[slurm-dev] Re: Qos limits associations and AD auth

2017-10-20 Thread Chris Samuel
On Friday, 20 October 2017 9:53:06 AM AEDT Lachlan Musicman wrote: > Latest version of sssd can take shortnames and search through domains. I'm not sure that works, though, if you've got two different people with the same username in different domains. cheers, Chris -- Christopher Sa

[slurm-dev] How can I run multi job on one gpu

2017-10-20 Thread Chaofeng Zhang
First, the GPU is already set to shared mode. I can run a job using the GPU with the following Slurm configuration: I have one job using 1 GPU, and I can see CUDA_VISIBLE_DEVICES in the job env. If I want to run another job using the same GPU, that job stays pending. How to configure so that I can run multi

[slurm-dev] AW: Questions about resource requests

2017-10-20 Thread Steininger, Herbert
Is there any news in this thread that I might have missed? Best. From: zhangtao102019 [mailto:zhangtao102...@126.com] Sent: Wednesday, 11 October 2017 13:22 To: slurm-dev Subject: [slurm-dev] Questions about resource requests Hello, I am installing slurm-17.02.6 on my test cluster, and

[slurm-dev] Re: node selection

2017-10-20 Thread Michael Di Domenico
On Thu, Oct 19, 2017 at 3:14 AM, Steffen Grunewald wrote: >> for some reason on an empty cluster when i spin up a large job it's >> staggering the allocation across a seemingly random allocation of >> nodes > > Have you looked into topology? With topology.conf, you may group nodes > by (virtually
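A minimal sketch of the topology.conf grouping Steffen refers to, assuming a two-level switch tree. The switch names (leaf1, leaf2, spine) and node ranges are hypothetical and not taken from the thread; the real layout depends on the cluster's network.

```
# slurm.conf: enable the tree topology plugin
TopologyPlugin=topology/tree

# topology.conf: hypothetical layout, adjust names and ranges to the real fabric
SwitchName=leaf1 Nodes=node[001-016]
SwitchName=leaf2 Nodes=node[017-032]
SwitchName=spine Switches=leaf[1-2]
```

With a layout like this, the scheduler tries to place a job on nodes under as few leaf switches as possible rather than spreading it across the cluster.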

[slurm-dev] slurm database purge,

2017-10-20 Thread Véronique LEGRAND
Hello, For two months now we have been finding the slurmdbd daemon down on the 1st of every month. Error messages in the logs appear shortly after midnight. They say: 2017-10-01T00:02:42.468823+02:00 tars-acct slurmdbd[7762]: error: mysql_query failed: 1205 Lock wait timeout exceeded; try restart

[slurm-dev] OverSubscribe can just be used for the resource cpu, whether it can be used for gpu

2017-10-20 Thread Chaofeng Zhang
The configuration below works for CPUs: with OverSubscribe, I can have more than 4 processes in running status. But if I add #SBATCH --gres=gpu:2 to the job file, just 1 process is in running status and the others are pending. The OverSubscribe can just be used for the resource cpu, whether it c

[slurm-dev] Re: slurm database purge,

2017-10-20 Thread Lyn Gerner
Hi Veronique, You understand correctly. Try 365days instead of 12months, and it will cause a single-day purge every night. Regards, Lyn On Fri, Oct 20, 2017 at 5:25 AM, Véronique LEGRAND < veronique.legr...@pasteur.fr> wrote: > Hello, > > For two months now we have been finding the slurmdb
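Lyn's suggestion can be sketched as a slurmdbd.conf fragment. This assumes the site sets the Purge*After options with month units; which of these options Véronique's configuration actually uses is not shown in the thread, so the three below are illustrative only.

```
# slurmdbd.conf (sketch; the choice of options is an assumption)

# Month units: the purge runs at the start of each month and removes
# a whole month of records in one pass.
#PurgeJobAfter=12months
#PurgeStepAfter=12months
#PurgeEventAfter=12months

# Day units: roughly the same retention, but the purge runs every day,
# so each pass only has to remove about one day of records.
PurgeJobAfter=365days
PurgeStepAfter=365days
PurgeEventAfter=365days
```

Smaller nightly purges should hold database locks for less time, which is the likely reason this helps with the "Lock wait timeout exceeded" error seen on the 1st of the month. slurmdbd has to be restarted for the change to take effect.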