Re: [slurm-users] Advice on setting up fairshare

2019-06-06 Thread Loris Bennett
Hi David, I haven't had time to look into your current problem, but inline I have some comments about the general approach. David Baker writes: > Hello, > > Could someone please give me some advice on setting up fairshare > in a cluster. I don't think the present setup is wildly incorrect, > however ...

Re: [slurm-users] Failed to launch jobs with mpirun after upgrading to Slurm 19.05

2019-06-06 Thread Christopher Samuel
On 6/6/19 12:01 PM, Kilian Cavalotti wrote: Levi did already. Aha, race condition between searching bugzilla and writing the email. ;-) -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] Failed to launch jobs with mpirun after upgrading to Slurm 19.05

2019-06-06 Thread Kilian Cavalotti
On Thu, Jun 6, 2019 at 11:16 AM Christopher Samuel wrote: > Sounds like a good reason to file a bug. Levi did already. Everybody can vote at https://bugs.schedmd.com/show_bug.cgi?id=7191 :) Cheers, -- Kilian

Re: [slurm-users] Failed to launch jobs with mpirun after upgrading to Slurm 19.05

2019-06-06 Thread Christopher Samuel
On 6/6/19 10:21 AM, Levi Morrison wrote: > This means all OpenMPI programs that end up calling `srun` on Slurm 19.05 will fail. Sounds like a good reason to file a bug. We're not on 19.05 yet, so we're not affected (yet), but this may cause us some pain when we get to that point (though at least ...

Re: [slurm-users] Failed to launch jobs with mpirun after upgrading to Slurm 19.05

2019-06-06 Thread Levi Morrison
Slurm 19.05 removed support for `--cpu_bind`, which is what all released versions of OpenMPI are using when they call into srun. This issue was fixed 24 days ago in [OpenMPI's git repo][1]. This means all OpenMPI programs that end up calling `srun` on Slurm 19.05 will fail. This enormous ...
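
For reference, the flag at issue as it appears on the srun command line; a minimal sketch (the dash spelling is the one current Slurm documents; `./mpi_hello` is a placeholder binary, not from the thread):

    # Old underscore spelling, removed in Slurm 19.05 -- this is what released OpenMPI passes:
    srun --cpu_bind=cores ./mpi_hello
    # Current dash spelling accepted by 19.05:
    srun --cpu-bind=cores ./mpi_hello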

[slurm-users] Advice on setting up fairshare

2019-06-06 Thread David Baker
Hello, Could someone please give me some advice on setting up fairshare in a cluster? I don't think the present setup is wildly incorrect; however, either my understanding of the setup is wrong or something is misconfigured. When we set a new user up on the cluster and they haven't used any ...
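
For anyone following along, a minimal sketch of the usual fairshare plumbing with sacctmgr (the account and user names here are hypothetical, not taken from David's site):

    # Give an account a share of the tree
    sacctmgr modify account where name=physics set fairshare=100
    # Add a new user under that account with their own share
    sacctmgr add user newuser account=physics fairshare=10
    # Inspect the resulting share tree, normalized shares, and effective usage
    sshare -a

With PriorityType=priority/multifactor in slurm.conf, a new user with no recorded usage starts with a fairshare factor close to 1.0, which then decays as they consume resources.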

[slurm-users] switch to cgroups

2019-06-06 Thread Crocker, Deborah
Does anyone know what would happen to running jobs if we switch to cgroups? We missed getting this set when we had a general cluster shutdown, and we want to get it set now, but we do have running jobs at the moment. Thanks, Deborah Crocker, PhD, Systems Engineer III, Office of Information Technology, The University ...
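
For context, the switch usually involves the task and proctrack plugins plus a cgroup.conf; a hedged sketch (real Slurm parameter names, illustrative values, not Deborah's actual config):

    # slurm.conf
    ProctrackType=proctrack/cgroup
    TaskPlugin=task/cgroup

    # cgroup.conf
    CgroupAutomount=yes
    ConstrainCores=yes
    ConstrainRAMSpace=yes

Changing these requires restarting the daemons, and jobs already running were started under the old plugins, so a common conservative approach is to drain nodes and let running work finish before flipping the setting.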

Re: [slurm-users] Failed to launch jobs with mpirun after upgrading to Slurm 19.05

2019-06-06 Thread Andrés Marín Díaz
Hello, We have tried to compile it in two ways. Initially we compiled it with PMIx as follows: rpmbuild -ta slurm-19.05.0.tar.bz2 --define '_with_pmix --with-pmix=/opt/pmix/3.1.2' But we have also tried compiling it without PMIx: rpmbuild -ta slurm-19.05.0.tar.bz2 ...
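
One quick way to see which MPI stack plugins a given build actually provides (a standard srun option):

    srun --mpi=list

If PMIx was compiled in, pmix (and its versioned variants) should show up in that list.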

Re: [slurm-users] Failed to launch jobs with mpirun after upgrading to Slurm 19.05

2019-06-06 Thread Sean Crosby
How did you compile SLURM? Did you add the contribs/pmi and/or contribs/pmi2 plugins to the install? Or did you use PMIx? Sean -- Sean Crosby Senior DevOps/HPC Engineer and HPC Team Lead | Research Platform Services Research Computing | CoEPP | School of Physics University of Melbourne On Thu, ...

Re: [slurm-users] Failed to launch jobs with mpirun after upgrading to Slurm 19.05

2019-06-06 Thread Andrés Marín Díaz
Hello, Yes, we have recompiled OpenMPI with SLURM 19.05 integration, but the problem remains. We have also tried recompiling OpenMPI without SLURM integration. In that case executions fail with srun, but with mpirun they continue to work on SLURM 18.08 and fail on 19.05 in the same ...
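
For reference, a minimal sketch of rebuilding OpenMPI against Slurm's PMI support (the prefix and PMI path are assumptions, not taken from Andrés's setup):

    # Assumed install prefix and PMI library location
    ./configure --prefix=/opt/openmpi --with-slurm --with-pmi=/usr
    make -j8 && make install

`--with-slurm` and `--with-pmi` are standard OpenMPI configure flags; for a PMIx build, `--with-pmix=/opt/pmix/3.1.2` would be used instead.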

Re: [slurm-users] Failed to launch jobs with mpirun after upgrading to Slurm 19.05

2019-06-06 Thread Sean Crosby
Hi Andrés, Did you recompile OpenMPI after updating to SLURM 19.05? Sean -- Sean Crosby Senior DevOps/HPC Engineer and HPC Team Lead | Research Platform Services Research Computing | CoEPP | School of Physics University of Melbourne On Thu, 6 Jun 2019 at 20:11, Andrés Marín Díaz ...

[slurm-users] anybody using udocker?

2019-06-06 Thread Miguel Gutiérrez Páez
Hi all, I have installed udocker in our HPC infrastructure. I've been testing it and I found a very strange behaviour regarding memory consumption. If I launch a job with a memory reservation of 4GB or higher, any udocker container can bypass this limit and use all the memory available in the node ...
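
If the intent is for the kernel to enforce the job's memory request, that is normally done through Slurm's cgroup support rather than anything udocker-side; a hedged sketch of the relevant cgroup.conf lines (real parameter names, illustrative values):

    # cgroup.conf
    ConstrainRAMSpace=yes
    ConstrainSwapSpace=yes
    # Percent of the allocation a job may use before the cgroup limit applies
    AllowedRAMSpace=100

Without ConstrainRAMSpace=yes, enforcement relies on periodic accounting polls, which a fast-allocating job can outrun.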

[slurm-users] Removing WCKey

2019-06-06 Thread Sam Gallop (NBI)
Hi, I'm having problems trying to remove a wckey associated with a user account. According to the documentation it should simply be a case of 'sacctmgr del user wckey=' but when I try it, it doesn't seem to like it. An example for a user called user1 ... # sacctmgr add user user1 wckey=test1 WCKe...
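
For comparison, the listing side of this works as expected (standard sacctmgr invocations; user1/test1 as in the example above):

    # Show wckeys recorded in the accounting database
    sacctmgr show wckeys
    # Show the entry for the specific user
    sacctmgr show user user1

It is the delete path that misbehaves here.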

Re: [slurm-users] Failed to launch jobs with mpirun after upgrading to Slurm 19.05

2019-06-06 Thread Andrés Marín Díaz
Thank you very much for the help; here is some updated information:
- If we use Intel MPI (IMPI) mpirun, it works correctly.
- If we use mpirun without the scheduler, it works correctly.
- If we use srun with software compiled with OpenMPI, it works correctly.
- If we use SLURM 18.08.6, it works correctly ...