[slurm-dev] Re: SLURM 17.02.8 not optimally scheduling jobs/utilizing resources

2017-10-25 Thread Holger Naundorf
On 10/24/2017 09:30 PM, Sean Caron wrote: (...) > > Here's some config relating to the scheduler from our slurm.conf that > might be germane: > - > > SchedulerType=sched/backfil

[slurm-dev] Re: SLURM 17.02.8 not optimally scheduling jobs/utilizing resources

2017-10-25 Thread Ole Holm Nielsen
On 10/25/2017 01:52 PM, Holger Naundorf wrote: I'd really appreciate any help the SLURM wizards can provide! We suspect it's something to do with how we've set up QoS, or maybe we need to tweak the scheduler configuration in 17.02.8; however, there's no single clear path forward. Just let me know

[slurm-dev] Calendar in SLURM

2017-10-25 Thread sysadmin.caos
Hello, I would like to apply a calendar in SLURM, something like the SGE calendar for enabling and disabling queues automatically. I know I could create some "cron" jobs, but I'm looking for something "smart". Thanks.

[slurm-dev] RE: Selecting a network interface with srun

2017-10-25 Thread Le Biot, Pierre-Marie
Hi Sebastian, Another solution could be to change the configuration of nodes in slurm.conf, making use of NodeName and NodeHostname (and NodeAddr if needed): “NodeName: Name that Slurm uses to refer to a node [...]. Typically this would be the string that "/bin/hostname -s" returns. [...] It may

[slurm-dev] Re: SLURM 17.02.8 not optimally scheduling jobs/utilizing resources

2017-10-25 Thread Patrick Goetz
On 10/25/2017 06:58 AM, Ole Holm Nielsen wrote: I agree that the backfill scheduler requires configuration beyond the default settings!  This surprised me as well.  I wrote some notes in my Wiki which could be used as a starting point: https://wiki.fysik.dtu.dk/niflheim/Slurm_scheduler#backf

[slurm-dev] RE: Selecting a network interface with srun

2017-10-25 Thread John Hearns
When using “mpirun” we can specify “-iface ib0” This is true, and the exact syntax depends on your MPI of choice, as noted above. However, don't get confused between IPoIB and InfiniBand itself. IPoIB is of course sending IP traffic over InfiniBand. An InfiniBand network can perfectly happil

[slurm-dev] Re: Selecting a network interface with srun

2017-10-25 Thread r...@open-mpi.org
Good points. I would also caution against renaming nodes using interfaces. This frequently causes failure of 3rd party software packages that compare the return value of “hostname” to the list of allocated nodes for optimization or placement purposes - e.g., mpirun! A quick grep of the mailing l

[slurm-dev] Re: Selecting a network interface with srun

2017-10-25 Thread John Hearns
Ralph, indeed. As I have said before: Finally, my one piece of advice to everyone managing batch systems. It is a name resolution problem. No, really it is. Even if your cluster catches fire, the real reason that your jobs are not being submitted is that the DNS resolver is burning and the sch

[slurm-dev] RE: Selecting a network interface with srun

2017-10-25 Thread Ryan Novosielski
I think this is probably the best advice. I've personally never run into a set of circumstances where MPI didn't pick the right interface where there wasn't a major misconfiguration (you'd of course want to test that on your system), but users might w

[slurm-dev] Re: SLURM 17.02.8 not optimally scheduling jobs/utilizing resources

2017-10-25 Thread Sean Caron
Thanks for the suggestions, everyone. I will certainly be looking at SchedulerParameters to see if I can optimize that a little further with contemporary guidance on best practices. It looks like what got us going well enough, for now, was just to bump the Priority associated with each QoS class t

[slurm-dev] Naming of output & error files

2017-10-25 Thread Lachlan Musicman
Hi All, I've now been asked twice in two days if there is any way to intelligently name slurm output files. Sometimes our users will do something like for fasta_file in `ls /path/to/fasta_files`; do sbatch myscript.sbatch $fasta_file; done They would like their output and error files to be like

[slurm-dev] Re: Naming of output & error files

2017-10-25 Thread Raymond Wan
Hi Lachlan, On Thu, Oct 26, 2017 at 10:10 AM, Lachlan Musicman wrote: > for fasta_file in `ls /path/to/fasta_files`; do sbatch myscript.sbatch > $fasta_file; done > > They would like their output and error files to be like: > > --output=$fasta_file.out > --error=$fasta_file.err I analyze a lo

[slurm-dev] Re: Naming of output & error files

2017-10-25 Thread Alex Chekholko
Why can't you just do for fasta_file in `ls /path/to/fasta_files`; do sbatch --output=$fasta_file.out --error=$fasta_file.err myscript.sbatch $fasta_file; done On Wed, Oct 25, 2017 at 7:08 PM, Lachlan Musicman wrote: > Hi All, > > I've now been asked twice in two days if there is any way to int
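The loop suggested above can be sketched a little more defensively. This is only an illustration, not the poster's exact command: the function name submit_each and the SUBMIT variable are hypothetical, and it iterates over a shell glob instead of parsing `ls`, so the script receives the full path while the log files are named after the bare file name. By default it only prints the sbatch commands it would run.

```shell
#!/bin/sh
# Sketch: submit one job per input file, naming the logs after the input.
# submit_each and SUBMIT are illustrative names, not part of Slurm.
# SUBMIT defaults to "echo sbatch" (dry run); set SUBMIT=sbatch to submit.
submit_each() {
    dir=$1      # directory holding the input files
    script=$2   # batch script to run for each file
    for fasta_file in "$dir"/*; do
        # Skip the unexpanded pattern when the directory is empty.
        [ -e "$fasta_file" ] || continue
        name=$(basename "$fasta_file")
        # Left unquoted on purpose so "echo sbatch" splits into two words.
        ${SUBMIT:-echo sbatch} --output="$name.out" --error="$name.err" \
            "$script" "$fasta_file"
    done
}
```

Calling `submit_each /path/to/fasta_files myscript.sbatch` prints one sbatch line per file; running it with `SUBMIT=sbatch` set in the environment would submit them instead.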

[slurm-dev] unsubscribe

2017-10-25 Thread dhvanika.s...@wipro.com
The information contained in this electronic message and any attachments to this message are intended for the exclusive use of the addressee(s) and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you should not disseminate, distribute or c

[slurm-dev] CPU/GPU Affinity Not Working

2017-10-25 Thread Dave Sizer
Hi, We are running slurm 17.02.7 For some reason, we are observing that the preferred CPUs defined in gres.conf for GPU devices are being ignored when running jobs. That is, in our gres.conf we have gpu resource lines, such as: Name=gpu Type=kepler File=/dev/nvidia0 CPUs=0,1,2,3,4,5,6,7,16,1
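One way to see which CPUs a job step was actually bound to is to read the kernel's view of the process affinity. The helper below is a hypothetical sketch (the name allowed_cpus is not from the post) and assumes a Linux node where /proc/self/status carries a Cpus_allowed_list line; it says nothing about why the gres.conf setting might be ignored.

```shell
#!/bin/sh
# Sketch: print the CPU list a process is allowed to run on.
# allowed_cpus is an illustrative helper name, not a Slurm command.
allowed_cpus() {
    # Read the given status file, or this process's own by default.
    awk '/^Cpus_allowed_list:/ {print $2}' "${1:-/proc/self/status}"
}
```

Inside an allocation the same information can be read directly, for example with `srun --gres=gpu:1 grep Cpus_allowed_list /proc/self/status`, and compared by hand against the CPUs listed for that GPU in gres.conf.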

[slurm-dev] Limiting licenses per-user

2017-10-25 Thread Johan Brannlund
Hello. I have a remote license configured in slurm and I'd like to limit the number of licenses each user can use. The license shows up correctly in all of "scontrol show lic", "sacctmgr show tres" and "sacctmgr show resource", but attempting to use QoS to set a limit on the number of license

[slurm-dev] Re: Naming of output & error files

2017-10-25 Thread Lachlan Musicman
On 26 October 2017 at 13:27, Alex Chekholko wrote: > Why can't you just do > > for fasta_file in `ls /path/to/fasta_files`; do sbatch > --output=$fasta_file.out --error=$fasta_file.err myscript.sbatch > $fasta_file; done > Because it was staring me in the face and I ignored it. Thank you. che