Hey Jeff,
Finally got my test nodes back and was looking at the info you sent. On
the SLURM page, it states the following:
Open MPI <http://www.open-mpi.org/> relies upon SLURM to allocate
resources for the job and then mpirun to initiate the tasks. When using
the salloc command, mpirun's -nolocal option is recommended. For example:
$ salloc -n4 sh # allocates 4 processors and spawns shell for job
mpirun -np 4 -nolocal a.out
exit # exits shell spawned by initial salloc command
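As an aside, salloc can apparently also run a command directly instead
of spawning a shell, so the same example should collapse to a one-liner
(just a sketch, untested on my end):
$ salloc -n4 mpirun -np 4 -nolocal a.out  # allocate, launch, and release in one step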
Are you saying that I need to use the SLURM salloc command and then pass
SLURM a script? Or could I just put it all into the script? For example:
#!/bin/sh
salloc -n4
mpirun my_mpi_application
Then run it with srun -b myscript.sh
Jeff F. Pummill
Senior Linux Cluster Administrator
University of Arkansas
Fayetteville, Arkansas 72701
(479) 575 - 4590
http://hpc.uark.edu
"A supercomputer is a device for turning compute-bound
problems into I/O-bound problems." -Seymour Cray
Jeff Squyres wrote:
Ick; I'm surprised that we don't have this info on the FAQ. I'll try
to rectify that shortly.
How are you launching your jobs through SLURM? OMPI currently does
not support the "srun -n X my_mpi_application" model for launching
MPI jobs. You must either use the -A option to srun (i.e., get an
interactive SLURM allocation) or use the -b option (submit a script
that runs on the first node in the allocation). Your script can be
quite short:
#!/bin/sh
mpirun my_mpi_application
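For the interactive (-A) route, the session would look something like
this (just a sketch; the exact shell behavior depends on your SLURM
version):
$ srun -n 4 -A                 # get an interactive 4-process allocation; spawns a shell
$ mpirun my_mpi_application    # runs across the whole allocation
$ exit                         # release the allocation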
Note that OMPI will automatically figure out how many CPUs are in
your SLURM allocation, so you don't need to specify "-np X". Hence,
you can run the same script without modification no matter how many
CPUs/nodes you get from SLURM.
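For example, submitting the same script at two different sizes
(myscript.sh is just an illustrative name):
$ srun -n 4 -b myscript.sh     # my_mpi_application runs with 4 processes
$ srun -n 16 -b myscript.sh    # identical script, now 16 processes; no -np edits needed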
It's on the long-term plan to get the "srun -n X my_mpi_application"
model to work; it just hasn't bubbled up high enough in the priority
stack yet... :-\
On Jun 20, 2007, at 1:59 PM, Jeff Pummill wrote:
Just started working with the OpenMPI / SLURM combo this morning. I can
successfully launch these jobs from the command line and they run to
completion, but when launched from SLURM they hang.
They appear to just sit with no load apparent on the compute nodes
even though SLURM indicates they are running...
[jpummil@trillion ~]$ sinfo -l
Wed Jun 20 12:32:29 2007
PARTITION AVAIL TIMELIMIT   JOB_SIZE ROOT SHARE GROUPS NODES     STATE NODELIST
debug*       up  infinite 1-infinite   no    no    all     8 allocated compute-1-[1-8]
debug*       up  infinite 1-infinite   no    no    all     1      idle compute-1-0
[jpummil@trillion ~]$ squeue -l
Wed Jun 20 12:32:20 2007
JOBID PARTITION   NAME    USER   STATE  TIME TIMELIMIT NODES NODELIST(REASON)
   79     debug mpirun jpummil RUNNING  5:27 UNLIMITED     2 compute-1-[1-2]
   78     debug mpirun jpummil RUNNING  5:58 UNLIMITED     2 compute-1-[3-4]
   77     debug mpirun jpummil RUNNING  7:00 UNLIMITED     2 compute-1-[5-6]
   74     debug mpirun jpummil RUNNING 11:39 UNLIMITED     2 compute-1-[7-8]
Are there any known issues of this nature involving OpenMPI and SLURM?
Thanks!
Jeff F. Pummill
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users