Hi Jeff,

If you submit a batch script, there is no need to do a salloc; the batch submission already creates the allocation, and mpirun picks it up from the SLURM environment.
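
For example, a minimal job script needs nothing more than the mpirun
line (an untested sketch; "myscript.sh" and "my_mpi_application" are
placeholders for your own script and binary):

#!/bin/sh
# SLURM runs this script on the first node of the allocation;
# Open MPI reads the SLURM environment to size and place the job,
# so no -np or hostfile is needed.
mpirun my_mpi_application

Submit it with something like:

$ srun -b myscript.sh    # batch mode, as in your example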

See the Open MPI FAQ for details on how to run on SLURM:
http://www.open-mpi.org/faq/?category=slurm

Hope this helps.

Tim

On Wednesday 27 June 2007 14:21, Jeff Pummill wrote:
> Hey Jeff,
>
> Finally got my test nodes back and was looking at the info you sent.
> The SLURM page states the following:
>
> *Open MPI* <http://www.open-mpi.org/> relies upon SLURM to allocate
> resources for the job and then mpirun to initiate the tasks. When using
> the salloc command, mpirun's -nolocal option is recommended. For example:
>
> $ salloc -n4 sh    # allocates 4 processors and spawns shell for job
> > mpirun -np 4 -nolocal a.out
> > exit             # exits shell spawned by initial salloc command
>
> You are saying that I need to use the SLURM salloc, then pass SLURM a
> script? Or could I just add it all into the script? For example:
>
> #!/bin/sh
> salloc -n4
> mpirun my_mpi_application
>
> Then, run with srun -b myscript.sh
>
>
> Jeff F. Pummill
> Senior Linux Cluster Administrator
> University of Arkansas
> Fayetteville, Arkansas 72701
> (479) 575 - 4590
> http://hpc.uark.edu
>
> "A supercomputer is a device for turning compute-bound
> problems into I/O-bound problems." -Seymour Cray
>
> Jeff Squyres wrote:
> > Ick; I'm surprised that we don't have this info on the FAQ.  I'll try
> > to rectify that shortly.
> >
> > How are you launching your jobs through SLURM?  OMPI currently does
> > not support the "srun -n X my_mpi_application" model for launching
> > MPI jobs.  You must either use the -A option to srun (i.e., get an
> > interactive SLURM allocation) or use the -b option (submit a script
> > that runs on the first node in the allocation).  Your script can be
> > quite short:
> >
> > #!/bin/sh
> > mpirun my_mpi_application
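> >
> > For example (a sketch only; "myscript.sh" stands in for the script
> > above, and the node/process counts are arbitrary):
> >
> >   # batch model: -b submits the script to run on the first node
> >   srun -b -N2 myscript.sh
> >
> >   # interactive model: -A gets an allocation and spawns a shell;
> >   # mpirun then launches across the whole allocation
> >   srun -A -n4
> >   mpirun my_mpi_application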
> >
> > Note that OMPI will automatically figure out how many CPUs are in
> > your SLURM allocation, so you don't need to specify "-np X".  Hence,
> > you can run the same script without modification no matter how many
> > CPUs/nodes you get from SLURM.
> >
> > It's on the long-term plan to get the "srun -n X my_mpi_application"
> > model to work; it just hasn't bubbled up high enough in the priority
> > stack yet... :-\
> >
> > On Jun 20, 2007, at 1:59 PM, Jeff Pummill wrote:
> >> Just started working with the Open MPI / SLURM combo this morning. I
> >> can successfully launch these jobs from the command line and they run
> >> to completion, but when launched through SLURM they hang.
> >>
> >> They appear to just sit with no load apparent on the compute nodes
> >> even though SLURM indicates they are running...
> >>
> >> [jpummil@trillion ~]$ sinfo -l
> >> Wed Jun 20 12:32:29 2007
> >> PARTITION AVAIL  TIMELIMIT   JOB_SIZE ROOT SHARE GROUPS NODES     STATE NODELIST
> >> debug*       up   infinite 1-infinite   no    no    all     8 allocated compute-1-[1-8]
> >> debug*       up   infinite 1-infinite   no    no    all     1      idle compute-1-0
> >>
> >> [jpummil@trillion ~]$ squeue -l
> >> Wed Jun 20 12:32:20 2007
> >>   JOBID PARTITION     NAME     USER    STATE       TIME  TIMELIMIT NODES NODELIST(REASON)
> >>      79     debug   mpirun  jpummil  RUNNING       5:27  UNLIMITED     2 compute-1-[1-2]
> >>      78     debug   mpirun  jpummil  RUNNING       5:58  UNLIMITED     2 compute-1-[3-4]
> >>      77     debug   mpirun  jpummil  RUNNING       7:00  UNLIMITED     2 compute-1-[5-6]
> >>      74     debug   mpirun  jpummil  RUNNING      11:39  UNLIMITED     2 compute-1-[7-8]
> >>
> >> Are there any known issues of this nature involving Open MPI and SLURM?
> >>
> >> Thanks!
> >>
> >> Jeff F. Pummill
> >>
