Hi Joshua,

On Aug 21, 2014, at 12:28 AM, Joshua Ladd <jladd.m...@gmail.com> wrote:
> When launching with mpirun in a SLURM environment, srun is only being used to 
> launch the ORTE daemons (orteds.)  Since the daemon will already exist on the 
> node from which you invoked mpirun, this node will not be included in the 
> list of nodes. SLURM's PMI library is not involved (that functionality is 
> only necessary if you directly launch your MPI application with srun, in 
> which case it is used to exchange wireup info amongst slurmds.) This is the 
> expected behavior. 
> 
> ~/ompi-top-level/orte/mca/plm/plm_slurm_module.c +294
>         /* if the daemon already exists on this node, then
>          * don't include it
>          */
>         if (node->daemon_launched) {
>             continue;
>         }
> 
> Do you have a frontend node that you can launch from? What happens if you set 
> "-np X" where X = 8*ppn. The alternative is to do a direct launch of the MPI 
> application with srun.

I understand the logic and I understand why the orted on the first node is not 
needed. But since we use a batch system (SLURM), we do not want people to run 
their mpirun commands directly on a front-end node. A typical scenario: all 
compute nodes are running fine, but we reboot all the login nodes to upgrade the 
Linux image because of a kernel security update. We can keep the login nodes 
offline for potentially hours without stopping the system from working. 

From our perspective, a front-end node is an additional burden. Of course the 
login node and the front-end node can be two separate hosts, but I am looking 
for a way to keep our setup as it is without introducing structural changes. 


Hi Ralph,

On Aug 21, 2014, at 12:36 AM, Ralph Castain <r...@open-mpi.org> wrote:
> Or you can add 
> 
>    -nolocal|--nolocal    Do not run any MPI applications on the local node
> 
> to your mpirun command line and we won't run any application procs on the 
> node where mpirun is executing

I tried, but of course mpirun complains. If it cannot run locally (meaning on 
the first node, tesla121), then only 7 nodes remain while I requested 8 in 
total. So to use "--nolocal" I would need to add another node. Since we allocate 
nodes exclusively, and for some users we charge real money for the usage... this 
is not ideal, I am afraid.
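
Just to make the node arithmetic concrete, here is a rough sketch of the kind of 
job script I have in mind (the 4 ranks per node, the node names beyond tesla121, 
and ./my_app are only illustrative):

    #!/bin/bash
    #SBATCH --nodes=8
    #SBATCH --exclusive
    # allocation: tesla121 ... tesla128, 4 MPI ranks per node (illustrative)
    mpirun -np 32 ./my_app               # OK: 4 ranks on each of the 8 nodes
    # mpirun --nolocal -np 32 ./my_app   # fails: tesla121, where mpirun runs,
    #                                    # is excluded, so only 7 nodes remain;
    #                                    # it would only work with --nodes=9,
    #                                    # i.e. one extra (charged) node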


srun seems the only way to go. I need to understand how to pass most of the 
--mca parameters to srun and to be sure I can drive the rmaps_lama_* options as 
flexibly as I do with a normal mpirun. Then there are mxm, fca, hcoll... I am 
not against srun in principle; my only sticking point is that the syntax is 
different enough that we might receive a lot of (too many) complaints from our 
users about adopting this new way to submit, because they are used to the 
classic mpirun inside an sbatch script. Most of them will probably not switch to 
a different method! So our hope of "silently" profiling network, energy, and I/O 
with SLURM plugins while also using Open MPI is lost...
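
For reference, this is my current understanding of what a direct launch would 
look like; the specific MCA parameter names and the PMI flavour below are 
assumptions on my side, not something I have verified:

    # inside the sbatch script: any "--mca foo bar" becomes OMPI_MCA_foo=bar
    export OMPI_MCA_pml=cm                # e.g. select MXM (assumed selection)
    export OMPI_MCA_mtl=mxm
    export OMPI_MCA_coll_hcoll_enable=1   # enable hcoll (assumed parameter)
    srun --mpi=pmi2 -n 32 ./my_app        # SLURM's PMI2 handles the wireup

And the rmaps_lama_* mapping options would presumably not apply here at all, 
since with a direct launch the process placement is decided by SLURM (srun's -n, 
--ntasks-per-node, --cpu_bind, etc.) rather than by ORTE.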

F

--
Mr. Filippo SPIGA, M.Sc.
http://filippospiga.info ~ skype: filippo.spiga

«Nobody will drive us out of Cantor's paradise.» ~ David Hilbert


