Nicolas,

This looks odd at first glance, but as stated before, 1.6 is an obsolete
series.
A workaround could be to run

mpirun --mca ess ...

and replace ... with a comma-separated list of ess components that excludes
both slurm and slurmd.
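
For example, something along these lines might work, assuming your build
includes the env, hnp and singleton ess components (ompi_info | grep " MCA ess"
will show what is actually available):

mpirun --mca ess hnp,env,singleton ...

IIRC mpirun itself uses the hnp component and the launched daemons/MPI
processes use env, so those have to stay in the list.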

Another workaround could be to remove the SLURM-related environment variables
before calling mpirun.
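
For example, something like this in the script that calls mpirun:

# rough sketch: drop every SLURM_* variable from the current environment
for v in $(env | awk -F= '/^SLURM_/ {print $1}'); do
    unset "$v"
done
mpirun ...

Since mpirun inherits the environment of that shell, it will then not see any
SLURM variables.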


Cheers,

Gilles

On Thursday, May 17, 2018, Nicolas Deladerriere <
nicolas.deladerri...@gmail.com> wrote:

> Hi all,
>
> Thanks for your feedback,
>
> About using "mpirun --mca ras ^slurm --mca plm ^slurm --mca ess
> ^slurm,slurmd ...": I am a bit confused since the syntax sounds right,
> but I keep getting the following error at run time:
>
> --------------------------------------------------------------------------
> MCA framework parameters can only take a single negation operator
> ("^"), and it must be at the beginning of the value.  The following
> value violates this rule:
>
>     env,^slurm,slurmd
>
> When used, the negation operator sets the "exclusive" behavior mode,
> meaning that it will exclude all specified components (and implicitly
> include all others). ...
> You cannot mix inclusive and exclusive behavior.
>
> Is there another MCA setting that could conflict with the command-line setting?
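> For instance, could it be coming from an OMPI_MCA_* environment variable or
> an mca-params.conf file? A quick check (assuming the standard locations; the
> files may not exist):
>
> env | grep OMPI_MCA
> cat $HOME/.openmpi/mca-params.conf
> cat /.../openmpi/1.6.5/etc/openmpi-mca-params.conf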
>
> Here is the full mpirun command line:
>
> /../openmpi/1.6.5/bin/mpirun -prefix /.../openmpi/1.6.5 -tag-output \
>     -H r01n05 -x OMP_NUM_THREADS -np 1 \
>     --mca ras ^slurm --mca plm ^slurm --mca ess ^slurm,slurmd master_exe.x : \
>     -H r01n06,r01n07 -x OMP_NUM_THREADS -np 2 slave_exe.x
>
> and the default Open MPI settings:
>
> host% ompi_info --all | grep slurm
>     MCA ras: slurm (MCA v2.0, API v2.0, Component v1.6.5)
>     MCA plm: slurm (MCA v2.0, API v2.0, Component v1.6.5)
>     MCA ess: slurm (MCA v2.0, API v2.0, Component v1.6.5)
>     MCA ess: slurmd (MCA v2.0, API v2.0, Component v1.6.5)
>     MCA ras: parameter "ras_slurm_priority" (current value: <75>, data source: default value)
>              Priority of the slurm ras component
>     MCA plm: parameter "plm_slurm_args" (current value: <none>, data source: default value)
>     MCA plm: parameter "plm_slurm_priority" (current value: <0>, data source: default value)
>     MCA ess: parameter "ess_slurm_priority" (current value: <0>, data source: default value)
>     MCA ess: parameter "ess_slurmd_priority" (current value: <0>, data source: default value)
>
>
> About the "-H" option and using the --bynode option:
>
> In my case, I do not specify the number of slots per node to Open MPI
> (see the mpirun command just above). From what I can see, the only place
> the number of slots gets defined in this case is through the SLURM
> configuration (SLURM_JOB_CPUS_PER_NODE=4(x3)), and I was not expecting
> that to be taken into account when running the MPI processes.
>
> Using --bynode is probably the easiest solution in my case (something like
> the command sketched below), even if I am afraid it will not necessarily fit
> all of my running configurations. A better solution would be to review my
> management scripts for better integration with the SLURM resource manager,
> but that is another story.
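>
> For example, something like this (if I understand the option correctly):
>
> mpirun --bynode -H node01 -np 1 other_process_master.x : -H node02,node03 -np 2 other_process_slave.x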
>
> Thanks for your help.
> Regards,
> Nicolas
>
>
> 2018-05-16 9:47 GMT+02:00 r...@open-mpi.org <r...@open-mpi.org>:
>
>> The problem here is that you have made an incorrect assumption. In the
>> older OMPI versions, the -H option simply indicated that the specified
>> hosts were available for use - it did not imply the number of slots on that
>> host. Since you have specified 2 slots on each host, and you told mpirun to
>> launch 2 procs of your second app_context (the “slave”), it filled the
>> first node with the 2 procs.
>>
>> I don’t recall the options for that old a version, but IIRC you should
>> add --pernode to the cmd line to get exactly 1 proc/node.
>>
>> Or upgrade to a more recent OMPI version where -H can also be used to
>> specify the #slots on a node :-)
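>>
>> For instance, with a recent release something like this should work
>> (untested):
>>
>> mpirun -H node01:1 -np 1 other_process_master.x : -H node02:1,node03:1 -np 2 other_process_slave.x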
>>
>>
>> > On May 15, 2018, at 11:58 PM, Gilles Gouaillardet <
>> gilles.gouaillar...@gmail.com> wrote:
>> >
>> > You can try to disable SLURM:
>> >
>> > mpirun --mca ras ^slurm --mca plm ^slurm --mca ess ^slurm,slurmd ...
>> >
>> > That will require that you are able to SSH between the compute nodes.
>> > Keep in mind this is far from ideal, since it might leave some MPI
>> > processes on the nodes if you cancel a job, and mess up SLURM accounting too.
>> >
>> >
>> > Cheers,
>> >
>> > Gilles
>> >
>> > On Wed, May 16, 2018 at 3:50 PM, Nicolas Deladerriere
>> > <nicolas.deladerri...@gmail.com> wrote:
>> >> Hi all,
>> >>
>> >>
>> >>
>> >> I am trying to run an MPI application through the SLURM job scheduler.
>> >> Here is my running sequence:
>> >>
>> >>
>> >> sbatch --> my_env_script.sh --> my_run_script.sh --> mpirun
>> >>
>> >>
>> >> In order to minimize modification of my production environment, I had to
>> >> set up the following hostlist management in the different scripts:
>> >>
>> >>
>> >> my_env_script.sh:
>> >>
>> >> Builds the host list from the SLURM resource manager information.
>> >>
>> >> Example: node01 nslots=2 ; node02 nslots=2 ; node03 nslots=2
>> >>
>> >>
>> >> my_run_script.sh:
>> >>
>> >> Builds the host list according to the required job (the process mapping
>> >> depends on the job's requirements).
>> >>
>> >> Nodes are always fully dedicated to my job, but I have to manage
>> >> different master-slave situations with the corresponding mpirun commands:
>> >>
>> >> As many processes as there are slots:
>> >>
>> >> mpirun -H node01 -np 1 process_master.x : -H node02,node02,node03,node03 -np 4 process_slave.x
>> >>
>> >> Only one process per node (the slots are usually consumed through OpenMP
>> >> threading):
>> >>
>> >> mpirun -H node01 -np 1 other_process_master.x : -H node02,node03 -np 2 other_process_slave.x
>> >>
>> >>
>> >>
>> >> However, I realized that whatever I specify on the mpirun command line,
>> >> the process mapping is overridden at run time by SLURM according to the
>> >> SLURM settings (either the defaults or the sbatch command line). For
>> >> example, if I run with:
>> >>
>> >>
>> >> sbatch -N 3 --exclusive my_env_script.sh myjob
>> >>
>> >>
>> >> where the final mpirun command (depending on myjob) is:
>> >>
>> >>
>> >> mpirun -H node01 -np 1 other_process_master.x : -H node02,node03 -np 2 other_process_slave.x
>> >>
>> >>
>> >> it will be run with a process mapping corresponding to:
>> >>
>> >>
>> >> mpirun -H node01 -np 1 other_process_master.x : -H node02,node02 -np 2 other_process_slave.x
>> >>
>> >>
>> >> So far I have not found a way to force mpirun to use the host mapping
>> >> from the command line instead of the SLURM one. Is there a way to do it
>> >> (either by using MCA parameters, the SLURM configuration, or …)?
>> >>
>> >>
>> >> Open MPI version: 1.6.5
>> >>
>> >> SLURM version: 17.11.2
>> >>
>> >>
>> >>
>> >> Regards,
>> >>
>> >> Nicolas
>> >>
>> >>
>> >> Note 1: I know it would be better to let SLURM manage my process mapping
>> >> by only using SLURM parameters and not specifying the host mapping in my
>> >> mpirun command, but in order to minimize modification of my production
>> >> environment I had to use this kind of solution.
>> >>
>> >> Note 2: I know I am using an old Open MPI version!
>> >>
>> >>
>>
>>
>
>