Gilles,

I tried an ess component list that excludes slurm and slurmd, but I ran
into connection issues at run time. I guess I do need slurm and slurmd in
my runtime context! In any case, as you mentioned, it is not a good
solution because of the MPI processes that can be left behind when using
scancel, and I guess I would also lose some of SLURM's process monitoring
functionality.

I will stick with updating the mpirun command line to use the --bynode
option.
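For reference, the updated command should look roughly like this (same
paths and hosts as in my earlier message, only --bynode added; untested as
written here):

  /../openmpi/1.6.5/bin/mpirun --bynode -prefix /.../openmpi/1.6.5 -tag-output \
      -H r01n05 -x OMP_NUM_THREADS -np 1 master_exe.x : \
      -H r01n06,r01n07 -x OMP_NUM_THREADS -np 2 slave_exe.x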

Thanks a lot for your help.
Regards,
Nicolas


2018-05-17 14:23 GMT+02:00 Gilles Gouaillardet <
gilles.gouaillar...@gmail.com>:

> Nicolas,
>
> This looks odd at first glance, but as stated before, 1.6 is an obsolete
> series.
> A workaround could be to run
> mpirun --mca ess ...
> and replace ... with a comma-separated list of ess components that
> excludes both slurm and slurmd.
>
> Another workaround could be to remove SLURM-related environment variables
> before calling mpirun.
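For that second workaround, I guess the run script would do something
roughly like this before launching (an untested sketch; I have not checked
exactly which SLURM variables matter):

  # drop all SLURM*/SLURMD* variables from the environment before mpirun
  unset $(env | awk -F= '/^SLURM/ {print $1}')
  mpirun ...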
>
>
> Cheers,
>
> Gilles
>
>
> On Thursday, May 17, 2018, Nicolas Deladerriere <
> nicolas.deladerri...@gmail.com> wrote:
>
>> Hi all,
>>
>> Thanks for your feedback,
>>
>> About using "mpirun --mca ras ^slurm --mca plm ^slurm --mca ess
>> ^slurm,slurmd ...": I am a bit confused since the syntax looks correct,
>> but I keep getting the following error at run time:
>>
>> --------------------------------------------------------------------------
>> MCA framework parameters can only take a single negation operator
>> ("^"), and it must be at the beginning of the value.  The following
>> value violates this rule:
>>
>>     env,^slurm,slurmd
>>
>> When used, the negation operator sets the "exclusive" behavior mode,
>> meaning that it will exclude all specified components (and implicitly
>> include all others). ...... You cannot mix inclusive and exclusive
>> behavior.
>>
>> Is there another MCA setting somewhere that could conflict with the
>> command-line setting?
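As far as I understand, MCA parameters can also come from OMPI_MCA_*
environment variables or from parameter files, so checking those might
show where the extra "env" value comes from. A rough sketch of what I plan
to look at (the exact file locations may differ on my installation):

  host% env | grep OMPI_MCA_
  host% cat $HOME/.openmpi/mca-params.conf
  host% cat /.../openmpi/1.6.5/etc/openmpi-mca-params.conf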
>>
>> Here is the full mpirun command line:
>>
>> /../openmpi/1.6.5/bin/mpirun -prefix /.../openmpi/1.6.5 -tag-output \
>>     -H r01n05 -x OMP_NUM_THREADS -np 1 --mca ras ^slurm --mca plm ^slurm \
>>     --mca ess ^slurm,slurmd master_exe.x : \
>>     -H r01n06,r01n07 -x OMP_NUM_THREADS -np 2 slave_exe.x
>>
>> And the default OMPI settings:
>>
>> host% ompi_info --all | grep slurm
>>     MCA ras: slurm (MCA v2.0, API v2.0, Component v1.6.5)
>>     MCA plm: slurm (MCA v2.0, API v2.0, Component v1.6.5)
>>     MCA ess: slurm (MCA v2.0, API v2.0, Component v1.6.5)
>>     MCA ess: slurmd (MCA v2.0, API v2.0, Component v1.6.5)
>>     MCA ras: parameter "ras_slurm_priority" (current value: <75>,
>>              data source: default value)
>>              Priority of the slurm ras component
>>     MCA plm: parameter "plm_slurm_args" (current value: <none>,
>>              data source: default value)
>>     MCA plm: parameter "plm_slurm_priority" (current value: <0>,
>>              data source: default value)
>>     MCA ess: parameter "ess_slurm_priority" (current value: <0>,
>>              data source: default value)
>>     MCA ess: parameter "ess_slurmd_priority" (current value: <0>,
>>              data source: default value)
>>
>>
>> About the "-H" option and using the --bynode option:
>>
>> In my case, I do not specify the number of slots per node to Open MPI
>> (see the mpirun command just above). From what I can see, the only place
>> the number of slots is defined in this case is the SLURM configuration
>> (SLURM_JOB_CPUS_PER_NODE=4(x3)), and I was not expecting that to be taken
>> into account when launching the MPI processes.
>>
>> Using --bynode is probably the easiest solution in my case, even if I am
>> afraid it will not necessarily fit all of my run configurations. A better
>> solution would be to rework my management scripts for tighter integration
>> with the SLURM resource manager, but that is another story.
>>
>> Thanks for your help.
>> Regards,
>> Nicolas
>>
>>
>> 2018-05-16 9:47 GMT+02:00 r...@open-mpi.org <r...@open-mpi.org>:
>>
>>> The problem here is that you have made an incorrect assumption. In the
>>> older OMPI versions, the -H option simply indicated that the specified
>>> hosts were available for use - it did not imply the number of slots on that
>>> host. Since you have specified 2 slots on each host, and you told mpirun to
>>> launch 2 procs of your second app_context (the “slave”), it filled the
>>> first node with the 2 procs.
>>>
>>> I don’t recall the options for that old a version, but IIRC you should
>>> add --pernode to the cmd line to get exactly 1 proc/node
>>>
>>> Or upgrade to a more recent OMPI version where -H can also be used to
>>> specify the #slots on a node :-)
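If I understand the two suggestions correctly, they would look roughly
like this (untested sketches on my side):

  # OMPI 1.6.x: force one process per node
  mpirun --pernode -H node01 -np 1 other_process_master.x : \
         -H node02,node03 -np 2 other_process_slave.x

  # more recent OMPI (as suggested): give the slot count directly with -H
  mpirun -H node01:1 -np 1 other_process_master.x : \
         -H node02:1,node03:1 -np 2 other_process_slave.x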
>>>
>>>
>>> > On May 15, 2018, at 11:58 PM, Gilles Gouaillardet <
>>> gilles.gouaillar...@gmail.com> wrote:
>>> >
>>> > You can try to disable SLURM :
>>> >
>>> > mpirun --mca ras ^slurm --mca plm ^slurm --mca ess ^slurm,slurmd ...
>>> >
>>> > That will require that you are able to SSH between compute nodes.
>>> > Keep in mind this is far from ideal, since it might leave some MPI
>>> > processes on the nodes if you cancel a job, and mess up SLURM
>>> > accounting too.
>>> >
>>> >
>>> > Cheers,
>>> >
>>> > Gilles
>>> >
>>> > On Wed, May 16, 2018 at 3:50 PM, Nicolas Deladerriere
>>> > <nicolas.deladerri...@gmail.com> wrote:
>>> >> Hi all,
>>> >>
>>> >>
>>> >>
>>> >> I am trying to run an MPI application through the SLURM job
>>> >> scheduler. Here is my run sequence:
>>> >>
>>> >>
>>> >> sbatch --> my_env_script.sh --> my_run_script.sh --> mpirun
>>> >>
>>> >>
>>> >> In order to minimize modifications to my production environment, I
>>> >> had to set up the following host-list management in different scripts:
>>> >>
>>> >>
>>> >> my_env_script.sh
>>> >>
>>> >>
>>> >> Build the host list from SLURM resource manager information (a rough
>>> >> sketch of this step is shown after the mpirun examples below).
>>> >>
>>> >> Example: node01 nslots=2 ; node02 nslots=2 ; node03 nslots=2
>>> >>
>>> >>
>>> >> my_run_script.sh
>>> >>
>>> >>
>>> >> Build the host list according to the required job (the process
>>> >> mapping depends on the job requirements).
>>> >>
>>> >> Nodes are always fully dedicated to my job, but I have to manage
>>> >> different master-slave situations with corresponding mpirun commands:
>>> >>
>>> >> As many processes as slots:
>>> >>
>>> >> mpirun -H node01 -np 1 process_master.x : \
>>> >>        -H node02,node02,node03,node03 -np 4 process_slave.x
>>> >>
>>> >> Only one process per node (slots are usually consumed through OpenMP
>>> >> threading):
>>> >>
>>> >> mpirun -H node01 -np 1 other_process_master.x : \
>>> >>        -H node02,node03 -np 2 other_process_slave.x
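>>> >>
>>> >> A rough sketch of how my_env_script.sh builds that host list from the
>>> >> SLURM allocation (simplified; the real script is more involved, and
>>> >> the slot count is hard-coded here only for illustration):
>>> >>
>>> >> for node in $(scontrol show hostnames "$SLURM_JOB_NODELIST"); do
>>> >>     echo "$node nslots=2"
>>> >> done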
>>> >>
>>> >>
>>> >>
>>> >> However, I realized that whatever I specify on the mpirun command
>>> >> line, the process mapping is overridden at run time by SLURM according
>>> >> to the SLURM settings (either the defaults or the sbatch command
>>> >> line). For example, if I run with:
>>> >>
>>> >>
>>> >> sbatch -N 3 --exclusive my_env_script.sh myjob
>>> >>
>>> >>
>>> >> where the final mpirun command (depending on myjob) is:
>>> >>
>>> >>
>>> >> mpirun -H node01 -np 1 other_process_master.x : -H node02,node03 -np 2
>>> >> other_process_slave.x
>>> >>
>>> >>
>>> >> It will be run with a process mapping corresponding to:
>>> >>
>>> >>
>>> >> mpirun -H node01 -np 1 other_process_master.x : -H node02,node02 -np 2
>>> >> other_process_slave.x
>>> >>
>>> >>
>>> >> So far I have not found a way to force mpirun to use the host mapping
>>> >> from the command line instead of the SLURM one. Is there a way to do
>>> >> it (either through MCA parameters, SLURM configuration, or …)?
>>> >>
>>> >>
>>> >> Open MPI version: 1.6.5
>>> >>
>>> >> SLURM version: 17.11.2
>>> >>
>>> >>
>>> >>
>>> >> Regards,
>>> >>
>>> >> Nicolas
>>> >>
>>> >>
>>> >> Note 1: I know it would be better to let SLURM manage my process
>>> >> mapping by using only SLURM parameters and not specifying the host
>>> >> mapping on my mpirun command line, but in order to minimize
>>> >> modifications to my production environment I had to use this solution.
>>> >>
>>> >> Note 2: I know I am using an old Open MPI version!
>>> >>
>>> >>
>>>
>>
>>
>
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
