Hi all,

Thanks for your feedback about using:

    mpirun --mca ras ^slurm --mca plm ^slurm --mca ess ^slurm,slurmd ...

I am a bit confused: the syntax looks right, but I keep getting the following error at run time:

    --------------------------------------------------------------------------
    MCA framework parameters can only take a single negation operator
    ("^"), and it must be at the beginning of the value. The following
    value violates this rule:

        env,^slurm,slurmd

    When used, the negation operator sets the "exclusive" behavior mode,
    meaning that it will exclude all specified components (and implicitly
    include all others).
    ...
    You cannot mix inclusive and exclusive behavior.

Is there some other MCA setting that could conflict with the command-line one? Here is the full mpirun command line:

    /../openmpi/1.6.5/bin/mpirun -prefix /.../openmpi/1.6.5 -tag-output \
        -H r01n05 -x OMP_NUM_THREADS -np 1 \
        --mca ras ^slurm --mca plm ^slurm --mca ess ^slurm,slurmd master_exe.x : \
        -H r01n06,r01n07 -x OMP_NUM_THREADS -np 2 slave_exe.x

and the default Open MPI settings:

    host% ompi_info --all | grep slurm
        MCA ras: slurm (MCA v2.0, API v2.0, Component v1.6.5)
        MCA plm: slurm (MCA v2.0, API v2.0, Component v1.6.5)
        MCA ess: slurm (MCA v2.0, API v2.0, Component v1.6.5)
        MCA ess: slurmd (MCA v2.0, API v2.0, Component v1.6.5)
        MCA ras: parameter "ras_slurm_priority" (current value: <75>, data source: default value)
                 Priority of the slurm ras component
        MCA plm: parameter "plm_slurm_args" (current value: <none>, data source: default value)
        MCA plm: parameter "plm_slurm_priority" (current value: <0>, data source: default value)
        MCA ess: parameter "ess_slurm_priority" (current value: <0>, data source: default value)
        MCA ess: parameter "ess_slurmd_priority" (current value: <0>, data source: default value)
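The reported value starts with "env", so my guess is that the ess framework is already being given an inclusive value somewhere before my "^slurm,slurmd" is appended. For what it is worth, this is how I plan to look for such a pre-existing setting; a rough checklist only, assuming the default Open MPI parameter-file locations under the install prefix shown above:

    # MCA parameters exported through the environment
    env | grep OMPI_MCA

    # per-user and system-wide parameter files (default locations)
    cat $HOME/.openmpi/mca-params.conf
    cat /.../openmpi/1.6.5/etc/openmpi-mca-params.conf

    # current value and data source as reported by ompi_info
    ompi_info --param ess all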
About the "-H" option and --bynode: in my case I do not specify the number of slots per node to Open MPI at all (see the mpirun command above). As far as I can tell, the only place a slot count is defined is the SLURM configuration (SLURM_JOB_CPUS_PER_NODE=4(x3)), and I did not expect that to be taken into account when mapping the MPI processes.

Using --bynode is probably the easiest solution in my case (something like the command sketched below), even if I am afraid it will not necessarily fit every configuration I have to run. A better solution would be to rework my management scripts for a cleaner integration with the SLURM resource manager, but that is another story.
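For the "one process per node" case, this is what I have in mind; only a sketch, not tested yet, and I am assuming that --bynode applies globally to both app contexts:

    # round-robin mapping by node instead of by slot
    mpirun --bynode -H node01 -np 1 other_process_master.x : \
                    -H node02,node03 -np 2 other_process_slave.x

The --pernode variant that was suggested may be the cleaner way to get exactly one process per node; I still need to try both.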
Thanks for your help.

Regards,
Nicolas

2018-05-16 9:47 GMT+02:00 r...@open-mpi.org <r...@open-mpi.org>:

> The problem here is that you have made an incorrect assumption. In the
> older OMPI versions, the -H option simply indicated that the specified
> hosts were available for use - it did not imply the number of slots on
> that host. Since you have specified 2 slots on each host, and you told
> mpirun to launch 2 procs of your second app_context (the "slave"), it
> filled the first node with the 2 procs.
>
> I don't recall the options for that old a version, but IIRC you should
> add --pernode to the cmd line to get exactly 1 proc/node.
>
> Or upgrade to a more recent OMPI version where -H can also be used to
> specify the #slots on a node :-)
>
>
> > On May 15, 2018, at 11:58 PM, Gilles Gouaillardet
> > <gilles.gouaillar...@gmail.com> wrote:
> >
> > You can try to disable SLURM:
> >
> >     mpirun --mca ras ^slurm --mca plm ^slurm --mca ess ^slurm,slurmd ...
> >
> > That will require that you are able to SSH between compute nodes.
> > Keep in mind this is far from ideal, since it might leave some MPI
> > processes on the nodes if you cancel a job, and mess up SLURM
> > accounting too.
> >
> > Cheers,
> >
> > Gilles
> >
> > On Wed, May 16, 2018 at 3:50 PM, Nicolas Deladerriere
> > <nicolas.deladerri...@gmail.com> wrote:
> >> Hi all,
> >>
> >> I am trying to run an MPI application through the SLURM job scheduler.
> >> Here is my running sequence:
> >>
> >>     sbatch --> my_env_script.sh --> my_run_script.sh --> mpirun
> >>
> >> In order to minimize modification of my production environment, I had
> >> to set up the following host-list management in the different scripts:
> >>
> >> my_env_script.sh
> >>
> >>     builds the host list from the SLURM resource manager information.
> >>     Example: node01 nslots=2 ; node02 nslots=2 ; node03 nslots=2
> >>
> >> my_run_script.sh
> >>
> >>     builds the host list according to the required job (process mapping
> >>     depends on job requirements). Nodes are always fully dedicated to my
> >>     job, but I have to manage different master-slave situations with the
> >>     corresponding mpirun command:
> >>
> >>     as many processes as slots:
> >>
> >>         mpirun -H node01 -np 1 process_master.x : \
> >>                -H node02,node02,node03,node03 -np 4 process_slave.x
> >>
> >>     only one process per node (slots are usually used through OpenMP
> >>     threading):
> >>
> >>         mpirun -H node01 -np 1 other_process_master.x : \
> >>                -H node02,node03 -np 2 other_process_slave.x
> >>
> >> However, I realized that whatever I specify on the mpirun command line,
> >> the process mapping is overridden at run time by SLURM according to the
> >> SLURM settings (either the defaults or the sbatch command line). For
> >> example, if I run with:
> >>
> >>     sbatch -N 3 --exclusive my_env_script.sh myjob
> >>
> >> where the final mpirun command (depending on myjob) is:
> >>
> >>     mpirun -H node01 -np 1 other_process_master.x : \
> >>            -H node02,node03 -np 2 other_process_slave.x
> >>
> >> it is actually run with a process mapping corresponding to:
> >>
> >>     mpirun -H node01 -np 1 other_process_master.x : \
> >>            -H node02,node02 -np 2 other_process_slave.x
> >>
> >> So far I have not found a way to force mpirun to use the host mapping
> >> from the command line instead of the SLURM one. Is there a way to do it
> >> (either through MCA parameters, the SLURM configuration, or ...)?
> >>
> >> openmpi version: 1.6.5
> >> slurm version: 17.11.2
> >>
> >> Regards,
> >> Nicolas
> >>
> >> Note 1: I know it would be better to let SLURM manage my process mapping
> >> by only using SLURM parameters and not specifying the host mapping in my
> >> mpirun command, but in order to minimize modification of my production
> >> environment I had to use this solution.
> >>
> >> Note 2: I know I am using an old openmpi version!
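On the "better integration with SLURM" point mentioned above, the kind of thing I have in mind for my_env_script.sh is sketched below. This is only a rough, untested sketch; it assumes the standard SLURM client tools and environment variables (scontrol, SLURM_JOB_NODELIST, SLURM_CPUS_ON_NODE) are available inside the allocation, and that every allocated node gets the same slot count:

    #!/bin/sh
    # Build an Open MPI hostfile directly from the SLURM allocation
    # instead of maintaining the host list by hand.
    for node in $(scontrol show hostnames "$SLURM_JOB_NODELIST"); do
        echo "$node slots=$SLURM_CPUS_ON_NODE"
    done > my_hostfile

The resulting file could then be passed to mpirun with "-hostfile my_hostfile" instead of the explicit -H lists.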
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users