Difficult to know what to say here. I have no idea what your program does after validating the license. Does it execute some kind of MPI collective operation? Does only one proc validate the license and all others just use it?
All I can tell from your output is that the procs all launched okay.

Ralph

On Sep 27, 2019, at 4:32 PM, Steven Hill via users <users@lists.open-mpi.org> wrote:

Any assistance with this would be greatly appreciated. I'm running CentOS 7 with Open MPI 1.10.7. We are using a product called XFlow by 3ds. I have been going back and forth trying to figure out why my Open MPI job pauses when expanding across more than one machine.

I confirmed that the Open MPI environment variable paths to the libraries and bin files are correct on all machines (head node and 3 compute nodes):

LD_LIBRARY_PATH=/usr/lib64/openmpi/lib:
PATH=/usr/lib64/openmpi/bin:

I can run an MPI job to display the host names:

mpirun -host srv-comp01,srv-comp02,srv-comp03 hostname
srv-comp02
srv-comp01
srv-comp03

If I run the command which normally pauses and just identify the same hostname twice, it works fine, i.e.:

mpirun -npernode 2 -host srv-comp01, srv-comp02 {command}

At the suggestion of the vendor I have tried "--mca btl tcp,self"; the job still pauses at the same spot. The firewall is turned off on all machines. Password-less SSH works without issue. I have also tested with another product we use called starccm (it has its own MPI provider).

I have not run hello_c or ring_c. I see them referenced in the FAQ "11. How can I diagnose problems when running across multiple hosts?", but I can't see where to download them from.

Here is a verbose output of the command. It always pauses at "[ INFO ] License validation OK" and goes no further. I am able to run the job without MPI on a single host. I'm not sure where to go from here.
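For reference, hello_c and ring_c are not a separate download: they ship in the examples/ directory of the Open MPI source tarball. A minimal sketch of building and running ring_c across the same hosts (the download URL follows Open MPI's usual release layout, and the paths/hostnames are taken from the setup above):

```shell
# Fetch the source tarball matching the installed version; the examples/
# directory in it contains hello_c.c and ring_c.c.
wget https://download.open-mpi.org/release/open-mpi/v1.10/openmpi-1.10.7.tar.bz2
tar xjf openmpi-1.10.7.tar.bz2
cd openmpi-1.10.7/examples

# Build ring_c with the installed Open MPI compiler wrapper.
mpicc ring_c.c -o ring_c

# Run it across the same three hosts used for the XFlow job; if this also
# hangs, the problem is in the MPI transport rather than in XFlow itself.
mpirun -npernode 1 -host srv-comp01,srv-comp02,srv-comp03 ./ring_c
```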
[symapp@srv-comp-hn ~]$ mpirun --version
mpirun (Open MPI) 1.10.7
[symapp@srv-comp-hn ~]$ mpirun -npernode 1 --mca plm_base_verbose 10 -host srv-comp01,srv-comp02,srv-comp03 /mntnfs/eng-nfs/Apps/XFlow/engine-3d-mpi-ompi10 /mntnfs/eng-nfs/jsmith/XFlow/Periodic/PeriodicCavity_MPI3.xfp -maxcpu=1
[srv-comp-hn:04909] mca: base: components_register: registering plm components
[srv-comp-hn:04909] mca: base: components_register: found loaded component isolated
[srv-comp-hn:04909] mca: base: components_register: component isolated has no register or open function
[srv-comp-hn:04909] mca: base: components_register: found loaded component rsh
[srv-comp-hn:04909] mca: base: components_register: component rsh register function successful
[srv-comp-hn:04909] mca: base: components_register: found loaded component slurm
[srv-comp-hn:04909] mca: base: components_register: component slurm register function successful
[srv-comp-hn:04909] mca: base: components_open: opening plm components
[srv-comp-hn:04909] mca: base: components_open: found loaded component isolated
[srv-comp-hn:04909] mca: base: components_open: component isolated open function successful
[srv-comp-hn:04909] mca: base: components_open: found loaded component rsh
[srv-comp-hn:04909] mca: base: components_open: component rsh open function successful
[srv-comp-hn:04909] mca: base: components_open: found loaded component slurm
[srv-comp-hn:04909] mca: base: components_open: component slurm open function successful
[srv-comp-hn:04909] mca:base:select: Auto-selecting plm components
[srv-comp-hn:04909] mca:base:select:( plm) Querying component [isolated]
[srv-comp-hn:04909] mca:base:select:( plm) Query of component [isolated] set priority to 0
[srv-comp-hn:04909] mca:base:select:( plm) Querying component [rsh]
[srv-comp-hn:04909] mca:base:select:( plm) Query of component [rsh] set priority to 10
[srv-comp-hn:04909] mca:base:select:( plm) Querying component [slurm]
[srv-comp-hn:04909] mca:base:select:( plm) Skipping component [slurm]. Query failed to return a module
[srv-comp-hn:04909] mca:base:select:( plm) Selected component [rsh]
[srv-comp-hn:04909] mca: base: close: component isolated closed
[srv-comp-hn:04909] mca: base: close: unloading component isolated
[srv-comp-hn:04909] mca: base: close: component slurm closed
[srv-comp-hn:04909] mca: base: close: unloading component slurm
[srv-comp-hn:04909] [[15143,0],0] plm:rsh: final template argv: /usr/bin/ssh <template> orted --hnp-topo-sig 0N:4S:4L3:4L2:4L1:8C:8H:x86_64 -mca ess "env" -mca orte_ess_jobid "992411648" -mca orte_ess_vpid "<template>" -mca orte_ess_num_procs "4" -mca orte_hnp_uri "992411648.0;tcp://10.1.28.49,192.168.122.1:33405" --tree-spawn --mca plm_base_verbose "10" -mca plm "rsh" -mca rmaps_ppr_n_pernode "1" --tree-spawn
[srv-comp01:130272] mca: base: components_register: registering plm components
[srv-comp01:130272] mca: base: components_register: found loaded component rsh
[srv-comp01:130272] mca: base: components_register: component rsh register function successful
[srv-comp01:130272] mca: base: components_open: opening plm components
[srv-comp01:130272] mca: base: components_open: found loaded component rsh
[srv-comp01:130272] mca: base: components_open: component rsh open function successful
[srv-comp01:130272] mca:base:select: Auto-selecting plm components
[srv-comp01:130272] mca:base:select:( plm) Querying component [rsh]
[srv-comp01:130272] mca:base:select:( plm) Query of component [rsh] set priority to 10
[srv-comp01:130272] mca:base:select:( plm) Selected component [rsh]
[srv-comp01:130272] [[15143,0],1] plm:rsh: final template argv: /usr/bin/ssh <template> orted --hnp-topo-sig 0N:35S:35L3:35L2:35L1:35C:35H:x86_64 -mca ess "env" -mca orte_ess_jobid "992411648" -mca orte_ess_vpid "<template>" -mca orte_ess_num_procs "4" -mca orte_parent_uri "992411648.1;tcp://10.1.28.50,192.168.122.1:34662" -mca orte_hnp_uri "992411648.0;tcp://10.1.28.49,192.168.122.1:33405" --mca plm_base_verbose "10" -mca rmaps_ppr_n_pernode "1" -mca plm "rsh" --tree-spawn
[srv-comp02:33362] mca: base: components_register: registering plm components
[srv-comp02:33362] mca: base: components_register: found loaded component rsh
[srv-comp02:33362] mca: base: components_register: component rsh register function successful
[srv-comp02:33362] mca: base: components_open: opening plm components
[srv-comp02:33362] mca: base: components_open: found loaded component rsh
[srv-comp02:33362] mca: base: components_open: component rsh open function successful
[srv-comp02:33362] mca:base:select: Auto-selecting plm components
[srv-comp02:33362] mca:base:select:( plm) Querying component [rsh]
[srv-comp02:33362] mca:base:select:( plm) Query of component [rsh] set priority to 10
[srv-comp02:33362] mca:base:select:( plm) Selected component [rsh]
[srv-comp03:89338] mca: base: components_register: registering plm components
[srv-comp03:89338] mca: base: components_register: found loaded component rsh
[srv-comp03:89338] mca: base: components_register: component rsh register function successful
[srv-comp03:89338] mca: base: components_open: opening plm components
[srv-comp03:89338] mca: base: components_open: found loaded component rsh
[srv-comp03:89338] mca: base: components_open: component rsh open function successful
[srv-comp03:89338] mca:base:select: Auto-selecting plm components
[srv-comp03:89338] mca:base:select:( plm) Querying component [rsh]
[srv-comp03:89338] mca:base:select:( plm) Query of component [rsh] set priority to 10
[srv-comp03:89338] mca:base:select:( plm) Selected component [rsh]
[srv-comp-hn:04909] [[15143,0],0] plm:base:receive update proc state command from [[15143,0],1]
[srv-comp-hn:04909] [[15143,0],0] plm:base:receive got update_proc_state for job [15143,1]
[srv-comp-hn:04909] [[15143,0],0] plm:base:receive update proc state command from [[15143,0],2]
[srv-comp-hn:04909] [[15143,0],0] plm:base:receive got update_proc_state for job [15143,1]
[srv-comp-hn:04909] [[15143,0],0] plm:base:receive update proc state command from [[15143,0],3]
[srv-comp-hn:04909] [[15143,0],0] plm:base:receive got update_proc_state for job [15143,1]
[ INFO ] ## SIMULATION START ##
[ INFO ] XFlow Build 106.00
[ INFO ] Execution line: /mntnfs/eng-nfs/Apps/XFlow/engine-3d-mpi-ompi10 /mntnfs/eng-nfs/jsmith/XFlow/Periodic/PeriodicCavity_MPI3.xfp -maxcpu=1
[ INFO ] Computation limited to: 1 cores per node.
[ INFO ]
[ INFO ] License validation OK
^C[srv-comp-hn:04909] [[15143,0],0] plm:base:receive update proc state command from [[15143,0],2]
[srv-comp-hn:04909] [[15143,0],0] plm:base:receive got update_proc_state for job [15143,1]
[srv-comp02:33362] mca: base: close: component rsh closed
[srv-comp02:33362] mca: base: close: unloading component rsh
[srv-comp-hn:04909] [[15143,0],0] plm:base:receive update proc state command from [[15143,0],1]
[srv-comp-hn:04909] [[15143,0],0] plm:base:receive got update_proc_state for job [15143,1]
[srv-comp-hn:04909] [[15143,0],0] plm:base:receive update proc state command from [[15143,0],3]
[srv-comp-hn:04909] [[15143,0],0] plm:base:receive got update_proc_state for job [15143,1]
[srv-comp03:89338] mca: base: close: component rsh closed
[srv-comp03:89338] mca: base: close: unloading component rsh
[srv-comp-hn:04909] mca: base: close: component rsh closed
[srv-comp-hn:04909] mca: base: close: unloading component rsh
[srv-comp01:130272] mca: base: close: component rsh closed
[srv-comp01:130272] mca: base: close: unloading component rsh
[symapp@srv-comp-hn ~]$
[symapp@srv-comp-hn ~]$ mpirun -host srv-comp01,srv-comp02,srv-comp03 hostname
srv-comp02
srv-comp01
srv-comp03
[symapp@srv-comp-hn ~]$ env | grep -i path
MANPATH=:/opt/pbs/share/man
LD_LIBRARY_PATH=/usr/lib64/openmpi/lib:
PATH=/usr/lib64/openmpi/bin:/opt/CD-adapco/13.04.011/STAR-View+13.04.011/bin:/opt/CD-adapco/13.04.011/STAR-CCM+13.04.011/star/bin:/opt/CD-adapco/13.04.010/STAR-View+13.04.010/bin:/opt/CD-adapco/13.04.010/STAR-CCM+13.04.010/star/bin:/mntnfs/eng-nfs/Apps/Abaqus/Commands:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/pbs/bin:/home/symapp/.local/bin:/home/symapp/bin
MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
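One detail in the verbose output above may be relevant: the orte_hnp_uri/orte_parent_uri lines show two addresses per node, e.g. tcp://10.1.28.49,192.168.122.1:33405. 192.168.122.1 is the default address of libvirt's virbr0 bridge; when that same address exists on every node, Open MPI's TCP transport can try to route cross-node traffic over it and hang. A troubleshooting sketch using Open MPI's standard interface-selection MCA parameters (virbr0 as the bridge interface name and 10.1.28.0/24 as the cluster subnet are assumptions based on the output above):

```shell
# Keep MPI (btl) and out-of-band (oob) TCP traffic off the libvirt
# bridge and loopback, leaving only the real cluster network.
mpirun -npernode 1 \
    --mca btl tcp,self \
    --mca btl_tcp_if_exclude lo,virbr0 \
    --mca oob_tcp_if_exclude lo,virbr0 \
    -host srv-comp01,srv-comp02,srv-comp03 {command}

# Equivalently, include only the cluster subnet by CIDR:
#   --mca btl_tcp_if_include 10.1.28.0/24 --mca oob_tcp_if_include 10.1.28.0/24
```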