The messages about the daemons are coming from two different sources: Grid Engine 
reports that it was able to spawn the orted, and then the orted itself reports that 
it doesn't know how to communicate back and fails.

I think the root of the problem lies in the plm output that shows the qrsh command it 
will use to start the job. For some reason, mpirun is still trying to "tree 
spawn", which (IIRC) isn't allowed under Grid Engine (all the daemons have to be launched 
in one shot by mpirun using qrsh). Try adding "--mca plm_rsh_no_tree_spawn 1" 
to your mpirun command line.
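For reference, combined with the options already in your job script, the command might look like the sketch below (the `-np` value and the application placeholder are taken from your original post):

```sh
# Sketch only: disable tree spawning so mpirun launches every orted
# directly via qrsh instead of relaying through already-started daemons.
mpirun --mca plm_rsh_no_tree_spawn 1 \
       --mca plm_base_verbose 1 \
       --bind-to none --report-bindings \
       -np 7 <application with args>
```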


>> 
>> 
>> On Sat, 30 May 2020 at 00:41, Kulshrestha, Vipul via users 
>> <users@lists.open-mpi.org> wrote:
>>> 
>>> Hi,
>>> 
>>> 
>>> 
>>> I need to launch my Open MPI application on the grid. My application is designed 
>>> to run N processes, where each process has M threads. I am using 
>>> Open MPI version 4.0.1.
>>> 
>>> 
>>> 
>>> % /build/openmpi/openmpi-4.0.1/rhel6/bin/ompi_info | grep grid
>>> 
>>>                 MCA ras: gridengine (MCA v2.1.0, API v2.0.0, Component 
>>> v4.0.1)
>>> 
>>> 
>>> 
>>> To run it without grid, I run it as (say N = 7, M = 2)
>>> 
>>> % mpirun -np 7 <application with arguments>
>>> 
>>> 
>>> 
>>> The above works well and runs N processes. Based on some earlier advice on 
>>> this forum, I have set up the grid submission using a job submission script 
>>> that modifies the grid slot allocation, so that mpirun launches only one 
>>> application process on each host allocated by the grid. I have had partial 
>>> success: the grid is able to start the job, and mpirun also starts to run, 
>>> but then it errors out with the errors mentioned below. Strangely, after 
>>> reporting that all the daemons have started, it reports that it was not 
>>> able to start one or more daemons.
>>> 
>>> 
>>> 
>>> I have set up a grid submission script that modifies the pe_hostfile, and it 
>>> appears that mpirun picks it up and then uses the host information to start 
>>> launching the jobs. However, mpirun halts before it can start all the child 
>>> processes. I enabled some debug logs but am not able to figure out a 
>>> possible cause.
>>> 
>>> 
>>> 
>>> Could somebody look at this and advise how to resolve this issue?
>>> 
>>> 
>>> 
>>> I have pasted the detailed log as well as my job submission script below.
>>> 
>>> 
>>> 
>>> As a clarification, when I run the mpirun without grid, it (mpirun and my 
>>> application) works on the same set of hosts without any problems.
>>> 
>>> 
>>> 
>>> Thanks,
>>> 
>>> Vipul
>>> 
>>> 
>>> 
>>> Job submission script:
>>> 
>>> #!/bin/sh
>>> 
>>> #$ -N velsyn
>>> 
>>> #$ -pe orte2 14
>>> 
>>> #$ -V -cwd -j y
>>> 
>>> #$ -o out.txt
>>> 
>>> #
>>> 
>>> echo "Got $NSLOTS slots."
>>> 
>>> echo "tmpdir is $TMPDIR"
>>> 
>>> echo "pe_hostfile is $PE_HOSTFILE"
>>> 
>>> 
>>> 
>>> 
>>> 
>>> cat $PE_HOSTFILE
>>> 
>>> newhostfile=/testdir/tmp/pe_hostfile
>>> 
>>> 
>>> 
>>> awk '{$2 = $2/2; print}' $PE_HOSTFILE > $newhostfile
>>> 
>>> 
>>> 
>>> export PE_HOSTFILE=$newhostfile
>>> 
>>> export LD_LIBRARY_PATH=/build/openmpi/openmpi-4.0.1/rhel6/lib
>>> 
>>> 
>>> 
>>> mpirun --merge-stderr-to-stdout --output-filename ./output:nojobid,nocopy 
>>> --mca routed direct --mca orte_base_help_aggregate 0 --mca plm_base_verbose 
>>> 1 --bind-to none --report-bindings -np 7 <application with args>
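[As an aside, the awk step in the script above can be checked in isolation. The hostfile line below is hypothetical (host and queue names are made up) and just shows how the slot count in field 2 gets halved:]

```sh
# Feed one made-up pe_hostfile line through the same awk program the
# job script uses; field 2 (the slot count) is divided by two.
printf 'bos2.wv.org.com 4 all.q@bos2.wv.org.com <NULL>\n' \
  | awk '{$2 = $2/2; print}'
# → bos2.wv.org.com 2 all.q@bos2.wv.org.com <NULL>
```

[With 14 granted slots halved to 2 per host across 7 hosts, this matches the intent of running one `-np 7` rank per host.]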
>>> 
>>> 
>>> 
>>> The out.txt content is:
>>> 
>>> Got 14 slots.
>>> 
>>> tmpdir is /tmp/182117160.1.all.q
>>> 
>>> pe_hostfile is /var/spool/sge/bos2/active_jobs/182117160.1/pe_hostfile
>>> 
>>> bos2.wv.org.com 2 al...@bos2.wv.org.com <NULL>
>>> art8.wv.org.com 2 al...@art8.wv.org.com <NULL>
>>> art10.wv.org.com 2 al...@art10.wv.org.com <NULL>
>>> hpb7.wv.org.com 2 al...@hpb7.wv.org.com <NULL>
>>> bos15.wv.org.com 2 al...@bos15.wv.org.com <NULL>
>>> bos1.wv.org.com 2 al...@bos1.wv.org.com <NULL>
>>> hpb11.wv.org.com 2 al...@hpb11.wv.org.com <NULL>
>>> [bos2:22657] [[8251,0],0] plm:rsh: using "/wv/grid2/sge/bin/lx-amd64/qrsh -inherit -nostdin -V -verbose" for launching
>>> [bos2:22657] [[8251,0],0] plm:rsh: final template argv:
>>> 
>>>  /grid2/sge/bin/lx-amd64/qrsh -inherit -nostdin -V -verbose <template> set path = ( /build/openmpi/openmpi-4.0.1/rhel6/bin $path ) ; if ( $?LD_LIBRARY_PATH == 1 ) set OMPI_have_llp ; if ( $?LD_LIBRARY_PATH == 0 ) setenv LD_LIBRARY_PATH /build/openmpi/openmpi-4.0.1/rhel6/lib ; if ( $?OMPI_have_llp == 1 ) setenv LD_LIBRARY_PATH /build/openmpi/openmpi-4.0.1/rhel6/lib:$LD_LIBRARY_PATH ; if ( $?DYLD_LIBRARY_PATH == 1 ) set OMPI_have_dllp ; if ( $?DYLD_LIBRARY_PATH == 0 ) setenv DYLD_LIBRARY_PATH /build/openmpi/openmpi-4.0.1/rhel6/lib ; if ( $?OMPI_have_dllp == 1 ) setenv DYLD_LIBRARY_PATH /build/openmpi/openmpi-4.0.1/rhel6/lib:$DYLD_LIBRARY_PATH ; /build/openmpi/openmpi-4.0.1/rhel6/bin/orted -mca orte_report_bindings "1" -mca ess "env" -mca ess_base_jobid "540737536" -mca ess_base_vpid "<template>" -mca ess_base_num_procs "7" -mca orte_node_regex "bos[1:2],art[1:8],art[2:10],hpb[1:7],bos[2:15],bos[1:1],hpb[2:11]@0(7)" -mca orte_hnp_uri "540737536.0;tcp://147.34.116.60:50769" --mca routed "direct" --mca orte_base_help_aggregate "0" --mca plm_base_verbose "1" -mca plm "rsh" --tree-spawn -mca orte_parent_uri "540737536.0;tcp://147.34.116.60:50769" -mca orte_output_filename "./output:nojobid,nocopy" -mca hwloc_base_binding_policy "none" -mca hwloc_base_report_bindings "1" -mca pmix "^s1,s2,cray,isolated"
>>> 
>>> Starting server daemon at host "art10"
>>> 
>>> Starting server daemon at host "art8"
>>> 
>>> Starting server daemon at host "bos1"
>>> 
>>> Starting server daemon at host "hpb7"
>>> 
>>> Starting server daemon at host "hpb11"
>>> 
>>> Starting server daemon at host "bos15"
>>> 
>>> Server daemon successfully started with task id "1.art8"
>>> 
>>> Server daemon successfully started with task id "1.bos1"
>>> 
>>> Server daemon successfully started with task id "1.art10"
>>> 
>>> Server daemon successfully started with task id "1.bos15"
>>> 
>>> Server daemon successfully started with task id "1.hpb7"
>>> 
>>> Server daemon successfully started with task id "1.hpb11"
>>> 
>>> Unmatched ".
>>> 
>>> --------------------------------------------------------------------------
>>> 
>>> ORTE was unable to reliably start one or more daemons.
>>> 
>>> This usually is caused by:
>>> 
>>> 
>>> 
>>> * not finding the required libraries and/or binaries on
>>> 
>>>  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
>>> 
>>>  settings, or configure OMPI with --enable-orterun-prefix-by-default
>>> 
>>> 
>>> 
>>> * lack of authority to execute on one or more specified nodes.
>>> 
>>>  Please verify your allocation and authorities.
>>> 
>>> 
>>> 
>>> * the inability to write startup files into /tmp 
>>> (--tmpdir/orte_tmpdir_base).
>>> 
>>>  Please check with your sys admin to determine the correct location to use.
>>> 
>>> 
>>> 
>>> *  compilation of the orted with dynamic libraries when static are required
>>> 
>>>  (e.g., on Cray). Please check your configure cmd line and consider using
>>> 
>>>  one of the contrib/platform definitions for your system type.
>>> 
>>> 
>>> 
>>> * an inability to create a connection back to mpirun due to a
>>> 
>>>  lack of common network interfaces and/or no route found between
>>> 
>>>  them. Please check network connectivity (including firewalls
>>> 
>>>  and network routing requirements).
>>> 
>>> --------------------------------------------------------------------------
>>> 
>>> --------------------------------------------------------------------------
>>> 
>>> ORTE does not know how to route a message to the specified daemon located 
>>> on the indicated node:
>>> 
>>> 
>>> 
>>>  my node:   bos2
>>> 
>>>  target node:  art10
>>> 
>>> 
>>> 
>>> This is usually an internal programming error that should be reported to 
>>> the developers. In the meantime, a workaround may be to set the MCA param 
>>> routed=direct on the command line or in your environment. We apologize for 
>>> the problem.
>>> 
>>> --------------------------------------------------------------------------
>>> 
>>> (The same "ORTE does not know how to route a message" block is then repeated 
>>> for target nodes hpb7, bos15, bos1, and hpb11.)

