Hello, I am having trouble with a script that calls mpirun. Basically, my problem boils down to wanting to call a script with:

    mpirun -np # ./script.sh

where script.sh looks like:

    #!/bin/bash
    mpirun -np 2 ./mpiprogram

Whenever I invoke script.sh directly (for instance as ./script.sh), it works fine, but if I run

    mpirun -np 2 ./script.sh

I get the following error:

    [ppv.stanford.edu:08814] [[27860,1],0] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file rml_oob_send.c at line 105
    [ppv.stanford.edu:08814] [[27860,1],0] could not get route to [[INVALID],INVALID]
    [ppv.stanford.edu:08814] [[27860,1],0] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file base/plm_base_proxy.c at line 86

I have also tried running with mpirun -d to get some debugging output, and it appears that the process table is not being created for the second (nested) mpirun. The command hangs like so:

    [ppv.stanford.edu:08823] procdir: /tmp/[email protected]_0/27855/0/0
    [ppv.stanford.edu:08823] jobdir: /tmp/[email protected]_0/27855/0
    [ppv.stanford.edu:08823] top: [email protected]_0
    [ppv.stanford.edu:08823] tmp: /tmp
    [ppv.stanford.edu:08823] [[27855,0],0] node[0].name ppv daemon 0 arch ffc91200
    [ppv.stanford.edu:08823] Info: Setting up debugger process table for applications
      MPIR_being_debugged = 0
      MPIR_debug_state = 1
      MPIR_partial_attach_ok = 1
      MPIR_i_am_starter = 0
      MPIR_proctable_size = 1
      MPIR_proctable:
        (i, host, exe, pid) = (0, ppv.stanford.edu, /home/sluke/maintenance/openmpi-1.3.3/examples/./shell.sh, 8824)
    [ppv.stanford.edu:08825] procdir: /tmp/[email protected]_0/27855/1/0
    [ppv.stanford.edu:08825] jobdir: /tmp/[email protected]_0/27855/1
    [ppv.stanford.edu:08825] top: [email protected]_0
    [ppv.stanford.edu:08825] tmp: /tmp
    [ppv.stanford.edu:08825] [[27855,1],0] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file rml_oob_send.c at line 105
    [ppv.stanford.edu:08825] [[27855,1],0] could not get route to [[INVALID],INVALID]
    [ppv.stanford.edu:08825] [[27855,1],0] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file base/plm_base_proxy.c at line 86
    [ppv.stanford.edu:08825] Info: Setting up debugger process table for applications
      MPIR_being_debugged = 0
      MPIR_debug_state = 1
      MPIR_partial_attach_ok = 1
      MPIR_i_am_starter = 0
      MPIR_proctable_size = 0
      MPIR_proctable:

In this case it does not matter which MPI program the script ultimately launches; the shell script fails the same way regardless (I have tried the hello_f90 executable from the Open MPI examples directory).

Here are some details of my setup. I have built Open MPI 1.3.3 with the Intel Fortran and C compilers (version 11.1). The machine runs Rocks with the SGE scheduler, so I configured with ./configure --prefix=/home/sluke --with-sge; however, the problem persists even when I run on the head node outside the scheduler. I am attaching the resulting config.log to this email, as well as the output of ompi_info --all and ifconfig.

I hope this gives the experts on the list enough to go on, but I will be happy to provide any more information that might be helpful.

Luke Shulenburger
Geophysical Laboratory
Carnegie Institution of Washington

PS: I have tried this on a machine with openmpi-1.2.6 and cannot reproduce the error; however, on a second machine with openmpi-1.3.2 I see the same problem.
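One way to narrow this down is a diagnostic variant of script.sh that prints the Open MPI-related environment variables it inherits. This is a minimal sketch, not a confirmed diagnosis: it assumes (unverified against the 1.3.x sources) that the outer mpirun exports variables whose names begin with OMPI_ or ORTE_ to its child processes, and that the nested mpirun may pick them up. The function name list_ompi_vars is hypothetical.

```shell
#!/bin/bash
# Hypothetical diagnostic variant of script.sh (a sketch, not a fix).
# ASSUMPTION: an outer mpirun exports variables prefixed OMPI_ or ORTE_
# to the processes it launches; this lists whichever ones are present.

list_ompi_vars() {
    # Print the names (not values) of inherited variables that look
    # Open MPI-related, one per line, sorted.
    env | grep -E '^(OMPI_|ORTE_)' | cut -d= -f1 | sort
}

echo "Open MPI-related variables seen by $0:"
list_ompi_vars

# The nested launch from the original script, left commented out so
# this sketch can be run on its own:
# mpirun -np 2 ./mpiprogram
```

Running ./script.sh and then mpirun -np 1 ./script.sh, and comparing the two lists, would show which variables (if any) the nested mpirun sees only when it is started under another mpirun.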
