On Jul 16, 2006, at 6:12 AM, Keith Refson wrote:

The compile of openmpi 1.1 was without problems and
appears to have correctly built the GM btl.
$ ompi_info -a | egrep "\bgm\b|_gm_"
               MCA mpool: gm (MCA v1.0, API v1.0, Component v1.1)
                 MCA btl: gm (MCA v1.0, API v1.0, Component v1.1)

Ok, so GM support is definitely built into your build of Open MPI, which is a good start.

However, I have been unable to set up a parallel run which uses gm.
If I start a run using the openmpi mpirun command, the program executes correctly in parallel. However, the timings suggest that it is
using tcp, and the command executed on the node looks like:

orted --bootproxy 1 --name 0.0.1 --num_procs 3 --vpid_start 0 --nodename
scarf-cn001.rl.ac.uk --universe
cse0...@scarf-cn001.rl.ac.uk:default-universe-28588 --nsreplica
"0.0.0;tcp://192.168.1.1:52491;tcp://130.246.142.1:52491" --gprreplica
"0.0.0;tcp://192.168.1.1:52491;t

Right, orted is just a starter for the MPI processes -- the information on interconnects to use and that kind of stuff is passed through the out-of-band communication mechanism. orted doesn't really care which interconnect the MPI process is going to use, so we don't pass it on the command line.

Furthermore, if I attempt to start with the mpirun arguments "--mca btl
gm,self,^tcp", the run aborts at the MPI_INIT call.

Q1:  Is there anything else I have to do to get openmpi to use gm?

The command line you want is:

  mpirun -np X -mca btl gm,sm,self <other arguments>

If this causes an error during MPI_INIT or early in your application, it would be useful to see all the output from the parallel run. An error at that point likely indicates that something is wrong with the initialization of the interconnect.

Q2:  Is there any way of diagnosing which btl is actually being used
and why? None of the "-v" option to mpirun, "-mca btl btl_base_verbose"
     or "-mca btl  btl_gm_debug=1" makes any difference or produces any
     more output.

The arguments you want would look like:

mpirun -np X -mca btl gm,sm,self -mca btl_base_verbose 1 -mca btl_gm_debug 1 <other arguments>
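
Note that each -mca option takes a parameter name followed by a separate value; forms like "-mca btl btl_base_verbose" or "btl_gm_debug=1" do not set those parameters. As a rough sketch, the same settings can also be given as environment variables using the OMPI_MCA_ prefix. This assumes a Bourne-style shell and that mpirun is started from the same shell session:

```shell
# Sketch: the MCA parameters from the command line above, expressed as
# environment variables (OMPI_MCA_<parameter name>=<value>).
export OMPI_MCA_btl=gm,sm,self       # select the gm, shared-memory and self btls
export OMPI_MCA_btl_base_verbose=1   # ask the btl framework for extra output
export OMPI_MCA_btl_gm_debug=1       # ask the gm btl for debug output

# Then start the run as usual (X and the other arguments are placeholders):
# mpirun -np X <other arguments>
```

Running "ompi_info --param btl gm" should list the parameters the gm btl accepts, which is a quick way to check a parameter name before using it.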

Q3: Is there a way to make openmpi work with the LSF commands? So far
     I have constructed a hostfile from the LSF environment variable
     LSB_HOSTS and used the openmpi mpirun command to start the
     parallel executable.

Currently, we do not have tight LSF integration for Open MPI, like we do for PBS, SLURM, and BProc. This is mainly because the only LSF machines the development team regularly uses are BProc machines, which do not use the traditional startup and allocation mechanisms of LSF. I believe it is on our feature request list, but I also don't believe we have a timeline for implementation.
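
Until then, building a hostfile from LSB_HOSTS, as described in Q3, is a reasonable workaround. A minimal sketch, assuming LSB_HOSTS holds a space-separated list with one hostname per allocated slot (make_hostfile is a hypothetical helper name, not part of LSF or Open MPI):

```shell
# Sketch: convert an LSF host list into an Open MPI hostfile.
# Input:  space-separated hostnames, one entry per slot (e.g. "$LSB_HOSTS").
# Output: one "hostname slots=N" line per distinct host, in first-seen order.
make_hostfile() {
    # $1 is deliberately unquoted so the shell splits it into one host per line.
    printf '%s\n' $1 | awk '
        NF {
            if (!($1 in count)) order[++n] = $1   # remember first appearance
            count[$1]++                           # one more slot on this host
        }
        END {
            for (i = 1; i <= n; i++)
                print order[i], "slots=" count[order[i]]
        }'
}
```

Inside an LSF job this could be used as make_hostfile "$LSB_HOSTS" > hosts.txt, followed by mpirun -np X --hostfile hosts.txt <other arguments>, where hosts.txt is an arbitrary file name.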


Brian

--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/

