Dear Brian,

Thanks for the help.

Brian Barrett wrote:


> > The arguments you want would look like:
> >
> >    mpirun -np X -mca btl gm,sm,self -mca btl_base_verbose 1 -mca
> > btl_gm_debug 1 <other arguments>

Aha.  I think I had misunderstood the syntax slightly, which explains why
I previously saw no debugging information.  I had also omitted the "sm"
btl - though I'm not sure what that one is....

I am now getting some debugging output:

[scarf-cn008.rl.ac.uk:04291] [0,1,0] gm_port 017746B0, board 545460846592, global 3712550725 node 180388626433 port 180388626436
[scarf-cn010.rl.ac.uk:13964] [0,1,2] gm_port 017746B0, board 545460846592, global 3712549034 node 180388626433 port 180388626436
[scarf-cn010.rl.ac.uk:13965] [0,1,3] gm_port 017746D0, board 545460846592, global 3712549034 node 180388626433 port 180388626437
[scarf-cn008.rl.ac.uk:04292] [0,1,1] gm_port 017746D0, board 545460846592, global 3712550725 node 180388626433 port 180388626437
[scarf-cn010.rl.ac.uk:13965] [0,1,3] mapped global id 3712550725 to node id 28
[scarf-cn010.rl.ac.uk:13965] [0,1,3] mapped global id 3712550725 to node id 180388626460
[scarf-cn010.rl.ac.uk:13965] [0,1,3] mapped global id 3712549034 to node id 180388626433
[scarf-cn008.rl.ac.uk:04292] [0,1,1] mapped global id 3712550725 to node id 1
[scarf-cn008.rl.ac.uk:04292] [0,1,1] mapped global id 3712549034 to node id 180388626455
[scarf-cn008.rl.ac.uk:04292] [0,1,1] mapped global id 3712549034 to node id 180388626455
[scarf-cn010.rl.ac.uk:13964] [0,1,2] mapped global id 3712550725 to node id 28
[scarf-cn010.rl.ac.uk:13964] [0,1,2] mapped global id 3712550725 to node id 180388626460
[scarf-cn010.rl.ac.uk:13964] [0,1,2] mapped global id 3712549034 to node id 180388626433
[scarf-cn008.rl.ac.uk:04291] [0,1,0] mapped global id 3712550725 to node id 1
[scarf-cn008.rl.ac.uk:04291] [0,1,0] mapped global id 3712549034 to node id 180388626455
[scarf-cn008.rl.ac.uk:04291] [0,1,0] mapped global id 3712549034 to node id 180388626455

which I hope means that I am using the GM btl.  The run is also about 20%
quicker than before, which may suggest that I was not previously using gm.

I have also noticed that if I simply specify --mca btl ^tcp plus the
debugging options, the run works, apparently uses gm, and is just as quick.
It was (and is) the combination
  -mca btl gm,sm,self,^tcp
that fails with
   No available btl components were found!
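(My guess is that the inclusive list and the "^" exclusion cannot be mixed
in a single value of the btl parameter, and that either form on its own is
acceptable, e.g.

   mpirun -np X -mca btl gm,sm,self -mca btl_base_verbose 1 <other arguments>
   mpirun -np X -mca btl ^tcp -mca btl_base_verbose 1 <other arguments>

but I would welcome confirmation that this is the intended syntax.)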




> >
>> >> Q3:  Is there a way to make openmpi work with the LSF commands?  So
>> >> far
>> >>      I have constructed a hostfile from the LSF environment variable
>> >>      LSB_HOSTS and used the openmpi mpirun command to start the
>> >>      parallel executable.
> >
> > Currently, we do not have tight LSF integration for Open MPI, like we
> > do for PBS, SLURM, and BProc.  This is mainly because the only LSF
> > machines the development team regularly uses are BProc machines,
> > which do not use the traditional startup and allocation mechanisms of
> > LSF.  I believe it is on our feature request list, but I also don't
> > believe we have a timeline for implementation.

OK.  It is actually quite easy to construct a hostfile from the LSF
environment and start the processes using the Open MPI mpirun command.
I don't know how this will interact with larger-scale usage, job
termination, etc., but I plan to experiment.
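In case it is useful, a launch script along the following lines is roughly
what I have in mind (a sketch only: it assumes LSB_HOSTS is a space-separated
list with one host name per allocated slot, and "./my_app" is a placeholder
for the real executable):

   #!/bin/sh
   # Build an Open MPI hostfile from the LSF allocation
   HOSTFILE=hosts.$LSB_JOBID
   rm -f $HOSTFILE
   for h in $LSB_HOSTS; do
       echo $h >> $HOSTFILE        # one line per allocated slot
   done
   # Launch one process per slot listed in the hostfile
   NP=`wc -l < $HOSTFILE`
   mpirun -np $NP --hostfile $HOSTFILE -mca btl gm,sm,self ./my_app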

One further question.  My run times are still noticeably longer than
with mpich_gm.  I saw in the mailing list archives that there was
a new implementation of the collective routines in 1.0 (which my application
depends on rather heavily).  Is this the default in Open MPI 1.1, or is
it still necessary to select it manually?  And if anyone has a comparison
of MPI_Alltoallv performance with other MPI implementations I'd like to
hear the numbers.
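(If manual selection is still needed, I am guessing from the archives that
it would be something along the lines of

   mpirun -np X -mca btl gm,sm,self -mca coll tuned,basic,self <other arguments>

with "tuned" being the name of the new collectives component - but that is
an assumption on my part, so please correct me if the framework or component
name is different.)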

Thanks again for all the work.  Open MPI looks very promising and it is
definitely the easiest to install and get running of any MPI implementation
I have tried so far.

Keith Refson
-- 
Dr Keith Refson,
Building R3
Rutherford Appleton Laboratory
Chilton
Didcot
Oxfordshire OX11 0QX
