Dear Brian,

Thanks for the help.
Brian Barrett wrote:
> The arguments you want would look like:
>
>   mpirun -np X -mca btl gm,sm,self -mca btl_base_verbose 1 -mca btl_gm_debug 1 <other arguments>

Aha. I think I had misunderstood the syntax slightly, which explains why
I previously saw no debugging information. I had also omitted the "sm"
btl - though I'm not sure what that one is. I am now getting some
debugging output:

[scarf-cn008.rl.ac.uk:04291] [0,1,0] gm_port 017746B0, board 545460846592, global 3712550725 node 180388626433 port 180388626436
[scarf-cn010.rl.ac.uk:13964] [0,1,2] gm_port 017746B0, board 545460846592, global 3712549034 node 180388626433 port 180388626436
[scarf-cn010.rl.ac.uk:13965] [0,1,3] gm_port 017746D0, board 545460846592, global 3712549034 node 180388626433 port 180388626437
[scarf-cn008.rl.ac.uk:04292] [0,1,1] gm_port 017746D0, board 545460846592, global 3712550725 node 180388626433 port 180388626437
[scarf-cn010.rl.ac.uk:13965] [0,1,3] mapped global id 3712550725 to node id 28
[scarf-cn010.rl.ac.uk:13965] [0,1,3] mapped global id 3712550725 to node id 180388626460
[scarf-cn010.rl.ac.uk:13965] [0,1,3] mapped global id 3712549034 to node id 180388626433
[scarf-cn008.rl.ac.uk:04292] [0,1,1] mapped global id 3712550725 to node id 1
[scarf-cn008.rl.ac.uk:04292] [0,1,1] mapped global id 3712549034 to node id 180388626455
[scarf-cn008.rl.ac.uk:04292] [0,1,1] mapped global id 3712549034 to node id 180388626455
[scarf-cn010.rl.ac.uk:13964] [0,1,2] mapped global id 3712550725 to node id 28
[scarf-cn010.rl.ac.uk:13964] [0,1,2] mapped global id 3712550725 to node id 180388626460
[scarf-cn010.rl.ac.uk:13964] [0,1,2] mapped global id 3712549034 to node id 180388626433
[scarf-cn008.rl.ac.uk:04291] [0,1,0] mapped global id 3712550725 to node id 1
[scarf-cn008.rl.ac.uk:04291] [0,1,0] mapped global id 3712549034 to node id 180388626455
[scarf-cn008.rl.ac.uk:04291] [0,1,0] mapped global id 3712549034 to node id 180388626455

which I hope means that I am using the GM btl. The run is also about 20%
quicker than before, which may suggest that I was not previously using gm.

I have also noticed that if I simply specify --mca btl ^tcp plus the
debugging options, the run works, apparently uses gm, and is just as
quick. It was (and is) the combination -mca btl gm,sm,self,^tcp that
fails with "No available btl components were found!".

>> Q3: Is there a way to make openmpi work with the LSF commands? So far
>> I have constructed a hostfile from the LSF environment variable
>> LSB_HOSTS and used the openmpi mpirun command to start the parallel
>> executable.
>
> Currently, we do not have tight LSF integration for Open MPI, like we
> do for PBS, SLURM, and BProc. This is mainly because the only LSF
> machines the development team regularly uses are BProc machines, which
> do not use the traditional startup and allocation mechanisms of LSF. I
> believe it is on our feature request list, but I also don't believe we
> have a timeline for implementation.

OK. It is actually quite easy to construct a hostfile from the LSF
environment and start the processes using the openmpi mpirun command. I
don't know how this will interact with larger scale usage, job
termination etc., but I plan to experiment.
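In case it is useful to anyone else, the job script I am experimenting
with looks roughly like the sketch below. It is only an outline: the
hostfile name and the executable name ("./my_app") are placeholders,
and I have not yet thought about cleanup on abnormal termination.

  #!/bin/sh
  # Build a hostfile from LSF's LSB_HOSTS variable, which lists one
  # hostname per allocated slot, then start the job with Open MPI's
  # own mpirun rather than any LSF-provided launcher.
  HOSTFILE=hosts.$LSB_JOBID
  rm -f $HOSTFILE
  for h in $LSB_HOSTS; do
      echo $h >> $HOSTFILE
  done
  # LSB_HOSTS repeats each hostname once per slot, so the line count
  # of the hostfile gives the process count directly.
  NP=`wc -l < $HOSTFILE`
  mpirun -np $NP -hostfile $HOSTFILE -mca btl gm,sm,self ./my_app
  rm -f $HOSTFILE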
One further question. My run times are still noticeably longer than with
mpich_gm. I saw in the mailing list archives that there was a new
implementation of the collective routines in 1.0 (which my application
depends on rather heavily). Is this the default in openmpi 1.1, or is it
still necessary to specify it manually? And if anyone has a comparison
of MPI_Alltoallv performance with other MPI implementations I'd like to
hear the numbers.

Thanks again for all the work. Openmpi looks very promising and it is
definitely the easiest to install and get running of any MPI
implementation I have tried so far.

Keith Refson

--
Dr Keith Refson,
Building R3
Rutherford Appleton Laboratory
Chilton
Didcot
Oxfordshire OX11 0QX
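P.S. If it does turn out that the new collectives have to be selected by
hand, the sort of thing I was planning to try is

  mpirun -np X -mca coll_tuned_priority 100 <other arguments>

but that is purely a guess at the component name and parameter on my
part, so I would welcome a correction if the right knob is something
else.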