Dear Open MPI, I'm transitioning from LAM/MPI to Open MPI and have just compiled OMPI 1.0.2 on OS X Server 10.4.6. I'm using gcc 3.3 and XLF (both f77 and f90), and I'm using ssh to launch the jobs. My cluster is all dual 2GHz+ G5 Xserves, and I am using both ethernet ports for communication: one is used for NFS and the other for MPI.
I've had few problems over the past year running this configuration with LAM/MPI (latest release). But what worked before doesn't work with Open MPI 1.0.2.
When I run any parallel job that spans multiple machines, the processes run indefinitely. I've checked this using the BLACS and PBLAS test routines, the HPL benchmark, and even a simple mpi-pong program. All of them start up, print some initial output, and then hang, consuming 100% of the CPU on every node that was launched. In contrast, all of these programs finish in a few seconds on a single node with two processors, at anything up to -np 8. When I Ctrl-C to stop a program, Open MPI cleanly terminates all the processes, no matter how many machines were in use.
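In case it matters, here is roughly how I'm launching the multi-machine tests (the hostnames, slot counts, and binary name below are placeholders for my actual setup):

```shell
# Hostfile listing the G5 nodes by the names that resolve on the
# network I intend MPI to use (hostnames are placeholders):
#
#   node01 slots=2
#   node02 slots=2

# Launch the ping-pong test across two dual-processor machines,
# with Open MPI starting the remote processes over ssh:
mpirun -np 4 --hostfile hosts ./mpi-pong
```

The single-node runs that succeed use the same command with `--hostfile` omitted.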
I noticed a couple of postings from the past few months that seem related, but the symptoms didn't seem to be quite the same. Any ideas what could be going on?
Open MPI is a really great project, and the quality of the software development that has gone into it is obvious. I appreciate all your help. My config.log and ompi-info.out files are attached.
Lee Peterson
Professor, Aerospace Engineering Sciences
University of Colorado
Boulder, CO
ompi-output.tgz