On Wed, Jul 29, 2009 at 01:42:39PM -0600, Ralph Castain wrote:
>
> It sounds like perhaps IOF messages aren't getting relayed along the
> daemons. Note that the daemon on each node does have to be able to send
> TCP messages to all other nodes, not just mpirun.
>
> Couple of things you can do to check:
>
> 1. -mca routed direct - this will send all messages direct instead of
> across the daemons
>
> 2. --leave-session-attached - will allow you to see any errors reported
> by the daemons, including those from attempting to relay messages
>
> Ralph
>
> On Jul 29, 2009, at 1:19 PM, David Doria wrote:
>
>> I wrote a simple program to display "hello world" from each process.
>>
>> When I run this (126 - my machine, 122, and 123), everything works .....
>> However, when I run this (126 - my machine, 122, 123, AND 125), I get
>> no output at all.
>>
>> Is there any way to check what is going on / does anyone know what
All of the above good stuff, and: since the set of hosts works in most of
the possible permutations of three but not of four, it is possible that
your simple program has an issue in the way it exits. Please post your
simple program; I am looking for the omission of MPI_Finalize() or a
funny return/exit status.
http://www.mcs.anl.gov/research/projects/mpi/mpi-standard/mpi-report-2.0/node32.htm

Also, try adding a sleep(1) after the printf(..."hello world"...) and/or
after MPI_Finalize() on the chance that there is a race on exit.

Try the "hello world" example in the source package for Open MPI or at:
http://www.dartmouth.edu/~rc/classes/intro_mpi/hello_world_ex.html

You can also add gethostname() or environment variable checks etc. to
make sure that each host is involved as you expect, in contrast to the
nearly anonymous rank number.

Also double check which mpirun you are using, i.e. the alternatives
setup on your system may be "interesting", since various versions of
MPI naturally ship in some distros and $PATH/$path may matter:

$ file /usr/bin/mpirun
/usr/bin/mpirun: symbolic link to `/etc/alternatives/mpi-run'
$ locate bin/mpirun
/usr/bin/mpirun
/usr/bin/mpirun.py
$ rpm -qf /usr/bin/mpirun.py
mpich2-1.1-1.fc10.x86_64

--
T o m   M i t c h e l l
Found me a new hat, now what?
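
For reference, a sketch of how Ralph's two diagnostic flags could be
combined on a single command line; the host names and the ./hello binary
are placeholders, not taken from the thread:

$ mpirun -np 4 --host node126,node122,node123,node125 \
      -mca routed direct --leave-session-attached ./hello

Running once with -mca routed direct and once without narrows the
problem down to the daemon relay, while --leave-session-attached keeps
the daemons' stderr visible so relay errors show up.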
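
And a minimal sketch of the kind of test program being asked for,
assuming C and Open MPI's mpicc wrapper; the sleep(1) calls and the
hostname check are the diagnostics suggested above (gethostname() in
place of gethostbyname()), everything else is a plain MPI "hello world":

/* hello.c -- MPI "hello world" plus the diagnostics discussed above. */
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    int rank, size;
    char host[256];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* gethostname() shows which machine each rank actually landed on,
     * in contrast to the nearly anonymous rank number. */
    gethostname(host, sizeof(host));
    printf("hello world from rank %d of %d on %s\n", rank, size, host);
    fflush(stdout);

    sleep(1);        /* guard against a race on exit */
    MPI_Finalize();  /* the omission being looked for above */
    sleep(1);
    return 0;
}

Compile with mpicc hello.c -o hello and check mpirun's exit status with
echo $? afterwards; a clean run should print one line per rank and exit 0.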