Thanks a lot, this was exactly the problem:

> Make sure that the PATH really is identical between users -- especially
> for non-interactive logins. E.g.:
>
> env
Here PATH was correct.

> vs.
>
> ssh othernode env

Here PATH was not correct.

The PATH was set in .bash_profile, and apparently .bash_profile is not
sourced for non-interactive logins; only .bashrc is sourced. So if the
PATH is set in .bashrc everything is fine, and the problem went away.

Thanks again,
Daniel

> Also check the LD_LIBRARY_PATH.
>
> On Feb 11, 2013, at 7:11 AM, Daniel Fetchinson <fetchin...@googlemail.com> wrote:
>
>> Hi folks,
>>
>> I have a really strange problem: a super simple MPI test program (see
>> below) runs successfully for all users when executed on 4 processes in
>> 1 node, but hangs for user A and runs successfully for user B when
>> executed on 8 processes in 2 nodes. The executable used is the same
>> and the appfile used is also the same for user A and user B. Both
>> users launch it by
>>
>> mpirun --app appfile
>>
>> where the content of 'appfile' is
>>
>> -np 1 -host node1 -wdir /tmp/test ./test
>> -np 1 -host node1 -wdir /tmp/test ./test
>> -np 1 -host node1 -wdir /tmp/test ./test
>> -np 1 -host node1 -wdir /tmp/test ./test
>>
>> for the single-node run with 4 processes, and is replaced by
>>
>> -np 1 -host node1 -wdir /tmp/test ./test
>> -np 1 -host node1 -wdir /tmp/test ./test
>> -np 1 -host node1 -wdir /tmp/test ./test
>> -np 1 -host node1 -wdir /tmp/test ./test
>> -np 1 -host node2 -wdir /tmp/test ./test
>> -np 1 -host node2 -wdir /tmp/test ./test
>> -np 1 -host node2 -wdir /tmp/test ./test
>> -np 1 -host node2 -wdir /tmp/test ./test
>>
>> for the 2-node run with 8 processes. Just to recap: the single-node
>> run works for both user A and user B, but the 2-node run only works
>> for user B and hangs for user A. It does respond to Ctrl-C, though.
>> Both users use bash, have set up passwordless ssh, are able to ssh
>> from node1 to node2 and back, have the same PATH, and use the same
>> 'mpirun' executable.
>>
>> At this point I've run out of ideas what to check and debug because
>> the setups look really identical.
>> The test program is simply
>>
>> #include <stdio.h>
>> #include <mpi.h>
>>
>> int main( int argc, char **argv )
>> {
>>     int node;
>>
>>     MPI_Init( &argc, &argv );
>>     MPI_Comm_rank( MPI_COMM_WORLD, &node );
>>
>>     printf( "First Hello World from Node %d\n", node );
>>     MPI_Barrier( MPI_COMM_WORLD );
>>     printf( "Second Hello World from Node %d\n", node );
>>
>>     MPI_Finalize( );
>>
>>     return 0;
>> }
>>
>> I also asked both users to compile the test program separately, and
>> the resulting executable 'test' is the same for both, indicating again
>> that identical gcc, mpicc, etc. are used. Gcc is 4.5.1, openmpi is
>> 1.5, and the interconnect is infiniband.
>>
>> I've really run out of ideas what else to compare between user A and B.
>>
>> Thanks for any hints,
>> Daniel
>>
>> --
>> Psss, psss, put it down! - http://www.cafepress.com/putitdown
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Psss, psss, put it down! - http://www.cafepress.com/putitdown
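PS: for anyone hitting the same symptom, the .bash_profile vs. .bashrc difference described above can be reproduced locally without two nodes. This is only a sketch: the throwaway HOME directory and the FROM_PROFILE variable below are illustrative, not part of the original setup.

```shell
# The thread's diagnosis was: compare what an interactive login sees
# with what a non-interactive "ssh othernode <cmd>" sees:
#   env
#   ssh othernode env
# A throwaway HOME shows why the two can differ: a login shell reads
# ~/.bash_profile, while a plain non-interactive shell does not.

demo=$(mktemp -d)
echo 'export FROM_PROFILE=yes' > "$demo/.bash_profile"
echo 'export FROM_BASHRC=yes'  > "$demo/.bashrc"

# Login shell (like an interactive ssh session): reads .bash_profile.
HOME="$demo" bash -l -c 'echo "profile=${FROM_PROFILE:-no}"'   # profile=yes

# Non-interactive, non-login shell: .bash_profile is never sourced,
# so anything exported only there (e.g. PATH) is missing remotely.
HOME="$demo" bash -c 'echo "profile=${FROM_PROFILE:-no}"'      # profile=no
```

Which is why moving the PATH (and LD_LIBRARY_PATH) exports into ~/.bashrc, as described above, makes them visible to the remotely launched MPI processes as well.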