Thanks a lot, this was exactly the problem:

> Make sure that the PATH really is identical between users -- especially for
> non-interactive logins. E.g.:
>
> env

Here PATH was correct.

> vs.
>
> ssh othernode env

Here PATH was not correct. The PATH was set in .bash_profile, and
apparently .bash_profile is not sourced for non-interactive logins;
only .bashrc is. Once the PATH was set in .bashrc instead, everything
was fine and the problem went away.
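
For the record, here is the arrangement that fixed it for us (a sketch
assuming a stock bash setup; /opt/openmpi is just a hypothetical
install prefix, substitute your own):

# ~/.bashrc -- sourced for non-interactive logins (ssh node2 <cmd>),
# so anything mpirun needs on the remote node must be set here
export PATH=/opt/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH

# ~/.bash_profile -- sourced for interactive login shells; just pull
# in .bashrc so both kinds of login get the same environment
[ -f ~/.bashrc ] && . ~/.bashrc

After that, 'ssh node2 env' shows the same PATH as a local 'env'.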

Thanks again,
Daniel


> Also check the LD_LIBRARY_PATH.
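
Good point -- the same interactive vs. non-interactive comparison
applies there too, e.g.:

env | grep LD_LIBRARY_PATH
ssh othernode env | grep LD_LIBRARY_PATH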
>
>
> On Feb 11, 2013, at 7:11 AM, Daniel Fetchinson <fetchin...@googlemail.com>
> wrote:
>
>> Hi folks,
>>
>> I have a really strange problem: a super simple MPI test program (see
>> below) runs successfully for all users when executed with 4 processes
>> on 1 node, but when executed with 8 processes on 2 nodes it hangs for
>> user A and runs successfully for user B. The executable is the same,
>> and the appfile is also the same, for user A and user B. Both users
>> launch it with
>>
>> mpirun --app appfile
>>
>> where the content of 'appfile' is
>>
>> -np 1 -host node1 -wdir /tmp/test ./test
>> -np 1 -host node1 -wdir /tmp/test ./test
>> -np 1 -host node1 -wdir /tmp/test ./test
>> -np 1 -host node1 -wdir /tmp/test ./test
>>
>> for the single node run with 4 processes and is replaced by
>>
>> -np 1 -host node1 -wdir /tmp/test ./test
>> -np 1 -host node1 -wdir /tmp/test ./test
>> -np 1 -host node1 -wdir /tmp/test ./test
>> -np 1 -host node1 -wdir /tmp/test ./test
>> -np 1 -host node2 -wdir /tmp/test ./test
>> -np 1 -host node2 -wdir /tmp/test ./test
>> -np 1 -host node2 -wdir /tmp/test ./test
>> -np 1 -host node2 -wdir /tmp/test ./test
>>
>> for the 2-node run with 8 processes. Just to recap: the single-node
>> run works for both user A and user B, but the 2-node run works only
>> for user B and hangs for user A (it does respond to Ctrl-C, though).
>> Both users use bash, have set up passwordless ssh, can ssh from node1
>> to node2 and back, have the same PATH, and use the same 'mpirun'
>> executable.
>>
>> At this point I've run out of ideas about what to check and debug,
>> because the setups look completely identical. The test program is
>> simply:
>>
>> #include <stdio.h>
>> #include <mpi.h>
>>
>> int main( int argc, char **argv )
>> {
>>     int node;
>>
>>     MPI_Init( &argc, &argv );
>>     MPI_Comm_rank( MPI_COMM_WORLD, &node );
>>
>>     printf( "First Hello World from Node %d\n", node );
>>     MPI_Barrier( MPI_COMM_WORLD );
>>     printf( "Second Hello World from Node %d\n", node );
>>
>>     MPI_Finalize();
>>
>>     return 0;
>> }
>>
>>
>> I also asked both users to compile the test program separately, and
>> the resulting executable 'test' is identical for both, indicating
>> again that the same gcc, mpicc, etc. are used. gcc is 4.5.1, Open MPI
>> is 1.5, and the interconnect is InfiniBand.
>>
>> I've really run out of ideas about what else to compare between users
>> A and B.
>>
>> Thanks for any hints,
>> Daniel
>>
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/


-- 
Psss, psss, put it down! - http://www.cafepress.com/putitdown
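
P.S. For future readers, a quick way to diff the two environments in
one go (plain bash; node2 stands in for the remote node):

diff <(env | sort) <(ssh node2 env | sort)

Any variable that shows up different -- PATH, LD_LIBRARY_PATH, ... --
is a candidate for this kind of hang.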
