[OMPI users] error in running openmpi on remote node

2006-07-04 Thread Chengwen Chen

Dear openmpi users,

I am using openmpi-1.0.2 on Red Hat Linux. I can successfully run mpirun on
a single PC with 2 processes (-np 2), but it fails when I add a remote node.
Can you give me some advice? Thank you very much in advance.

[say@wolf45 tmp]$ mpirun -np 2 /tmp/test.x

[say@wolf45 tmp]$ mpirun -np 2 --host wolf45,wolf46 /tmp/test.x
say@wolf46's password:
orted: Command not found.
[wolf45:11357] ERROR: A daemon on node wolf46 failed to start as expected.
[wolf45:11357] ERROR: There may be more information available from
[wolf45:11357] ERROR: the remote shell (see above).
[wolf45:11357] ERROR: The daemon exited unexpectedly with status 1.


Re: [OMPI users] error in running openmpi on remote node

2006-07-05 Thread Chengwen Chen

Thank you very much. The problem was solved when I changed the login shell
on the remote node to bash: I had set LD_LIBRARY_PATH in the .bashrc file,
but the default shell was the C shell, which does not read that file.
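
As a rough illustration of why the setting in .bashrc had no effect: each shell reads its own startup file, so a variable exported in one file is invisible to a different shell. The sketch below maps a shell path to the per-user file it usually reads. The helper name rc_file_for_shell is made up for this example, and the mapping covers only the common cases.

```shell
#!/bin/sh
# rc_file_for_shell SHELL_PATH: print the per-user startup file that the
# given shell normally reads (hypothetical helper, common cases only).
rc_file_for_shell() {
    # Strip the directory part, e.g. /bin/tcsh -> tcsh.
    case "${1##*/}" in
        bash)     echo ".bashrc" ;;
        csh|tcsh) echo ".cshrc" ;;   # tcsh prefers ~/.tcshrc when it exists
        *)        echo "unknown" ;;
    esac
}
```

Running `rc_file_for_shell "$SHELL"` on the remote node shows which file the Open MPI settings belong in. With the C shell, the same settings could instead be written in .cshrc using setenv.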

Although it works with my test program test.x, some errors occurred when I
ran another program. BTW, that program runs successfully on a single PC with
2 processes.

Any suggestions? Thank you

[say@wolf45 tmp]$ mpirun -np 2 --host wolf45,wolf46
/usr/local/amber9/exe/sander.MPI -O -i /tmp/amber9mintest.in -o
/tmp/amber9mintest.out -c /tmp/amber9mintest.inpcrd -p
/tmp/amber9mintest.prmtop -r /tmp/amber9mintest.rst
[wolf46.chem.cuhk.edu.hk:06002] *** An error occurred in MPI_Barrier
[wolf46.chem.cuhk.edu.hk:06002] *** on communicator MPI_COMM_WORLD
[wolf46.chem.cuhk.edu.hk:06002] *** MPI_ERR_INTERN: internal error
[wolf46.chem.cuhk.edu.hk:06002] *** MPI_ERRORS_ARE_FATAL (goodbye)
1 process killed (possibly by Open MPI)

On 7/4/06, Brian Barrett wrote:


Kefeng is correct that you should set up your ssh keys so that you
aren't prompted for a password, but that isn't the cause of your
failure.  The problem appears to be that orted (one of the Open MPI
commands) is not in your PATH on the remote node.  You should take a
look at one of the other FAQ sections on the setup required for Open
MPI in an rsh/ssh type environment.

   http://www.open-mpi.org/faq/?category=running
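
One way to confirm this diagnosis is to ask the remote node's non-interactive shell, the same kind of shell mpirun reaches over ssh, whether it can find orted. The following is only a sketch: the function names remote_has_cmd and report and the SSH override variable are invented for this example, and it assumes `which` is available on the remote node.

```shell
#!/bin/sh
# SSH can be overridden (for example with a stub) so the check can be
# exercised without a real cluster; by default the real ssh is used.
SSH=${SSH:-ssh}

# remote_has_cmd HOST CMD: succeed if CMD resolves on HOST's PATH as seen
# by a non-interactive remote shell (hypothetical helper).
remote_has_cmd() {
    "$SSH" "$1" "which $2" >/dev/null 2>&1
}

# report HOST CMD: print a one-line verdict for the check above.
report() {
    if remote_has_cmd "$1" "$2"; then
        echo "$2 found on $1"
    else
        echo "$2 NOT found on $1; check PATH in the remote shell's startup file"
    fi
}
```

For the case in this thread the call would be `report wolf46 orted`. A "NOT found" verdict points at the startup file of the remote login shell rather than at Open MPI itself.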


Hope this helps,

Brian

--
  Brian Barrett
  Open MPI developer
   http://www.open-mpi.org/


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] error in running openmpi on remote node

2006-07-24 Thread Chengwen Chen

I have tried Open MPI v1.1, but the program I am using (AMBER9) can't be
compiled correctly with v1.1, so it seems that I have to keep using
openmpi-1.0.2.
I am new to Linux and have no experience with debuggers. Would you please
suggest something simple that I can try?
Thank you very much!


On 7/6/06, Jeff Squyres (jsquyres) wrote:

Ick.  This isn't a helpful error message, is it?  :-)

Can you try upgrading to the recently-released v1.1 and see if the error
is still occurring?

Have you tried running your application through a memory-checking debugger
such as valgrind, perchance?
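
A simple way to try the valgrind suggestion is to put valgrind between mpirun and the program, so that every MPI process runs under the memory checker. The sketch below only prints such a command line for review. The function name valgrind_mpirun_cmd is made up for this example, and it assumes valgrind is installed on every node listed.

```shell
#!/bin/sh
# valgrind_mpirun_cmd NP HOSTS PROGRAM [ARGS...]: print an mpirun command
# line that starts each process under valgrind (hypothetical helper).
# The command is printed, not executed, so it can be checked first.
valgrind_mpirun_cmd() {
    np=$1
    hosts=$2
    shift 2
    # "$@" now holds the program and its own arguments.
    echo mpirun -np "$np" --host "$hosts" valgrind "$@"
}
```

For example, `valgrind_mpirun_cmd 2 wolf45,wolf46 /tmp/test.x` prints a line that can be pasted into the shell. The same pattern applies to sander.MPI with its usual arguments appended.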


