> What the system is saying is that (a) you don't have transparent ssh
> authority on one or more of your nodes, and/or (b) the system was unable to
> locate the Open MPI code libraries on the remote node. For the first
> problem, please see the FAQ at:
>
> http://www.open-mpi.org/faq/?category=rsh#ssh-keys
>
> Once you have that fixed, then you should check the remote nodes to ensure
> that the Open MPI code libraries are available - you may need to provide a
> prefix directory to mpirun to tell us where they are. Please see the FAQ at:
>
> http://www.open-mpi.org/faq/?category=running
>
> For some advice in that area.
>
> Hope that helps
> Ralph

I don't think either of these suggestions, (a) non-transparent ssh authority
or (b) being unable to locate the Open MPI code libraries on the remote node,
is the problem:

(a) Passwordless ssh is set up, and all nodes see the same home directory.
(b) The Open MPI code libraries are located in my home directory, which
    every node sees.

mpirun sometimes works with all CPUs/nodes of the cluster, but sometimes it
doesn't, and the error described below occurs.

> On 12/1/06 8:17 AM, "Jens Klostermann" <jens.klostermann_at_[hidden]> wrote:
>
> > I've got the same problem as described in:
> > http://www.open-mpi.org/community/lists/users/2006/07/1537.php
> >
> > From: Chengwen Chen (chenchengwen_at_[hidden])
> > Date: 2006-07-04 03:53:26
> >
> > The problem seems to occur randomly! It occurs more often if I use a
> > larger number of CPUs, but almost never if I use a small number of CPUs.
> > So far my cure for the problem is to kill and restart my application
> > (mpirun ...) as often as needed until the error no longer occurs and
> > mpirun runs.
> >
> > So is the problem resolved? Can anybody give me a hint?
> >
> > I am using an amd64 Linux (SUSE 10) cluster with an InfiniBand connection
> > and openmpi-1.2a1r10111.
> >
> > I attach the ompi_info --param all all output, hope it helps.
> >
> > Regards Jens
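
For anyone wanting to rule out points (a) and (b) node by node, here is a minimal sketch of a check script. It is an illustration, not something from this thread. It assumes the rsh/ssh launcher is in use and that the Open MPI daemon binary is called `orted` (true for the 1.x series as far as I know). The function names `ssh_check_command` and `check_hosts` and the timeout values are made up for this example. `BatchMode=yes` makes ssh fail instead of prompting for a password, so a node that fails here has either an ssh problem or no `orted` on the non-interactive PATH.

```python
import subprocess


def ssh_check_command(host, program="orted"):
    """Build an ssh command that looks up `program` on `host`.

    BatchMode=yes makes ssh fail rather than prompt for a password, so the
    same command tests passwordless login and the remote PATH at once.
    """
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5",
            host, "which", program]


def check_hosts(hosts, run=subprocess.run):
    """Return {host: remote path of orted, or None if the check failed}.

    `run` is injectable so the logic can be exercised without a cluster.
    """
    results = {}
    for host in hosts:
        try:
            proc = run(ssh_check_command(host),
                       capture_output=True, text=True, timeout=15)
        except (OSError, subprocess.TimeoutExpired):
            # ssh binary missing, or the node did not answer in time.
            results[host] = None
            continue
        # A non-zero exit status means ssh failed or `which` found nothing.
        results[host] = proc.stdout.strip() if proc.returncode == 0 else None
    return results
```

Calling `check_hosts(["node01", "node02"])` (hypothetical host names) returns a dictionary with one entry per host. Since the failure reported here is intermittent, it may be worth running the check several times; a node that only fails occasionally would point at something other than a static ssh or PATH misconfiguration.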