Tried this, but mpirun exits with this error:

mpirun -np 40 /home/MET/hrm/bin/hrm

librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
librdmacm: couldn't read ABI version.
CMA: unable to get RDMA device list
CMA: unable to get RDMA device list
CMA: unable to get RDMA device list
CMA: unable to get RDMA device list
librdmacm: assuming: 4
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
CMA: unable to get RDMA device list
librdmacm: couldn't read ABI version.
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
librdmacm: assuming: 4
CMA: unable to get RDMA device list
--------------------------------------------------------------------------
[[33095,1],8]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

  Module: OpenFabrics (openib)
  Host: pmd04.pakmet.com

Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications. This means that no Open MPI device has indicated
that it can be used to communicate between these processes. This is
an error; Open MPI requires that all MPI processes be able to reach
each other. This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[33095,1],28]) is on host: compute-02-00.private02.pakmet.com
  Process 2 ([[33095,1],0]) is on host: pmd02
  BTLs attempted: openib self sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
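The "BTLs attempted: openib self sm" line suggests the TCP transport was never tried, so processes on different hosts have no usable path to each other. A commonly suggested workaround is to select the BTLs explicitly on the mpirun command line; the flags below are standard Open MPI MCA options, but whether this particular BTL list is right for this cluster is an assumption:

# Sketch: fall back to TCP / shared-memory / self transports instead of openib
mpirun -np 40 --mca btl tcp,self,sm /home/MET/hrm/bin/hrm
# or equivalently, just exclude the OpenFabrics BTL
mpirun -np 40 --mca btl ^openib /home/MET/hrm/bin/hrm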
Ahsan

On Fri, Mar 22, 2013 at 7:09 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
> On Mar 22, 2013, at 3:42 AM, Syed Ahsan Ali <ahsansha...@gmail.com> wrote:
>
> Actually, due to some database corruption I am not able to add any new
> node to the cluster from the installer node, so I want to run a parallel
> job on more nodes without adding them to the existing cluster.
> You are right, the binaries must be present on the remote node as well.
> Is this possible through NFS, just as the compute nodes are NFS-mounted
> with the installer node?
>
>
> Sure - OMPI doesn't care how the binaries got there, just so long as they
> are present on the compute node.
>
>
> Ahsan
>
>
> On Fri, Mar 22, 2013 at 3:33 PM, Reuti <re...@staff.uni-marburg.de> wrote:
>
>> On 22.03.2013 at 10:14, Syed Ahsan Ali wrote:
>>
>> > I have a very basic question. If we want to run an mpirun job on two
>> systems which are not part of a cluster, how can we make that possible?
>> Can a host which is not a compute node, but rather a stand-alone system,
>> be specified to mpirun?
>>
>> Sure, the machines can be specified as arguments to `mpiexec`. But do
>> you want to run applications just between these two machines, or should
>> they participate in a larger parallel job with machines of the cluster?
>> Then a direct network connection between the outside and the inside of
>> the cluster is necessary, by some kind of forwarding in case these are
>> separate networks.
>>
>> Also, the paths to the started binaries may be different in case the two
>> machines do not share the same /home with the cluster, and this needs to
>> be honored.
>>
>> In case you are using a queuing system and want to route jobs to
>> machines outside of the set-up cluster: it's necessary to negotiate with
>> the admin to allow jobs to be scheduled there.
>>
>> -- Reuti
>>
>>
>> > Thanks
>> > Ahsan
>
>
>
> --
> Syed Ahsan Ali Bokhari
> Electronic Engineer (EE)
>
> Research & Development Division
> Pakistan Meteorological Department, H-8/4, Islamabad.
> Phone # off: +92518358714
> Cell #: +923155145014
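To make the suggestions above concrete, here is a minimal sketch of a hostfile-based launch across the cluster plus one stand-alone machine. The host names, slot counts, and mount commands are hypothetical; the real requirements are password-less ssh from the node running mpirun to every listed host, and the binary being available at the same path on each of them (an NFS mount of /home, as discussed above, is one way to get that):

# Hypothetical hostfile "myhosts": one line per machine, slots = processes to start there
#   pmd02           slots=16
#   compute-02-00   slots=16
#   standalone01    slots=8     <- machine outside the cluster, reachable via ssh

# On the stand-alone machine, mount the shared /home so the binary path matches
# (assumes the installer/head node already exports it over NFS):
#   mount pmd02:/home /home

# Launch over all hosts listed in the hostfile
mpirun -np 40 --hostfile myhosts --mca btl tcp,self,sm /home/MET/hrm/bin/hrm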