Yes, it looks like you have a heterogeneous system (i.e., a binary compiled on one server doesn't necessarily run properly on another server).
In this case, you should see the heterogeneous section of the FAQ. Fair warning, though -- heterogeneous systems are more difficult to manage/maintain/use than homogeneous systems...
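As a stopgap while the InfiniBand stack on the other node gets sorted out, you can usually tell Open MPI to skip the OpenFabrics BTL and run over TCP instead. This is only a sketch -- I haven't tried it against your binary or your network -- but something along these lines should at least avoid the "unable to reach each other" abort shown in the output below:

    # Untested sketch: restrict Open MPI to the TCP, shared-memory, and self BTLs,
    # so the missing openib support on the upgraded node is never selected.
    mpirun --mca btl tcp,self,sm -np 40 /home/MET/hrm/bin/hrm

Expect Ethernet-level performance, of course; this works around the missing RDMA drivers rather than fixing them.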
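And for the original question further down in the thread -- running across two stand-alone machines -- hosts can be given to mpirun directly or via a hostfile, as long as you can ssh from the launching machine to the other without a password prompt and the binary is visible at the same path on both (an NFS-shared /home is fine for that, as Ralph noted). The hostnames below are made up, so substitute your own:

    # Hypothetical hostnames "machine1" and "machine2"; one process on each.
    mpirun --host machine1,machine2 -np 2 /home/MET/hrm/bin/hrm

    # Or with a hostfile (called "myhosts" here just for illustration) containing:
    #   machine1 slots=8
    #   machine2 slots=8
    mpirun --hostfile myhosts -np 16 /home/MET/hrm/bin/hrm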
On Mar 26, 2013, at 3:54 AM, Syed Ahsan Ali <ahsansha...@gmail.com> wrote:

> It may be because the other system is running an upgraded version of Linux which
> does not have the InfiniBand drivers. Any solution?
>
> On Tue, Mar 26, 2013 at 12:42 PM, Syed Ahsan Ali <ahsansha...@gmail.com> wrote:
> Tried this but mpirun exits with this error
>
> mpirun -np 40 /home/MET/hrm/bin/hrm
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> librdmacm: couldn't read ABI version.
> CMA: unable to get RDMA device list
> CMA: unable to get RDMA device list
> CMA: unable to get RDMA device list
> CMA: unable to get RDMA device list
> librdmacm: assuming: 4
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> CMA: unable to get RDMA device list
> librdmacm: couldn't read ABI version.
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> --------------------------------------------------------------------------
> [[33095,1],8]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
> Module: OpenFabrics (openib)
> Host: pmd04.pakmet.com
> Another transport will be used instead, although this may result in
> lower performance.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> At least one pair of MPI processes are unable to reach each other for
> MPI communications. This means that no Open MPI device has indicated
> that it can be used to communicate between these processes. This is
> an error; Open MPI requires that all MPI processes be able to reach
> each other. This error can sometimes be the result of forgetting to
> specify the "self" BTL.
> Process 1 ([[33095,1],28]) is on host: compute-02-00.private02.pakmet.com
> Process 2 ([[33095,1],0]) is on host: pmd02
> BTLs attempted: openib self sm
> Your MPI job is now going to abort; sorry.
> --------------------------------------------------------------------------
>
> Ahsan
>
> On Fri, Mar 22, 2013 at 7:09 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
> On Mar 22, 2013, at 3:42 AM, Syed Ahsan Ali <ahsansha...@gmail.com> wrote:
>
>> Actually, due to some database corruption I am not able to add any new node
>> to the cluster from the installer node. So I want to run a parallel job on more
>> nodes without adding them to the existing cluster.
>> You are right that the binaries must be present on the remote node as well.
>> Is this possible through NFS, just as the compute nodes are NFS-mounted
>> from the installer node?
>
> Sure - OMPI doesn't care how the binaries got there. Just so long as they are
> present on the compute node.
>
>> Ahsan
>>
>> On Fri, Mar 22, 2013 at 3:33 PM, Reuti <re...@staff.uni-marburg.de> wrote:
>> On 22.03.2013, at 10:14, Syed Ahsan Ali wrote:
>>
>> > I have a very basic question. If we want to run an mpirun job on two systems
>> > which are not part of the cluster, then how can we make it possible? Can the
>> > host specified on mpirun be one which is not a compute node, but rather a
>> > stand-alone system?
>>
>> Sure, the machines can be specified as arguments to `mpiexec`. But do you
>> want to run applications just between these two machines, or should they
>> participate in a larger parallel job with machines of the cluster? Then a
>> direct network connection between the outside and the inside of the cluster
>> is necessary, by some kind of forwarding, in case these are separate networks.
>>
>> Also, the paths to the started binaries may be different in case the two
>> machines are not sharing the same /home with the cluster, and this needs to
>> be honored.
>>
>> In case you are using a queuing system and want to route jobs to machines
>> outside of the set-up cluster: it's necessary to negotiate with the admin
>> to allow jobs to be scheduled there.
>>
>> -- Reuti
>>
>> > Thanks
>> > Ahsan

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/