Try a "make uninstall" from the OMPI 1.2.8 source directory. The reason is that "make install" from OMPI 1.4.x won't uninstall the prior OMPI -- it'll just overwrite it. But some plugins from 1.2.8 will still be left, and confuse the OMPI 1.4 install.
On Feb 13, 2012, at 11:02 AM, Richard Bardwell wrote:

> Ralph
>
> I had done a make clean in the 1.2.8 directory if that is what you meant ?
> Or do I need to do something else ?
>
> I appreciate your help on this by the way ;-)
>
> ----- Original Message -----
> From: Ralph Castain
> To: Open MPI Users
> Sent: Monday, February 13, 2012 3:41 PM
> Subject: Re: [OMPI users] MPI orte_init fails on remote nodes
>
> You need to clean out the old attempt - that is a stale file
>
> Sent from my iPad
>
> On Feb 13, 2012, at 7:36 AM, "Richard Bardwell" <rich...@sharc.co.uk> wrote:
>
>> OK, I installed 1.4.4, rebuilt the exec and guess what ...... I now get some
>> weird errors as below:
>>
>> mca: base: component_find: unable to open
>> /usr/local/lib/openmpi/mca_ras_dash_host
>>
>> along with a few other files
>> even though the .so / .la files are all there !
>>
>> ----- Original Message -----
>> From: Ralph Castain
>> To: Open MPI Users
>> Sent: Monday, February 13, 2012 2:59 PM
>> Subject: Re: [OMPI users] MPI orte_init fails on remote nodes
>>
>> Good heavens - where did you find something that old? Can you use a more
>> recent version?
>>
>> Sent from my iPad
>>
>>> Gentlemen
>>>
>>> I am struggling to get MPI working when the hostfile contains different
>>> nodes. I get the error below. Any ideas ?? I can ssh without password
>>> between the two nodes. I am running 1.2.8 MPI on both machines.
>>>
>>> Any help most appreciated !!!!!
>>>
>>> MPITEST/v8_mpi_test> mpiexec -n 2 --debug-daemons -hostfile test.hst /home/sharc/MPITEST/v8_mpi_test/mpitest
>>>
>>> Daemon [0,0,1] checking in as pid 10490 on host 192.0.2.67
>>> [linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init_stage1.c at line 182
>>> --------------------------------------------------------------------------
>>> It looks like orte_init failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during orte_init; some of which are due to configuration or
>>> environment problems. This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>>
>>> orte_rml_base_select failed
>>> --> Returned value -13 instead of ORTE_SUCCESS
>>> --------------------------------------------------------------------------
>>> [linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_system_init.c at line 42
>>> [linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 52
>>>
>>> Open RTE was unable to initialize properly. The error occured while
>>> attempting to orte_init(). Returned value -13 instead of ORTE_SUCCESS.
>>>
>>> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received message from [0,0,0]
>>> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received kill_local_procs
>>> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
>>> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1158
>>> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
>>> [linux-tmpw:10489] ERROR: A daemon on node 192.0.2.68 failed to start as expected.
>>> [linux-tmpw:10489] ERROR: There may be more information available from
>>> [linux-tmpw:10489] ERROR: the remote shell (see above).
>>> [linux-tmpw:10489] ERROR: The daemon exited unexpectedly with status 243.
>>>
>>> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received message from [0,0,0]
>>> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received exit
>>> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
>>> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1190
>>> --------------------------------------------------------------------------
>>> mpiexec was unable to cleanly terminate the daemons for this job. Returned
>>> value Timeout instead of ORTE_SUCCESS.
>>> --------------------------------------------------------------------------

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/