Try a "make uninstall" from the OMPI 1.2.8 source directory.

The reason is that "make install" from OMPI 1.4.x won't uninstall the prior 
OMPI -- it'll just overwrite it.  But some plugins from 1.2.8 will still be 
left behind, and they will confuse the OMPI 1.4 install.
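
If in doubt, you can check the installed plugin directory for leftovers by hand and
then re-run "make install" from the 1.4.4 tree. The paths below assume the default
--prefix=/usr/local (which is where your error messages point); adjust as needed:

    ls /usr/local/lib/openmpi/           # leftover mca_*.so / mca_*.la files here came from 1.2.8
    rm -f /usr/local/lib/openmpi/mca_*   # clear out all plugins; 1.4.4's get re-installed next
    cd /path/to/openmpi-1.4.4            # hypothetical path to the 1.4.4 build tree
    make install                         # re-populate the prefix with only 1.4.4 components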


On Feb 13, 2012, at 11:02 AM, Richard Bardwell wrote:

> Ralph
>  
> I had done a "make clean" in the 1.2.8 directory, if that is what you meant?
> Or do I need to do something else?
>  
> I appreciate your help on this by the way ;-)
>  
>  
> ----- Original Message -----
> From: Ralph Castain
> To: Open MPI Users
> Sent: Monday, February 13, 2012 3:41 PM
> Subject: Re: [OMPI users] MPI orte_init fails on remote nodes
> 
> You need to clean out the old attempt - that is a stale file
> 
> Sent from my iPad
> 
> On Feb 13, 2012, at 7:36 AM, "Richard Bardwell" <rich...@sharc.co.uk> wrote:
> 
>> OK, I installed 1.4.4, rebuilt the executable, and guess what ... I now get some
>> weird errors as below:
>> 
>> mca: base: component_find: unable to open /usr/local/lib/openmpi/mca_ras_dash_host
>> 
>> along with a few other files, even though the .so / .la files are all there!
>> ----- Original Message -----
>> From: Ralph Castain
>> To: Open MPI Users
>> Sent: Monday, February 13, 2012 2:59 PM
>> Subject: Re: [OMPI users] MPI orte_init fails on remote nodes
>> 
>> Good heavens - where did you find something that old? Can you use a more 
>> recent version?
>> 
>> Sent from my iPad
>> 
>> 
>>  
>>> Gentlemen
>>> 
>>> I am struggling to get MPI working when the hostfile contains different 
>>> nodes.
>>> 
>>> I get the error below. Any ideas? I can ssh without a password between the two
>>> nodes. I am running Open MPI 1.2.8 on both machines.
>>> 
>>> Any help most appreciated!
>>> 
>>>  
>>> MPITEST/v8_mpi_test> mpiexec -n 2 --debug-daemons -hostfile test.hst /home/sharc/MPITEST/v8_mpi_test/mpitest
>>> Daemon [0,0,1] checking in as pid 10490 on host 192.0.2.67
>>> [linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init_stage1.c at line 182
>>> --------------------------------------------------------------------------
>>> It looks like orte_init failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during orte_init; some of which are due to configuration or
>>> environment problems. This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>> 
>>> orte_rml_base_select failed
>>> --> Returned value -13 instead of ORTE_SUCCESS
>>> --------------------------------------------------------------------------
>>> [linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_system_init.c at line 42
>>> [linux-z0je:08804] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 52
>>> Open RTE was unable to initialize properly. The error occured while
>>> attempting to orte_init(). Returned value -13 instead of ORTE_SUCCESS.
>>> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received message from [0,0,0]
>>> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received kill_local_procs
>>> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
>>> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1158
>>> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
>>> [linux-tmpw:10489] ERROR: A daemon on node 192.0.2.68 failed to start as expected.
>>> [linux-tmpw:10489] ERROR: There may be more information available from
>>> [linux-tmpw:10489] ERROR: the remote shell (see above).
>>> [linux-tmpw:10489] ERROR: The daemon exited unexpectedly with status 243.
>>> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received message from [0,0,0]
>>> [linux-tmpw:10490] [0,0,1] orted_recv_pls: received exit
>>> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
>>> [linux-tmpw:10489] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1190
>>> --------------------------------------------------------------------------
>>> mpiexec was unable to cleanly terminate the daemons for this job. Returned value Timeout instead of ORTE_SUCCESS.
>>> --------------------------------------------------------------------------
>>> 
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
>> 
> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

