Hi,
I am experimenting with Open MPI (1.6.2) on the Cygwin platform. While compiling
and running the checks went fine, the simple "mpirun hello_c.exe" fails with the
cryptic error:
##################################################################
[MARCOATZERI:07440] [[15164,0],0] ORTE_ERROR_LOG: Not found in file /pub/devel/openmpi/openmpi-1.6.2-1/src/openmpi-1.6.2/orte/mca/plm/rsh/plm_rsh_module.c at line 197
[MARCOATZERI:07440] [[15164,0],0] ORTE_ERROR_LOG: Not found in file /pub/devel/openmpi/openmpi-1.6.2-1/src/openmpi-1.6.2/orte/mca/ess/hnp/ess_hnp_module.c at line 228
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
orte_plm_init failed
--> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[MARCOATZERI:07440] [[15164,0],0] ORTE_ERROR_LOG: Not found in file /pub/devel/openmpi/openmpi-1.6.2-1/src/openmpi-1.6.2/orte/runtime/orte_init.c at line 128
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
orte_ess_set_name failed
--> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[MARCOATZERI:07440] [[15164,0],0] ORTE_ERROR_LOG: Not found in file /pub/devel/openmpi/openmpi-1.6.2-1/src/openmpi-1.6.2/orte/tools/orterun/orterun.c at line 694
#####################################################################
While trying to debug this, I noticed a strange pattern in the search for ssh:
1) ssh is only searched for in the PATH directories whose names end with "bin";
   other directories are skipped.
2) //usr/bin/ssh is searched even though it is not on the PATH.
Why does this happen, and where is it defined in the code?
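For comparison, a conventional PATH lookup tries every entry in order, whatever
the directory name ends with, and never probes a directory that is not listed.
A minimal sketch in C, assuming only POSIX access(2) and stat(2) (both available
on Cygwin); find_in_path is a hypothetical name, not Open MPI's actual search
routine:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not Open MPI code): search every entry of a
 * colon-separated PATH list, in order, for an executable regular file
 * called `name`. On success, writes the full path to `out` and returns 0;
 * otherwise empties `out` and returns -1. */
static int find_in_path(const char *path_list, const char *name,
                        char *out, size_t outsz)
{
    assert(path_list != NULL && name != NULL && out != NULL);
    const char *seg = path_list;
    for (;;) {
        const char *end = strchr(seg, ':');
        size_t len = end ? (size_t)(end - seg) : strlen(seg);
        int n;
        if (len == 0) {
            /* POSIX: an empty PATH entry means the current directory. */
            n = snprintf(out, outsz, "./%s", name);
        } else {
            /* Drop trailing slashes so "/" or "/usr/bin/" never yields "//". */
            while (len > 0 && seg[len - 1] == '/')
                len--;
            n = snprintf(out, outsz, "%.*s/%s", (int)len, seg, name);
        }
        struct stat st;
        if (n > 0 && (size_t)n < outsz &&
            stat(out, &st) == 0 && S_ISREG(st.st_mode) &&
            access(out, X_OK) == 0)
            return 0;
        if (end == NULL)
            break;
        seg = end + 1;  /* move on to the next PATH entry */
    }
    if (outsz > 0)
        out[0] = '\0';
    return -1;
}
```

Called as find_in_path(path, "ssh", buf, sizeof buf) with path taken from
getenv("PATH") (after checking it is not NULL), it would probe each listed
directory in turn, such as /home/marco/bin/ssh, /usr/local/bin/ssh and
/usr/bin/ssh, and stripping trailing slashes keeps a PATH entry of "/" from
producing a "//" prefix.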
   103  321183 [main] orterun 6304 normalize_posix_path: src /home/marco/bin/ssh
   100  324353 [main] orterun 6304 normalize_posix_path: src /usr/local/bin/ssh
    99  327381 [main] orterun 6304 normalize_posix_path: src /usr/bin/ssh
    36 1805679 [main] orterun 6304 normalize_posix_path: src /home/marco/bin/ssh
    34 1807010 [main] orterun 6304 normalize_posix_path: src /usr/local/bin/ssh
    34 1808236 [main] orterun 6304 normalize_posix_path: src /usr/bin/ssh
    37 1810858 [main] orterun 6304 normalize_posix_path: src //usr/bin/ssh
and immediately after the "//" search, mpirun crashes:
   703 9508968 [WNetOpenEnum] orterun 8020 cygthread::stub: thread 'WNetOpenEnum', id 0x15A0, stack_ptr 0x28BAD40
--- Process 8020, exception 000006AB at 776BB9BC
 41286 9550254 [main] orterun 8020 fs_info::update: Cannot get volume attributes (\??\UNC), C0000010
I suspect this search is the culprit.
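On Cygwin, a path that begins with exactly two slashes is treated as a UNC
network path (//server/share), and POSIX leaves the meaning of such a prefix
implementation-defined. So //usr/bin/ssh would name a host called "usr", which
would be consistent with the WNetOpenEnum thread and the \??\UNC volume error in
the trace. A minimal defensive sketch, assuming the caller only ever intends
local paths: collapse any run of leading slashes to a single one before the
lookup. collapse_leading_slashes is a hypothetical helper, not part of Open MPI
or Cygwin.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not Open MPI code): rewrite `path` in place so that
 * a run of leading '/' characters becomes exactly one. On Cygwin,
 * "//usr/bin/ssh" would otherwise be read as a UNC path (host "usr").
 * Only appropriate when the path is known to be local. */
static void collapse_leading_slashes(char *path)
{
    assert(path != NULL);
    size_t n = strspn(path, "/");   /* count of leading '/' characters */
    if (n > 1) {
        /* Shift the remainder, including the terminating NUL, so that a
         * single leading '/' is kept. */
        memmove(path + 1, path + n, strlen(path + n) + 1);
    }
}
```

Paths with zero or one leading slash are left unchanged, so the helper is a
no-op for the ordinary entries in the trace and only touches the "//" case.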
If anyone is interested, I have put all the config, check and make logs, plus
the ompi_info output, here:
http://matzeri.altervista.org/works/ompi/
Regards
Marco