Hi,
I am playing with Open MPI (1.6.2) on the Cygwin platform, and while compile and check were fine, the simple "mpirun hello_c.exe" fails with this cryptic error:

##################################################################
[MARCOATZERI:07440] [[15164,0],0] ORTE_ERROR_LOG: Not found in file /pub/devel/openmpi/openmpi-1.6.2-1/src/openmpi-1.6.2/orte/mca/plm/rsh/plm_rsh_module.c at line 197
[MARCOATZERI:07440] [[15164,0],0] ORTE_ERROR_LOG: Not found in file /pub/devel/openmpi/openmpi-1.6.2-1/src/openmpi-1.6.2/orte/mca/ess/hnp/ess_hnp_module.c at line 228
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_plm_init failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[MARCOATZERI:07440] [[15164,0],0] ORTE_ERROR_LOG: Not found in file /pub/devel/openmpi/openmpi-1.6.2-1/src/openmpi-1.6.2/orte/runtime/orte_init.c at line 128
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[MARCOATZERI:07440] [[15164,0],0] ORTE_ERROR_LOG: Not found in file /pub/devel/openmpi/openmpi-1.6.2-1/src/openmpi-1.6.2/orte/tools/orterun/orterun.c at line 694
#####################################################################

While trying to debug, I noticed a strange pattern in the ssh search:
1)  ssh is only searched for in the PATH directories that end with "bin";
    other directories are skipped.
2) //usr/bin is not on the PATH, but //usr/bin/ssh is searched anyway.
   Why, and where is this defined in the code?
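To make point 2 concrete, here is a hypothetical C sketch (not Open MPI's actual search code; `path_candidates` is an illustrative name) of how a PATH search builds the candidate paths it probes. With a naive "<dir>/<name>" join, a PATH of /home/marco/bin:/usr/local/bin:/usr/bin reproduces the first three probes in the trace below, and a directory entry of "/" joined with "usr/bin/ssh" yields exactly "//usr/bin/ssh". This is only one way such a path could arise; where Open MPI 1.6.2 really builds it is still the open question.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch, NOT Open MPI's code: fill `out` with the candidate
 * paths "<dir>/<name>" that a PATH search would probe, in search order.
 * `path` is a colon-separated list such as the value of $PATH.
 * Returns the number of candidates written (at most `max`). */
static int path_candidates(const char *path, const char *name,
                           char out[][256], int max)
{
    int n = 0;
    const char *p = path;

    assert(path != NULL && name != NULL && max > 0);
    while (n < max) {
        const char *colon = strchr(p, ':');
        size_t len = colon ? (size_t)(colon - p) : strlen(p);

        /* Naive join: always insert '/', even when the directory already
         * ends with one, so a "/" entry produces a "//..." candidate. */
        snprintf(out[n], 256, "%.*s/%s", (int)len, p, name);
        n++;

        if (colon == NULL)
            break;          /* last PATH entry processed */
        p = colon + 1;
    }
    return n;
}
```

For example, `path_candidates("/home/marco/bin:/usr/local/bin:/usr/bin", "ssh", c, 8)` fills `c` with /home/marco/bin/ssh, /usr/local/bin/ssh and /usr/bin/ssh, while `path_candidates("/", "usr/bin/ssh", c, 8)` gives //usr/bin/ssh.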

  103  321183 [main] orterun 6304 normalize_posix_path: src /home/marco/bin/ssh
  100  324353 [main] orterun 6304 normalize_posix_path: src /usr/local/bin/ssh
   99  327381 [main] orterun 6304 normalize_posix_path: src /usr/bin/ssh
   36 1805679 [main] orterun 6304 normalize_posix_path: src /home/marco/bin/ssh
   34 1807010 [main] orterun 6304 normalize_posix_path: src /usr/local/bin/ssh
   34 1808236 [main] orterun 6304 normalize_posix_path: src /usr/bin/ssh
   37 1810858 [main] orterun 6304 normalize_posix_path: src //usr/bin/ssh

Immediately after the "//" search, mpirun crashes:

703 9508968 [WNetOpenEnum] orterun 8020 cygthread::stub: thread 'WNetOpenEnum', id 0x15A0, stack_ptr 0x28BAD40
--- Process 8020, exception 000006AB at 776BB9BC
41286 9550254 [main] orterun 8020 fs_info::update: Cannot get volume attributes (\??\UNC), C0000010

I suspect this search is the culprit.
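Some background that fits this suspicion: POSIX lets an implementation treat a pathname beginning with exactly two slashes specially, and Cygwin uses "//server/share" for UNC network paths. So "//usr/bin/ssh" would name a server called "usr" rather than the local /usr/bin/ssh, which is consistent with the WNetOpenEnum thread and the "\??\UNC" volume query in the trace. A minimal C sketch of a guard, assuming the fix belongs wherever the candidate path is built (`collapse_slashes` is a hypothetical helper, not an Open MPI function), collapses repeated slashes before the path is passed to access() or stat():

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT part of Open MPI: copy `src` into `dst`,
 * collapsing every run of '/' into a single '/', so "//usr/bin/ssh"
 * becomes "/usr/bin/ssh" and is not taken as a UNC path on Cygwin.
 * Returns 0 on success, -1 if `dst` (of size `dstlen`) is too small. */
static int collapse_slashes(const char *src, char *dst, size_t dstlen)
{
    size_t j = 0;

    assert(src != NULL && dst != NULL);
    if (dstlen == 0)
        return -1;
    for (size_t i = 0; src[i] != '\0'; i++) {
        if (src[i] == '/' && j > 0 && dst[j - 1] == '/')
            continue;               /* skip a repeated slash */
        if (j + 1 >= dstlen)
            return -1;              /* no room for this char plus the NUL */
        dst[j++] = src[i];
    }
    dst[j] = '\0';
    return 0;
}
```

Collapsing every run (not only the leading pair) keeps the helper simple; a production fix would more likely just avoid emitting the extra slash when joining a directory and a file name.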

If anyone is interested, I have put all the config, check and make logs, plus the ompi_info output, here:
http://matzeri.altervista.org/works/ompi/

Regards
Marco
