In the initial report, the /usr/bin/ssh process was in the 'T' state
(stopped, which generally hints that the process received a stop signal
or is attached to a debugger).
/usr/bin/ssh -x b09-32 orted
did behave as expected (i.e. orted was executed, exited with an error
since the command line is invalid, and the error message was received).
Can you try to run
/home/user/openmpi_install/bin/mpirun --host b09-30,b09-32 hostname
and see how things go? (Since you simply ran 'ssh orted', another orted
might be used.)
If you are still facing the same hang with ssh in the 'T' state, can you
check the logs on b09-32 and see if the sshd server was even contacted?
I can hardly make sense of this error, fwiw.
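
Both checks above could be scripted roughly as follows. This is only a sketch: it assumes a Linux /proc filesystem, the helper names proc_state and sshd_lines are made up for illustration, and the sshd log location in the usage comment (/var/log/auth.log) is an assumption that depends on the distribution.

```shell
# Print the one-letter state of a process from /proc (Linux only).
# 'T' means stopped by a signal, 't' means stopped while being traced.
proc_state() {
    awk '/^State:/ {print $2}' "/proc/$1/status"
}

# Print the sshd entries found in a given log file.
# The log file path is passed in because it differs between systems.
sshd_lines() {
    grep 'sshd\[' "$1"
}

# Example usage (run by hand, the PID and log path are placeholders):
#   proc_state 12345                 # on b09-30, PID of the hung ssh
#   sshd_lines /var/log/auth.log     # on b09-32, assumed log location
```

If sshd_lines shows no entry around the time mpirun was started, the connection most likely never reached b09-32.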
Cheers,
Gilles
On 5/15/2018 5:27 AM, r...@open-mpi.org wrote:
You got that error because the orted is looking for its rank on the
cmd line and not finding it.
On May 14, 2018, at 12:37 PM, Max Mellette <wmell...@ucsd.edu> wrote:
Hi Gus,
Thanks for the suggestions. The correct version of openmpi seems to
be getting picked up; I also prepended .bashrc with the installation
path like you suggested, but it didn't seem to help:
user@b09-30:~$ cat .bashrc
export PATH=/home/user/openmpi_install/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
export LD_LIBRARY_PATH=/home/user/openmpi_install/lib
user@b09-30:~$ which mpicc
/home/user/openmpi_install/bin/mpicc
user@b09-30:~$ /usr/bin/ssh -x b09-32 orted
[b09-32:204536] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ess_env_module.c at line 147
[b09-32:204536] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file util/session_dir.c at line 106
[b09-32:204536] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file util/session_dir.c at line 345
[b09-32:204536] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file base/ess_base_std_orted.c at line 270
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
orte_session_dir failed
--> Returned value Bad parameter (-5) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
Thanks,
Max
On Mon, May 14, 2018 at 11:41 AM, Gus Correa <g...@ldeo.columbia.edu> wrote:
Hi Max
Just in case, as environment mix-ups often happen.
Could it be that you are inadvertently picking another
installation of OpenMPI, perhaps installed from packages
in /usr , or /usr/local?
That's easy to check with 'which mpiexec' or
'which mpicc', for instance.
Have you tried to prepend (as opposed to append) OpenMPI
to your PATH? Say:
export PATH='/home/user/openmpi_install/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin'
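
To double check that the OpenMPI directory really comes first, something like the sketch below might help. The function name is_first_in_path is hypothetical and only used for illustration; it compares strings and does not touch the environment.

```shell
# Succeed if directory $1 is the first entry of the PATH-style string $2.
is_first_in_path() {
    case "$2" in
        "$1"|"$1":*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage (run by hand):
#   is_first_in_path /home/user/openmpi_install/bin "$PATH" && echo "prepended"
```

Since mpirun starts orted through a non-interactive ssh, it is probably worth running the same check on the PATH that ssh sees, e.g. by looking at the output of /usr/bin/ssh -x b09-32 'echo $PATH; which orted'.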
I hope this helps,
Gus Correa
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users