In the initial report, the /usr/bin/ssh process was in the 'T' (stopped) state,
which generally hints that the process is attached to a debugger.
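For reference, here is a minimal sketch of how the state of a process can be inspected on Linux. It assumes a /proc filesystem, and the helper name proc_state is only illustrative. A non-zero TracerPid would suggest a debugger or strace is attached, whereas 'T' with TracerPid 0 would point to a plain stop signal instead.

```shell
# Print the state and tracer PID of a process (Linux /proc only).
# "T" means stopped by a signal; "t" means stopped while being traced.
# TracerPid is non-zero when another process (gdb, strace, ...) is attached.
# proc_state is a hypothetical helper name, not part of any tool.
proc_state() {
    awk '/^(State|TracerPid):/ { print }' "/proc/$1/status"
}
```

It could be called as proc_state "$(pgrep -n -x ssh)" while mpirun hangs, assuming the stuck ssh is the most recently started one.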

/usr/bin/ssh -x b09-32 orted

did behave as expected (i.e. orted was executed, exited with an error since the command line is invalid, and the error message was received).


Can you try to run

/home/user/openmpi_install/bin/mpirun --host b09-30,b09-32 hostname

and see how things go? (Since you simply ran 'ssh orted', a different orted might have been picked up than the one mpirun would launch.)
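One way to check which orted a non-interactive ssh session resolves is sketched below. The helper orted_under_prefix is a hypothetical name, and the install prefix is the one quoted in this thread. Whether the remote shell reads .bashrc for a non-interactive command depends on the shell setup, so treat the result as a hint only.

```shell
# Succeeds when the path in $1 lies under "$2/bin/".
# orted_under_prefix is a hypothetical helper, not an Open MPI tool.
orted_under_prefix() {
    case "$1" in
        "$2"/bin/*) return 0 ;;
        *)          return 1 ;;
    esac
}

# Example (needs working ssh to the remote host, so left commented out):
# remote_orted=$(/usr/bin/ssh -x b09-32 'command -v orted')
# orted_under_prefix "$remote_orted" /home/user/openmpi_install \
#     || echo "remote shell resolves orted to: $remote_orted"
```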

If you are still facing the same hang with ssh in the 'T' state, can you check the logs on b09-32 and see whether the sshd server was even contacted? FWIW, I can hardly make sense of this error.
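A rough sketch of the log check follows. The log location is an assumption (on Debian/Ubuntu systems sshd typically logs to /var/log/auth.log, other systems differ), and sshd_lines_for is a hypothetical helper name.

```shell
# Print sshd log lines that mention a given client (fixed-string match).
# $1: log file path (an assumption; the location varies by distribution)
# $2: client hostname or IP address as it would appear in the log
sshd_lines_for() {
    grep -F 'sshd[' "$1" | grep -F -- "$2"
}
```

Run on b09-32 with the address of b09-30 as the second argument, right after reproducing the hang. No output at all would suggest the connection never reached sshd.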


Cheers,

Gilles

On 5/15/2018 5:27 AM, r...@open-mpi.org wrote:
You got that error because the orted is looking for its rank on the cmd line and not finding it.


On May 14, 2018, at 12:37 PM, Max Mellette <wmell...@ucsd.edu> wrote:

Hi Gus,

Thanks for the suggestions. The correct version of Open MPI seems to be getting picked up. I also prepended the installation path to PATH in .bashrc as you suggested, but it didn't seem to help:

user@b09-30:~$ cat .bashrc
export PATH=/home/user/openmpi_install/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
export LD_LIBRARY_PATH=/home/user/openmpi_install/lib
user@b09-30:~$ which mpicc
/home/user/openmpi_install/bin/mpicc
user@b09-30:~$ /usr/bin/ssh -x b09-32 orted
[b09-32:204536] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ess_env_module.c at line 147
[b09-32:204536] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file util/session_dir.c at line 106
[b09-32:204536] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file util/session_dir.c at line 345
[b09-32:204536] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file base/ess_base_std_orted.c at line 270
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_session_dir failed
  --> Returned value Bad parameter (-5) instead of ORTE_SUCCESS
--------------------------------------------------------------------------

Thanks,
Max


On Mon, May 14, 2018 at 11:41 AM, Gus Correa <g...@ldeo.columbia.edu> wrote:

    Hi Max

    Just in case, since environment mix-ups often happen:
    could it be that you are inadvertently picking up another
    installation of OpenMPI, perhaps installed from packages
    in /usr or /usr/local?
    That's easy to check with 'which mpiexec' or
    'which mpicc', for instance.

    Have you tried to prepend (as opposed to append) OpenMPI
    to your PATH? Say:

    export PATH='/home/user/openmpi_install/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin'

    I hope this helps,
    Gus Correa


_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


