Hello,
On Tue, Jun 22, 2010 at 8:05 AM, Ralph Castain <r...@open-mpi.org> wrote:
> Sorry for the problem - the issue is a bug in the handling of the
> pernode option in 1.4.2. This has been fixed and awaits release in
> 1.4.3.
>
Thank you for pointing this out. Unfortunately, I am still unable
to start remote processes::

    $ mpirun --host compute-0-11 -np 1 ./hello_mpi
    --------------------------------------------------------------------------
    mpirun noticed that the job aborted, but has no info as to the process
    that caused that situation.
    --------------------------------------------------------------------------

The same program runs fine if I use "--host localhost".
Running "strace -v" on the "mpirun" command shows a strange
invocation of "orted"::

    execve("//usr/bin/ssh", ["/usr/bin/ssh", "-x", "compute-0-11",
           " orted", "--daemonize", "-mca", "ess", "env",
           "-mca", "orte_ess_jobid", "2322006016", "-mca",
           "orte_ess_vpid", "1", "-mca", "orte_ess_num_procs", "2",
           "--hnp-uri", "\"2322006016.0;tcp://192.168.122.1"],
           ["MKLROOT=/opt/intel/mkl/10.0.3.02", ...])

The 192.168.122.1 address belongs to an internal Xen bridge,
"virbr0", so it should not be advertised as the "call-back" address.
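In case it helps anyone repeat the check, here is a minimal sketch of how
the advertised address can be pulled out of the "--hnp-uri" value. The
function name "hnp_uri_addrs" is my own, and the URI layout
("<jobid>.<vpid>;tcp://<addr>[:<port>]", possibly with several "tcp://"
entries) is only assumed from the strace output above, not taken from the
ORTE documentation:

```shell
#!/bin/sh
# Print every TCP address found in an HNP URI, one per line.
# Assumed layout (inferred from the strace output, not authoritative):
#   <jobid>.<vpid>;tcp://<addr>[:<port>][;tcp://<addr>[:<port>]...]
hnp_uri_addrs() {
    # Split the URI on ';', then keep only the dotted address of each
    # tcp:// entry, dropping any trailing ":<port>".
    printf '%s\n' "$1" | tr ';' '\n' | sed -n 's|^tcp://\([0-9.]*\).*$|\1|p'
}

# The value seen in the strace output above:
hnp_uri_addrs '2322006016.0;tcp://192.168.122.1'
```

Each address printed this way can then be compared against the interface
list on the head node (for example, the output of "/sbin/ip -4 -o addr show")
to see which interface mpirun is advertising.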
Is there a command-line option to force mpirun to use a certain IP address?
I have tried starting "mpirun" with "--mca btl_tcp_if_exclude lo,virbr0"
to no avail.
Also, the " orted" argument to ssh starts with a space; is this OK?
I'm using OMPI 1.4.2, self-compiled on a Rocks 5.2 (i.e., CentOS 5.2) cluster.
Regards,
Riccardo