Hi Siegmar,

My bad, there was a typo in my reply.
I really meant:
> > what if you ?
> > mpiexec --host loki:2,exin:1 -np 3 hello_1_mpi


But you also tried that and it did not help.
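
As a temporary workaround, --oversubscribe might at least get the job running,
e.g. "mpiexec --oversubscribe --host loki:2,exin:1 -np 3 hello_1_mpi";
it tells mpiexec to allow more processes than slots on a node, though it
should of course not be needed once --host behaves as expected.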

I could not find anything in your logs suggesting that mpiexec tries to
start 5 MPI tasks. Did I miss something?
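
If you get a chance, it might also help to rerun the failing command with
--display-map (and maybe --display-allocation), e.g.
mpiexec --display-map --host exin:2,loki:3 -np 3 hello_1_mpi
so we can see how many processes are really being mapped and where.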

I will try to reproduce the issue myself.
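
For reference, here is the minimal test program I plan to use; I assume it
behaves roughly like your hello_1_mpi (the file name and the exact output
format are my guesses, not your actual source):

/* hello.c - minimal sketch of an MPI hello world, assumed to be close to hello_1_mpi */
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank, size;
    char host[256];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* rank of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */
    gethostname(host, sizeof(host));
    printf("Process %d of %d running on %s\n", rank, size, host);
    MPI_Finalize();
    return 0;
}

compiled and run with something like
mpicc hello.c -o hello_mpi && mpiexec --host loki:2,exin:1 -np 3 ./hello_mpi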

Cheers,

Gilles

----- Original Message -----
> Hi Gilles,
> 
> > what if you ?
> > mpiexec --host loki:1,exin:1 -np 3 hello_1_mpi
> 
> I need as many slots as processes, so I use "-np 2".
> "mpiexec --host loki,exin -np 2 hello_1_mpi" works as well. The command
> breaks if I use at least "-np 3" and distribute the processes across at
> least two machines.
> 
> loki hello_1 118 mpiexec --host loki:1,exin:1 -np 2 hello_1_mpi
> Process 0 of 2 running on loki
> Process 1 of 2 running on exin
> Now 1 slave tasks are sending greetings.
> Greetings from task 1:
>    message type:        3
>    msg length:          131 characters
>    message:
>      hostname:          exin
>      operating system:  Linux
>      release:           4.4.49-92.11-default
>      processor:         x86_64
> loki hello_1 119
> 
> 
> 
> > are loki and exin different ? (os, sockets, core)
> 
> Yes, loki is a real machine and exin is a virtual one. "exin" uses a newer
> kernel.
> 
> loki fd1026 108 uname -a
> Linux loki 4.4.38-93-default #1 SMP Wed Dec 14 12:59:43 UTC 2016 (2d3e9d4) x86_64 x86_64 x86_64 GNU/Linux
> 
> loki fd1026 109 ssh exin uname -a
> Linux exin 4.4.49-92.11-default #1 SMP Fri Feb 17 08:29:30 UTC 2017 (8f9478a) x86_64 x86_64 x86_64 GNU/Linux
> loki fd1026 110
> 
> The number of sockets and cores is identical, but the processor types are
> different, as you can see at the end of my previous email. "loki" uses two
> "Intel(R) Xeon(R) CPU E5-2620 v3" processors and "exin" two "Intel Core
> Processor (Haswell, no TSX)" from QEMU. I can provide a pdf file with both
> topologies (89 K) if you are interested in the output from lstopo. I've
> added some runs. Most interesting in my opinion are the last two,
> "mpiexec --host exin:2,loki:3 -np 3 hello_1_mpi" and
> "mpiexec -np 3 --host exin:2,loki:3 hello_1_mpi".
> Why does mpiexec create five processes although I've asked for only three
> processes? Why do I have to break the program with <Ctrl-c> for the first
> of the above commands?
> 
> 
> 
> loki hello_1 110 mpiexec --host loki:2,exin:1 -np 3 hello_1_mpi
> --------------------------------------------------------------------------
> There are not enough slots available in the system to satisfy the 3 slots
> that were requested by the application:
>    hello_1_mpi
>
> Either request fewer slots for your application, or make more slots available
> for use.
> --------------------------------------------------------------------------
> 
> 
> 
> loki hello_1 111 mpiexec --host exin:3 -np 3 hello_1_mpi
> Process 0 of 3 running on exin
> Process 1 of 3 running on exin
> Process 2 of 3 running on exin
> ...
> 
> 
> 
> loki hello_1 115 mpiexec --host exin:2,loki:3 -np 3 hello_1_mpi
> Process 1 of 3 running on loki
> Process 0 of 3 running on loki
> Process 2 of 3 running on loki
> ...
> 
> Process 0 of 3 running on exin
> Process 1 of 3 running on exin
> [exin][[52173,1],1][../../../../../openmpi-v3.x-201705250239-d5200ea/opal/mca/btl/tcp/btl_tcp_endpoint.c:794:mca_btl_tcp_endpoint_complete_connect]
> connect() to 193.xxx.xxx.xxx failed: Connection refused (111)
> 
> ^Cloki hello_1 116
> 
> 
> 
> 
> loki hello_1 116 mpiexec -np 3 --host exin:2,loki:3 hello_1_mpi
> Process 0 of 3 running on loki
> Process 2 of 3 running on loki
> Process 1 of 3 running on loki
> ...
> Process 1 of 3 running on exin
> Process 0 of 3 running on exin
> [exin][[51638,1],1][../../../../../openmpi-v3.x-201705250239-d5200ea/opal/mca/btl/tcp/btl_tcp_endpoint.c:590:mca_btl_tcp_endpoint_recv_blocking]
> recv(16, 0/8) failed: Connection reset by peer (104)
> [exin:31909] ../../../../../openmpi-v3.x-201705250239-d5200ea/ompi/mca/pml/ob1/pml_ob1_sendreq.c:191 FATAL
> loki hello_1 117
> 
> 
> Do you need anything else?
> 
> 
> Kind regards and thank you very much for your help
> 
> Siegmar
> 
> 
> 
> > 
> > Cheers,
> > 
> > Gilles
> > 
> > ----- Original Message -----
> >> Hi,
> >>
> >> I have installed openmpi-v3.x-201705250239-d5200ea on my "SUSE Linux
> >> Enterprise Server 12.2 (x86_64)" with Sun C 5.14 and gcc-7.1.0.
> >> Depending on the machine that I use to start my processes, I have
> >> a problem with "--host" for versions "v3.x" and "master", while
> >> everything works as expected with earlier versions.
> >>
> >>
> >> loki hello_1 111 mpiexec -np 3 --host loki:2,exin hello_1_mpi
> >> --------------------------------------------------------------------------
> >> There are not enough slots available in the system to satisfy the 3 slots
> >> that were requested by the application:
> >>     hello_1_mpi
> >>
> >> Either request fewer slots for your application, or make more slots available
> >> for use.
> >> --------------------------------------------------------------------------
> >>
> >>
> >>
> >> Everything is ok if I use the same command on "exin".
> >>
> >> exin fd1026 107 mpiexec -np 3 --host loki:2,exin hello_1_mpi
> >> Process 0 of 3 running on loki
> >> Process 1 of 3 running on loki
> >> Process 2 of 3 running on exin
> >> ...
> >>
> >>
> >>
> >> Everything is also ok if I use openmpi-v2.x-201705260340-58c6b3c on "loki".
> >>
> >> loki hello_1 114 which mpiexec
> >> /usr/local/openmpi-2.1.2_64_cc/bin/mpiexec
> >> loki hello_1 115 mpiexec -np 3 --host loki:2,exin hello_1_mpi
> >> Process 0 of 3 running on loki
> >> Process 1 of 3 running on loki
> >> Process 2 of 3 running on exin
> >> ...
> >>
> >>
> >> "exin" is a virtual machine on QEMU, so it uses a slightly different
> >> processor architecture, e.g., it has no L3 cache but larger L2 caches.
> >>
> >> loki fd1026 117 cat /proc/cpuinfo | grep -e "model name" -e "physical id" -e "cpu cores" -e "cache size" | sort | uniq
> >> cache size    : 15360 KB
> >> cpu cores    : 6
> >> model name    : Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
> >> physical id    : 0
> >> physical id    : 1
> >>
> >>
> >> loki fd1026 118 ssh exin cat /proc/cpuinfo | grep -e "model name" -e "physical id" -e "cpu cores" -e "cache size" | sort | uniq
> >> cache size    : 4096 KB
> >> cpu cores    : 6
> >> model name    : Intel Core Processor (Haswell, no TSX)
> >> physical id    : 0
> >> physical id    : 1
> >>
> >>
> >> Any ideas what's different in the newer versions of Open MPI? Is the new
> >> behavior intended? If "mpiexec -np 3 --host loki:2,exin hello_1_mpi" is
> >> supposed to print my messages in versions "3.x" and "master" as well,
> >> regardless of the machine the programs are started on, I would be grateful
> >> if somebody could fix the problem. Do you need anything else? Thank you
> >> very much for any help in advance.
> >>
> >>
> >> Kind regards
> >>
> >> Siegmar
> >>
> > 
> > 
> > 
> 


_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
