You might want to specify a wider range of ports. Depending on how a socket is closed, a given port might or might not be available right after a job completes. IIRC, with default TCP settings, the port can linger in the TCP TIME_WAIT state for a few minutes in the worst case. I will double-check that the sockets are created with SO_REUSEADDR (or a similar option), since this might help mitigate this kind of issue.
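To illustrate what I mean: the option is spelled SO_REUSEADDR in the sockets API. Below is a minimal sketch in Python, not Open MPI code, of one "job" listening on a port, closing it, and a second "job" trying to listen on the same port right away. The names `open_listener` and `rebind_after_job` are made up for this example, and the behavior without the option depends on the operating system, so take it as an illustration under those assumptions only.

```python
import socket


def open_listener(port, reuse=True, host="127.0.0.1"):
    """Return a listening TCP socket on (host, port); port 0 lets the OS pick."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuse:
            # Allow binding even if an old connection on this port is in TIME_WAIT.
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
        sock.listen(1)
    except OSError:
        sock.close()
        raise
    return sock


def rebind_after_job(reuse=True):
    """Simulate one 'job' on a port, then try to listen on the same port again."""
    first = open_listener(0, reuse)
    port = first.getsockname()[1]

    client = socket.create_connection(("127.0.0.1", port))
    conn, _ = first.accept()
    conn.close()    # the listening side closes first, so its end can enter TIME_WAIT
    client.close()
    first.close()

    try:
        second = open_listener(port, reuse)
    except OSError:
        return False  # port still considered busy
    second.close()
    return True


if __name__ == "__main__":
    print("rebind with SO_REUSEADDR:   ", rebind_after_job(reuse=True))
    print("rebind without SO_REUSEADDR:", rebind_after_job(reuse=False))
```

If the second line prints False on your system, that is the situation I suspect with a single static port: the next mpirun cannot listen on it until the old connection has timed out.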
Cheers,

Gilles

On Tuesday, June 7, 2016, Ping Wang <ping.w...@asc-s.de> wrote:

> Hello all,
>
> after the correct configuration, mpirun (v 1.10.2) works fine when all TCP
> ports are open. I can ssh to all hosts without a password.
>
> Then it comes back to my first question: how to specify the ports for MPI
> communication?
>
> I opened the ports 40000-50000 for outgoing traffic. When I run:
>
> mpirun --mca btl_tcp_port_min_v4 40040 --mca btl_tcp_port_range_v4 10
> --mca oob_tcp_static_ipv4_ports 40020 --host <IP1>,<IP2> hostname
>
> it works, but not every time. The same happens when I run:
>
> mpirun --mca oob_tcp_static_ipv4_ports 40020 --host <IP1>,<IP2> hostname
>
> It is strange that sometimes I get output and sometimes it just hangs.
> Did I miss something?
>
> Best,
>
> Ping
>
> *From:* users [mailto:users-boun...@open-mpi.org] *On Behalf Of* Gilles Gouaillardet
> *Sent:* Friday, June 3, 2016 00:14
> *To:* Open MPI Users
> *Subject:* Re: [OMPI users] Firewall settings for MPI communication
>
> The syntax is
>
> configure --enable-mpirun-prefix-by-default --prefix=<path to OpenMPI> ...
>
> all hosts must be able to ssh each other passwordless.
>
> that means you need to generate a user ssh key pair on all hosts, add your
> public keys to the list of authorized keys, and ssh to all hosts in order
> to populate your known hosts
>
> (ssh requires you confirm host public keys the very first time you ssh to
> a new host)
>
> iirc, that can be automated with ssh-keyscan.
>
> when ssh is fully configured, mpirun should work just fine
>
> Cheers,
>
> Gilles
>
> On Friday, June 3, 2016, Ping Wang <ping.w...@asc-s.de> wrote:
>
> Hi,
>
> thank you Gilles for your suggestion. I tried: mpirun --prefix <path to
> Open MPI> --host <public IP> hostname, then it works.
>
> I’m sure both IPs are the ones of the VM on which mpirun is running, and
> they are unique.
>
> I also configured Open MPI with --enable-mpirun-prefix-by-default, but I
> still need to add --prefix <path to Open MPI> to get mpirun to work.
>
> I used:
>
> ./configure --enable-mpirun-prefix-by-default ="<path to Open MPI> "
> make
> make install
>
> Did I miss something, or did I misunderstand the way to configure Open MPI?
>
> When I run: ssh <internal/public IP> `which orted`
>
> The output is:
>
> Warning: Permanently added '<internal/public IP>' (ECDSA) to the list of known hosts.
> /usr/local/bin/orted
>
> Is it all right?
>
> Cheers,
>
> Ping
>
> *From:* users [mailto:users-boun...@open-mpi.org] *On Behalf Of* Gilles Gouaillardet
> *Sent:* Thursday, June 2, 2016 17:06
> *To:* Open MPI Users
> *Subject:* Re: [OMPI users] Firewall settings for MPI communication
>
> are you saying both IP are the ones of the VM on which mpirun is running ?
>
> orted is only launched on all the machines *except* the one running mpirun.
>
> can you double/triple check the IPs are ok and unique ?
>
> for example, mpirun --host <internal IP> /sbin/ifconfig -a
>
> can you also make sure Open MPI is installed on all your VMs in the same
> directory ?
>
> also make sure Open MPI has all the dependencies on all the VMs
>
> ssh xxx ldd `which orted`
>
> should show no missing dependency
>
> generally speaking, I recommend you configure Open MPI with
>
> --enable-mpirun-prefix-by-default
>
> you can also try to replace
>
> mpirun
>
> with
>
> `which mpirun`
>
> or
>
> mpirun --prefix <path to Open MPI>
>
> Cheers,
>
> Gilles
>
> On Thursday, June 2, 2016, Ping Wang <ping.w...@asc-s.de> wrote:
>
> Hi,
>
> I've installed Open MPI v1.10.2. Every VM on the cloud has two IPs
> (internal IP, public IP).
> When I run: mpirun --host <internal IP> hostname, the output is the
> hostname of the VM.
> But when I run: mpirun --host <public IP> hostname, the output is
>
> bash: orted: command not found
> --------------------------------------------------------------------------
> ORTE was unable to reliably start one or more daemons.
> This usually is caused by:
>
> * not finding the required libraries and/or binaries on
>   one or more nodes. Please check your PATH and LD_LIBRARY_PATH
>   settings, or configure OMPI with --enable-orterun-prefix-by-default
>
> * lack of authority to execute on one or more specified nodes.
>   Please verify your allocation and authorities.
>
> * the inability to write startup files into /tmp
>   (--tmpdir/orte_tmpdir_base).
>   Please check with your sys admin to determine the correct location to
>   use.
>
> * compilation of the orted with dynamic libraries when static are required
>   (e.g., on Cray). Please check your configure cmd line and consider using
>   one of the contrib/platform definitions for your system type.
>
> * an inability to create a connection back to mpirun due to a
>   lack of common network interfaces and/or no route found between
>   them. Please check network connectivity (including firewalls
>   and network routing requirements).
>
> Both IPs are the IP of the VM where MPI is running. Did I do something
> wrong in the configuration?
>
> Thanks for any help.
>
> Ping
>
> -----Original Message-----
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres (jsquyres)
> Sent: Wednesday, June 1, 2016 15:02
> To: Open MPI User's List
> Subject: Re: [OMPI users] Firewall settings for MPI communication
>
> In addition, you might want to consider upgrading to Open MPI v1.10.x
> (v1.6.x is fairly ancient).
>
> > On Jun 1, 2016, at 7:46 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
> >
> > which network are your VMs using for communications ?
> > if this is tcp, then you also have to specify a restricted set of
> > allowed ports for the tcp btl
> >
> > that would be something like
> > mpirun --mca btl_tcp_dynamic_ports 49990-50010 ...
> >
> > please double check the Open MPI 1.6.5 parameter and syntax with
> > ompi_info --all (or check the archives, I think I posted the correct
> > command line a few weeks ago)
> >
> > Cheers,
> >
> > Gilles
> >
> > On Wednesday, June 1, 2016, Ping Wang <ping.w...@asc-s.de> wrote:
> > I'm using Open MPI 1.6.5 to run OpenFOAM in parallel on several VMs on
> > a cloud. mpirun hangs without any error messages. I think this is a
> > firewall issue, because when I open all the TCP ports (1-65535) in the
> > security group of the VMs, mpirun works well. However, I was advised to
> > open as few ports as possible, so I have to limit MPI to a
> > range of ports. I opened the port range 49990-50010 for MPI
> > communication and used the command
> >
> > mpirun --mca oob_tcp_dynamic_ports 49990-50010 -np 4 --hostfile machines simpleFoam -parallel
> >
> > But it still hangs. How can I specify a port range that Open MPI will
> > use? I appreciate any help you can provide.
> >
> > Best,
> >
> > Ping Wang
> >
> > ------------------------------------------------------
> >
> > Ping Wang
> >
> > Automotive Simulation Center Stuttgart e.V.
> >
> > Nobelstraße 15
> >
> > D-70569 Stuttgart
> >
> > Phone: +49 711 699659-14
> >
> > Fax: +49 711 699659-29
> >
> > Email: ping.w...@asc-s.de
> >
> > Web: http://www.asc-s.de
> >
> > Social Media: /asc.stuttgart
> >
> > ------------------------------------------------------
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post:
> > http://www.open-mpi.org/community/lists/users/2016/06/29340.php
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/06/29342.php
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/06/29349.php