You might want to specify a wider range of ports.
Depending on how the socket is closed, a given port might or might not be
available right after a job completes. iirc, with default TCP settings,
the worst case is a few minutes.
I will double-check that sockets are created with SO_REUSEADDR (or something
like that), since this might help mitigate this kind of issue.
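As a sketch of what a wider range could look like (illustrative values only;
this assumes the 40000-50000 window from your firewall rule, and keeps 40020
clear for the oob port as in your command):

```shell
# Cover (almost) the whole opened firewall window instead of only 10 ports.
BTL_MIN=40040      # first port the btl may use (above the oob port)
BTL_RANGE=9960     # number of ports available from BTL_MIN upward
echo "btl tcp ports: ${BTL_MIN}-$((BTL_MIN + BTL_RANGE - 1))"
# The corresponding launch (not executed here) would look like:
#   mpirun --mca btl_tcp_port_min_v4 $BTL_MIN \
#          --mca btl_tcp_port_range_v4 $BTL_RANGE \
#          --mca oob_tcp_static_ipv4_ports 40020 \
#          --host <IP1>,<IP2> hostname
```

With only 10 ports, concurrent or rapidly repeated jobs can collide with
ports still in TIME_WAIT, which would match the intermittent hangs you see.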

Cheers,

Gilles

On Tuesday, June 7, 2016, Ping Wang <ping.w...@asc-s.de> wrote:

> Hello all,
>
>
>
> after the correct configuration, mpirun (v1.10.2) works fine when all TCP
> ports are open. I can ssh to all hosts without a password.
>
> That brings me back to my first question: how can I specify the ports for
> MPI communication?
>
> I opened ports 40000-50000 for outgoing traffic. When I run:
>
> mpirun --mca btl_tcp_port_min_v4 40040 --mca btl_tcp_port_range_v4 10
> --mca oob_tcp_static_ipv4_ports 40020 --host <IP1>,<IP2>  hostname
>
> it works, but not every time. The same happens when I run: mpirun --mca
> oob_tcp_static_ipv4_ports 40020 --host <IP1>,<IP2>  hostname
>
> It is strange that sometimes I get output and sometimes it just hangs.
> Did I miss something?
>
>
>
> Best,
>
> Ping
>
>
>
>
>
> *From:* users [mailto:users-boun...@open-mpi.org] *On Behalf Of *Gilles
> Gouaillardet
> *Sent:* Friday, June 3, 2016 00:14
> *To:* Open MPI Users
> *Subject:* Re: [OMPI users] Firewall settings for MPI communication
>
>
>
> The syntax is
>
> configure --enable-mpirun-prefix-by-default --prefix=<path to OpenMPI> ...
>
>
>
> All hosts must be able to ssh to each other without a password.
>
> that means you need to generate a user ssh key pair on each host, add your
> public keys to the list of authorized keys, and ssh to all hosts once in
> order to populate your known_hosts file
>
> (ssh requires you to confirm a host's public key the very first time you
> ssh to a new host)
>
> iirc, that can be automated with ssh-keyscan.
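>
> A rough, untested sketch of that setup (host names and key type are
> placeholders, adjust to your environment):
>
>   ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa        # once per host, no passphrase
>   ssh-copy-id <user>@host2                        # append your public key remotely
>   ssh-keyscan host1 host2 >> ~/.ssh/known_hosts   # pre-populate known_hosts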
>
>
>
> When ssh is fully configured, mpirun should work just fine.
>
>
>
> Cheers,
>
>
>
> Gilles
>
>
> On Friday, June 3, 2016, Ping Wang <ping.w...@asc-s.de> wrote:
>
> Hi,
>
>
>
> thank you Gilles for your suggestion. I tried: mpirun --prefix <path to
> Open MPI> --host <public IP> hostname, and it works.
>
> I’m sure both IPs are the ones of the VM on which mpirun is running, and
> they are unique.
>
>
>
> I also configured Open MPI with --enable-mpirun-prefix-by-default, but I
> still need to add --prefix <path to Open MPI> to get mpirun to work.
>
> I used: ./configure --enable-mpirun-prefix-by-default ="<path to Open
> MPI>  "
>              make
>              make install
>
> Did I miss something or I misunderstood the way to configure Open MPI?
>
>
>
> When I run: ssh < internal/public IP > `which orted`
>
> The output is: Warning: Permanently added < internal/public IP > '
> (ECDSA) to the list of known hosts.
> /usr/local/bin/orted
>
> Is it all right?
>
>
>
> Cheers,
>
> Ping
>
>
>
>
>
> *From:* users [mailto:users-boun...@open-mpi.org] *On Behalf Of *Gilles
> Gouaillardet
> *Sent:* Thursday, June 2, 2016 17:06
> *To:* Open MPI Users
> *Subject:* Re: [OMPI users] Firewall settings for MPI communication
>
>
>
> are you saying both IPs are the ones of the VM on which mpirun is running?
>
> orted is launched on every machine *except* the one running mpirun.
>
>
>
> can you double/triple check that the IPs are correct and unique?
>
> for example, mpirun --host <internal IP> /sbin/ifconfig -a
>
> can you also make sure Open MPI is installed in the same directory on all
> your VMs?
>
> also make sure Open MPI has all its dependencies on all the VMs
>
> ssh xxx ldd `which orted`
>
> should show no missing dependencies
>
>
>
> generally speaking, I recommend you configure Open MPI with
>
> --enable-mpirun-prefix-by-default
>
>
>
> you can also try to replace
>
> mpirun
>
> with
>
> `which mpirun`
>
> or
>
> mpirun --prefix <path to Open MPI>
>
>
>
> Cheers,
>
>
>
> Gilles
>
> On Thursday, June 2, 2016, Ping Wang <ping.w...@asc-s.de> wrote:
>
> Hi,
>
> I've installed Open MPI v1.10.2. Every VM on the cloud has two IPs
> (internal IP, public IP).
> When I run: mpirun --host <internal IP> hostname, the output is the
> hostname of the VM.
> But when I run: mpirun --host <public IP> hostname, the output is
>
> bash: orted: command not found
> --------------------------------------------------------------------------
> ORTE was unable to reliably start one or more daemons.
> This usually is caused by:
>
> * not finding the required libraries and/or binaries on
>   one or more nodes. Please check your PATH and LD_LIBRARY_PATH
>   settings, or configure OMPI with --enable-orterun-prefix-by-default
>
> * lack of authority to execute on one or more specified nodes.
>   Please verify your allocation and authorities.
>
> * the inability to write startup files into /tmp
> (--tmpdir/orte_tmpdir_base).
>   Please check with your sys admin to determine the correct location to
> use.
>
> *  compilation of the orted with dynamic libraries when static are required
>   (e.g., on Cray). Please check your configure cmd line and consider using
>   one of the contrib/platform definitions for your system type.
>
> * an inability to create a connection back to mpirun due to a
>   lack of common network interfaces and/or no route found between
>   them. Please check network connectivity (including firewalls
>   and network routing requirements).
>
> Both IPs are the IP of the VM where MPI is running. Did I do something
> wrong in the configuration?
>
> Thanks for any help.
>
> Ping
>
> -----Original Message-----
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of
> Jeff Squyres (jsquyres)
> Sent: Wednesday, June 1, 2016 15:02
> To: Open MPI User's List
> Subject: Re: [OMPI users] Firewall settings for MPI communication
>
> In addition, you might want to consider upgrading to Open MPI v1.10.x
> (v1.6.x is fairly ancient).
>
> > On Jun 1, 2016, at 7:46 AM, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com> wrote:
> >
> > which network are your VMs using for communications ?
> > if this is tcp, then you also have to specify a restricted set of
> > allowed ports for the tcp btl
> >
> > that would be something like
> > mpirun --mca btl_tcp_dynamic_ports 49990-50010 ...
> >
> > please double check the Open MPI 1.6.5 parameter and syntax with
> > ompi_info --all (or check the archives, I think I posted the correct
> > command line a few weeks ago)
> >
> > Cheers,
> >
> > Gilles
> >
> > On Wednesday, June 1, 2016, Ping Wang <ping.w...@asc-s.de> wrote:
> > I'm using Open MPI 1.6.5 to run OpenFOAM in parallel on several VMs on
> > a cloud. mpirun hangs without any error messages. I think this is a
> > firewall issue, because when I open all the TCP ports (1-65535) in the
> > security group of the VMs, mpirun works well. However, I was advised to
> > open as few ports as possible, so I have to limit MPI to a range of
> > ports. I opened the port range 49990-50010 for MPI communication and
> > used the command
> >
> >
> >
> > mpirun --mca oob_tcp_dynamic_ports 49990-50010 -np 4 --hostfile machines
> simpleFoam -parallel
> >
> >
> >
> > But it still hangs. How can I specify a port range that Open MPI will
> use? I appreciate any help you can provide.
> >
> >
> >
> > Best,
> >
> > Ping Wang
> >
> >
> >
> >
> > ------------------------------------------------------
> >
> > Ping Wang
> >
> > Automotive Simulation Center Stuttgart e.V.
> >
> > Nobelstraße 15
> >
> > D-70569 Stuttgart
> >
> > Phone: +49 711 699659-14
> >
> > Fax: +49 711 699659-29
> >
> > E-Mail: ping.w...@asc-s.de
> >
> > Web: http://www.asc-s.de
> >
> > Social Media: /asc.stuttgart
> >
> > ------------------------------------------------------
> >
> >
> >
> >
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post:
> > http://www.open-mpi.org/community/lists/users/2016/06/29340.php
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/06/29342.php
>
>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/06/29349.php
>
