On May 10, 2014, at 12:02 AM, Wijnberg, Tom <t...@metrohm-applikon.com> wrote:
> Hi Ralph,
>
> Thank you for your reply. My problem does sound a bit like the bug you are describing; however, I'm not quite sure yet. I have implemented the exact same setup between two virtual machines, and in that setup everything runs correctly. I did test whether the local firewall was the problem, but no luck. I'm uncertain whether the local admin is also limiting traffic within the network through a firewall; I find that unlikely, but I will ask him on Monday. It did occur to me that perhaps the port forwarding is not set up correctly. I have forwarded port 22 from the VirtualBox host to the virtual machine, but perhaps Open MPI requires more than just this port? The need to have the firewall not block TCP connections between PCs does seem to indicate this.

This is likely to be the problem, then. OMPI requires several ports: (a) one port for the daemons to connect back to mpirun, and (b) at least one port for the TCP BTL. Your best solution is to remove the firewall on the virtual machine completely, but if you must do it with port forwarding, you can open up some specific ports and pass them to OMPI. For example, if you open up port 1234 and ports 6789-6792:

    -mca oob_tcp_static_ipv4_ports 1234-2000 -mca btl_tcp_port_min_v4 6789 -mca btl_tcp_port_range_v4 4

Note that you don't have to open up all the OOB TCP ports, as only the daemon needs to connect back through the firewall. However, once you specify static ports, you have to provide enough entries for all the local procs to connect back to the daemon - we don't currently offer the option of defining static ports only for the daemons while leaving the app procs dynamic. A command line that puts these options together is sketched further down in this reply.

Per ompi_info:

    MCA oob: parameter "oob_tcp_static_ipv4_ports" (current value: "", data source: default, level: 9 dev/all, type: string)
             Static ports for daemons and procs (IPv4)

    MCA btl: parameter "btl_tcp_port_min_v4" (current value: "1024", data source: default, level: 2 user/detail, type: int)
             The minimum port where the TCP BTL will try to bind (default 1024)

    MCA btl: parameter "btl_tcp_port_range_v4" (current value: "64511", data source: default, level: 2 user/detail, type: int)
             The number of ports where the TCP BTL will try to bind (default 64511). This parameter, together with the port min, defines a range of ports where Open MPI will open sockets.

> As for the PATH and LD_LIBRARY_PATH, how can I check whether these are set correctly? When I log in to the slave PC I'm able to use mpirun locally without the need to set any variables. To me this would seem to indicate that the problem is not related to the PATH or LD_LIBRARY_PATH. However, when I try to add the master to the hostfile instead (so using them the wrong way around), I get the exact same behavior as observed before.

Check out the FAQ - login executes a different default setup script than remote execution:

http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path

You might also be able to add --enable-orterun-prefix-by-default to your configure line if the install location is the same on both machines.
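To check what the slave actually sees, remember that mpirun starts the remote daemon over a non-interactive ssh session, so test it the same way. A rough sketch only - the IP is the slave from your hostfile, orted is the Open MPI daemon, and the /usr prefix is just a guess at where your Open MPI install lives:

    # run the checks non-interactively, the same way mpirun launches the daemon
    ssh 10.5.10.224 env | grep -E 'PATH|LD_LIBRARY_PATH'
    ssh 10.5.10.224 which orted

    # if orted isn't found, export the paths in a startup file that
    # non-interactive shells read on the slave (e.g. ~/.bashrc for bash),
    # adjusting the prefix to wherever Open MPI is actually installed:
    export PATH=/usr/bin:$PATH
    export LD_LIBRARY_PATH=/usr/lib:$LD_LIBRARY_PATH

If the non-interactive environment differs from your login shell, that alone can explain the silent hang.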
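Coming back to the ports: if you stay with VirtualBox NAT plus port forwarding, you would forward the extra ports into the VM the same way you forwarded 22, and then launch with matching MCA parameters. This is only a sketch - the VM name "vArch", the rule names, the port numbers, and "myprogram" are placeholders taken from the examples in this thread, modifyvm normally wants the VM powered off, and whether NAT forwarding alone is enough for your MPI traffic is something you would have to verify; dropping the firewall/NAT variable entirely, as noted above, is the simpler route.

    # on the VirtualBox host: forward the OOB port and the BTL ports into the VM
    VBoxManage modifyvm "vArch" --natpf1 "ompi-oob,tcp,,1234,,1234"
    VBoxManage modifyvm "vArch" --natpf1 "ompi-btl-6789,tcp,,6789,,6789"
    # ...repeat for each BTL port you open (6790-6792 in this example)

    # inside the VM: launch with the matching static/min/range settings
    mpirun -n 8 -hostfile mpiHosts \
        --mca oob_tcp_static_ipv4_ports 1234-2000 \
        --mca btl_tcp_port_min_v4 6789 \
        --mca btl_tcp_port_range_v4 4 \
        myprogram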
> Currently I'm leaning towards a problem with port forwarding; however, I can't find information on whether Open MPI requires more than just port 22 to work.
>
> Regards,
> TWij
>
> --
> Metrohm Applikon B.V.
> De Brauwweg 13
> 3125 AE Schiedam
> The Netherlands
> Phone: +31 (0)10 298 3555
> Direct: +31 (0)10 298 3579
>
> DISCLAIMER:
> This e-mail and any attachment sent with it are intended exclusively for the addressee(s), and may not be passed on to, or made available for use by, any person other than the addressee(s). Any and every liability resulting from any electronic transmission is ruled out.
> If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.
>
>
> -----Original Message-----
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
> Sent: Friday, May 9, 2014 15:46
> To: Open MPI Users
> Subject: Re: [OMPI users] No output when adding host to hostfile
>
> There is a known bug in the 1.8.1 release whereby daemons failing to start on a remote node will cause a silent failure. This has been fixed for the upcoming 1.8.2 release, but you might want to use one of the nightly 1.8.2 snapshots in the interim.
>
> Most likely causes:
>
> * not finding the required libraries on the remote node because the default PATH and LD_LIBRARY_PATH aren't set up correctly
>
> * firewall preventing TCP connections between the machines
>
> Ralph
>
>
> On May 9, 2014, at 5:30 AM, Wijnberg, Tom <t...@metrohm-applikon.com> wrote:
>
>> Hi,
>>
>> I have encountered a problem with Open MPI that I can't seem to diagnose or find precedent for in the mailing list. I have two PCs with a fresh install of Arch Linux and Open MPI 1.8.1. One is a dedicated PC and the other is a VirtualBox installation. The VirtualBox install is the master, and I'm able to use mpirun without a problem (I compiled a small program that prints to stdout). Input and output are as follows:
>>
>>> $ mpirun -n 4 -hostfile mpiHosts myprogram
>>> hello MPI user: from process = 1 on machine=vArch, of NCPU=4 processes
>>> hello MPI user: from process = 0 on machine=vArch, of NCPU=4 processes
>>> hello MPI user: from process = 2 on machine=vArch, of NCPU=4 processes
>>> hello MPI user: from process = 3 on machine=vArch, of NCPU=4 processes
>>
>> Running programs on a single machine is not a problem. I'm also able to log into both machines using ssh without the need for a password, so communication between the machines should be OK. However, when I add the second host to the hostfile, I get no more feedback. What I mean by this is that I get the following:
>>
>>> $ echo "10.5.10.224 slots=4" >> mpiHosts
>>> $ mpirun -n 8 -hostfile mpiHosts myprogram
>>>
>>
>> No output is returned. I'm not sure if this is intended behavior, but it seems incorrect to me. Can anyone provide me with some insight as to why I'm observing this and how I can diagnose the problem?
>>
>> Regards,
>> TWij
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users