Re: [OMPI users] One more (possible) bug report

2016-05-14 Thread Gilles Gouaillardet
First, I recommend you test 7 cases: one network only (3 cases), two networks only (3 cases), and all three networks (1 case), and see when things hang. You might also want to run mpirun --mca oob_tcp_if_include 10.1.10.0/24 ... to ensure no hang happens in the OOB layer. As usual, double check no firewall is ru

Re: [OMPI users] One more (possible) bug report

2016-05-14 Thread dpchoudh .
Hello Gilles, thanks for your prompt follow-up. It looks like this issue is somehow specific to the Broadcom NIC. If I take it out, the rest of them work in any combination. On further investigation, I found that the name that 'ifconfig' shows for this interface is different from what it is named

Re: [OMPI users] One more (possible) bug report

2016-05-14 Thread Gilles Gouaillardet
IIRC, OMPI internally uses networks and not interface names. What did you use in your tests? Can you try with networks? Cheers, Gilles On Saturday, May 14, 2016, dpchoudh . wrote: > Hello Gilles > > Thanks for your prompt follow up. It looks like this issue is somehow > specific to the Broad
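Since selection is by network rather than by interface name, an interface's address and prefix length can be turned into the CIDR network to pass to btl_tcp_if_include. A minimal sketch using Python's standard ipaddress module; the addresses are hypothetical examples, not taken from the thread.

```python
import ipaddress


def network_for(address_with_prefix):
    """Map an interface address such as '10.1.10.5/24' to its CIDR network."""
    return str(ipaddress.ip_interface(address_with_prefix).network)


def btl_include_arg(addresses):
    """Comma-join the networks of several (hypothetical) interface addresses."""
    return ",".join(network_for(a) for a in addresses)


print(network_for("10.1.10.5/24"))      # 10.1.10.0/24
print(network_for("192.168.1.20/16"))   # 192.168.0.0/16
```

This sidesteps any mismatch between the name 'ifconfig' reports and the name other tools use, because the network is derived from the address alone.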

Re: [OMPI users] One more (possible) bug report

2016-05-14 Thread dpchoudh .
No, I used IP addresses in all my tests. What I found is that if I used the IP address of the Broadcom NIC in the hostfile and used that network exclusively (btl_tcp_if_include), the mpirun command hung silently. If I used the IP address of another NIC in the hostfile (and the Broadcom NIC exclusively), mpir

Re: [OMPI users] One more (possible) bug report

2016-05-14 Thread Jeff Squyres (jsquyres)
You might want to try a pure TCP benchmark across this problematic NIC (e.g., NetPIPE's TCP test or iperf). That takes MPI out of the equation and shows whether you are able to pass TCP traffic correctly. Make sure to test message sizes both smaller and larger than your MTU. > On May 14, 2016, at 1:25 AM, dpch
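To illustrate the "sizes below and above the MTU" idea, here is a minimal Python sketch of a TCP echo check that verifies each payload comes back byte for byte. It is not a substitute for NetPIPE or iperf: the 1500-byte MTU is an assumption, and run on one machine (as here, over 127.0.0.1) the traffic never crosses the NIC; a real test would run the server half on the remote node and connect to that node's Broadcom address.

```python
import socket
import threading

MTU = 1500  # assumed Ethernet MTU; check the real value on your interface


def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError(f"peer closed after {len(buf)} of {n} bytes")
        buf.extend(chunk)
    return bytes(buf)


def echo_server(listener, count):
    """Accept one connection and echo back `count` length-prefixed messages."""
    conn, _ = listener.accept()
    with conn:
        for _ in range(count):
            size = int.from_bytes(recv_exact(conn, 4), "big")
            conn.sendall(recv_exact(conn, size))


def round_trip(host, sizes):
    """Send each payload size to a local echo server and check the echo matches."""
    with socket.create_server((host, 0)) as listener:  # port 0: pick a free port
        port = listener.getsockname()[1]
        server = threading.Thread(target=echo_server, args=(listener, len(sizes)))
        server.start()
        results = {}
        with socket.create_connection((host, port)) as client:
            for size in sizes:
                payload = bytes(i % 251 for i in range(size))  # non-trivial pattern
                client.sendall(size.to_bytes(4, "big") + payload)
                results[size] = recv_exact(client, size) == payload
        server.join()
    return results


if __name__ == "__main__":
    sizes = [MTU // 2, MTU - 40, MTU, MTU * 4, 1 << 20]
    print(round_trip("127.0.0.1", sizes))
```

The server reads each whole message before echoing it, so client and server never block on each other's send buffers even for payloads much larger than the MTU.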

[OMPI users] Building vs packaging

2016-05-14 Thread Rob Malpass
Hi all, I posted about a fortnight ago to this list as I was having some trouble getting my nodes to be controlled by my master node. Received wisdom at the time was to compile with --enable-orterun-prefix-by-default. For some time I'd been getting "cannot open libopen-rte.so.7", which poin

Re: [OMPI users] Building vs packaging

2016-05-14 Thread Gilles Gouaillardet
Rob, I do not know how Debian packaged Open MPI; the Debian maintainers should be asked rather than the Open MPI developers. Another option to get things working is to add the path to the Open MPI libraries to the ld configuration. For example, append /opt/openmpi/lib to /etc/ld.so.conf (or into a new file called /etc/ld.so.conf.d/openmpi, tha

[OMPI users] Mpirun invocation only works in debug mode, hangs in "normal" mode.

2016-05-14 Thread Andrew Reid
Hi all -- I am having a weird problem on a cluster of Raspberry Pi model 2 machines running the Debian/Raspbian version of OpenMPI, 1.6.5. I apologize for the length of this message; I am trying to include all the pertinent details, but of course can't reliably discriminate between pertinent

Re: [OMPI users] Mpirun invocation only works in debug mode, hangs in "normal" mode.

2016-05-14 Thread Andrew Reid
I think I might have fixed this, but I still don't really understand it. In setting up the RPi machines, I followed a config guide that suggested switching the SSH service in systemd to "ssh.socket" instead of "ssh.service". It's supposed to be lighter weight and get you cleaner shut-downs, and I'

Re: [OMPI users] slot problem on "SUSE Linux, Enterprise Server 12 (x86_64)"

2016-05-14 Thread Ralph Castain
> On May 7, 2016, at 1:13 AM, Siegmar Gross > wrote: > > Hi, > > yesterday I installed openmpi-v1.10.2-176-g9d45e07 on my "SUSE Linux > Enterprise Server 12 (x86_64)" with Sun C 5.13 and gcc-5.3.0. The > following programs don't run anymore. > > > loki hello_2 112 ompi_info | grep -e "OPAL