Re: [OMPI users] Connection timed out with multiple nodes

2014-02-26 Thread Doug Roberts
-- List-Post: users@lists.open-mpi.org Date: Tue, 25 Feb 2014 20:07:31 -0500 (EST) From: Doug Roberts To: us...@open-mpi.org Subject: Re: [OMPI users] Connection timed out with multiple nodes Hello again, The "oob_stress" program runs cleanly on each of the two test nodes bro127 and bro12

Re: [OMPI users] Connection timed out with multiple nodes

2014-02-25 Thread Doug Roberts
Hello again, The "oob_stress" program runs cleanly on each of the two test nodes bro127 and bro128 as shown below. Would you say this rules out a problem with the network and switch, or is there another test program that should be run next? o eth0 and eth2: without plm_base_verbose [roberp

Re: [OMPI users] Connection timed out with multiple nodes

2014-01-31 Thread Doug Roberts
It's the failure on readv that's the source of the trouble. What happens if you only if_include eth2? Does it work then? Still hangs, details follow ... tx! o Using only eth2 with verbosity gives: [roberpj@bro127:~/samples/mpi_test] /opt/sharcnet/openmpi/1.6.5/intel-debug/bin/mpirun -np 2 -

Re: [OMPI users] Connection timed out with multiple nodes

2014-01-23 Thread Doug Roberts
Date: Fri, 17 Jan 2014 19:24:50 -0800 From: Ralph Castain The most common cause of this problem is a firewall between the nodes - you can ssh across, but not communicate. Have you checked to see that the firewall is turned off? Turns out some iptables rules (typical on our clusters) were act

[OMPI users] Connection timed out with multiple nodes

2014-01-17 Thread Doug Roberts
1) When openmpi programs run across multiple nodes, they hang rather quickly, as shown in the mpi_test example below. Note that I am assuming the initial topology error message is a separate issue, since single-node openmpi jobs run just fine. [roberpj@bro127:~/samples/mpi_test] /opt/sharcnet/op

Re: [OMPI users] Code Seg Faults in Devel Series

2008-06-30 Thread Doug Roberts
*** [bru34:06039] -- mpirun noticed that process rank 0 with PID 30651 on node bru25 exited on signal 11 (Segmentation fault). -- On Fri, 27 Jun 2008, Doug

[OMPI users] Processes Not Restarting On Requested Hosts

2008-06-28 Thread Doug Roberts
Using 1.3a1r18423 built against blcr 0.7.1 but cannot get ompi-restart to start processes on machinefile hosts. No useful information is shown in /var/log/messages on the master or intended hosts. Disabling prelinking doesn't help either. i.e., all processes start on the host which ompi-restart is execute

[OMPI users] Code Seg Faults in Devel Series

2008-06-27 Thread Doug Roberts
Hi, I am trying to use the latest release of v1.3 to test with BLCR; however, I just noticed that sometime after 1.3a1r18423 the standard mpich sample code (cpi.c) stopped working on our rel4 based myrinet gm clusters, which raises some concern. Please find attached: gm_board_info.out, ompi_info--a

[OMPI users] openmpi-1.1a9r10157 Fails to build with Nag f95 Compiler

2006-06-03 Thread Doug Roberts
Thanks. Adding FCFLAGS="-mismatch -w" allowed openmpi-1.1a9r10177 to build this time, and I am able to run simple test problems on the cluster. However, I am unable to run the example problems that come with the Nag Parallel library, which we have in addition to the Nag f95 compiler. So I just

[OMPI users] openmpi-1.1a9r10157 Fails to build with Nag f95 Compiler

2006-06-02 Thread Doug Roberts
istake); the fix should show up in the snapshot tarballs tonight. -- Forwarded message -- List-Post: users@lists.open-mpi.org Date: Thu, 1 Jun 2006 16:45:50 -0400 (EDT) From: Doug Roberts To: us...@open-mpi.org Subject: openmpi-1.1a9r10157 Fails to build with Nag f95 Compiler Hi

[OMPI users] openmpi-1.1a9r10157 Fails to build with Nag f95 Compiler

2006-06-01 Thread Doug Roberts
Hi, I'm using the NAGWare Fortran 95 compiler Release 5.0(414), but make fails as shown in the snippet below. I've attached the config.log, config.out and make.out files. The system is a dual processor Opteron server running a 2.6 x86_64 linux kernel and has a myrinet mx based interconnect which