--
List-Post: users@lists.open-mpi.org
Date: Tue, 25 Feb 2014 20:07:31 -0500 (EST)
From: Doug Roberts
To: us...@open-mpi.org
Subject: Re: [OMPI users] Connection timed out with multiple nodes
Hello again. The "oob_stress" program runs cleanly on each of
the two test nodes bro127 and bro128, as shown below. Would
you say this rules out a problem with the network and switch,
or are there other test programs that should be run next?
o eth0 and eth2: without plm_base_verbose
[roberp
It's the failure on readv that's the source of the trouble. What
happens if you only if_include eth2? Does it work then?
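A minimal sketch of that restriction, assuming `eth2` is the interface that should carry both the MPI (TCP BTL) traffic and the runtime's out-of-band (OOB) wire-up traffic. Open MPI reads MCA parameters from `OMPI_MCA_<name>` environment variables, so exporting them before calling `mpirun` has the same effect as passing `--mca <name> <value>`. The host list and the `./mpi_test` binary name in the comment are illustrative only.

```shell
#!/usr/bin/env bash
# Restrict Open MPI's TCP traffic to eth2 (assumed interface name).
# OMPI_MCA_<param> environment variables set MCA parameters just like
# "--mca <param> <value>" on the mpirun command line.
export OMPI_MCA_btl_tcp_if_include=eth2   # MPI point-to-point traffic (TCP BTL)
export OMPI_MCA_oob_tcp_if_include=eth2   # runtime out-of-band wire-up traffic

# Equivalent explicit form (illustrative host list and binary name):
#   mpirun --mca btl_tcp_if_include eth2 --mca oob_tcp_if_include eth2 \
#          -np 2 --host bro127,bro128 ./mpi_test
```

The `*_if_include` and `*_if_exclude` parameters are not meant to be combined for the same component, so only the include form is set here.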
Still hangs, details follow ... tx!
o Using only eth2 with verbosity gives:
[roberpj@bro127:~/samples/mpi_test]
/opt/sharcnet/openmpi/1.6.5/intel-debug/bin/mpirun -np 2 -
Date: Fri, 17 Jan 2014 19:24:50 -0800
From: Ralph Castain
The most common cause of this problem is a firewall between the
nodes - you can ssh across, but not communicate. Have you checked
to see that the firewall is turned off?
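Because a working `ssh` between nodes does not show that other TCP ports are open, a quick reachability probe can help separate a firewall problem from an MPI one. This is a minimal sketch, assuming bash built with `/dev/tcp` support (the default on common Linux distributions) and the coreutils `timeout` command; `check_tcp_port` is a hypothetical helper, not an Open MPI tool. Open MPI's TCP components normally choose their listening ports dynamically, so probing a single port only tests the general filtering policy between the nodes.

```shell
#!/usr/bin/env bash
# check_tcp_port HOST PORT [TIMEOUT_SECONDS]
# Hypothetical helper: report whether a TCP connection to HOST:PORT can be
# opened within TIMEOUT_SECONDS (default 5), using bash's /dev/tcp redirection.
check_tcp_port() {
    local host=$1 port=$2 secs=${3:-5}
    # The connect attempt runs in a child bash so that `timeout` can kill it
    # if a firewall silently drops the packets instead of rejecting them.
    if timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
        echo "$host:$port reachable"
        return 0
    fi
    # Refused, timed out (timeout exits with 124), or name lookup failed.
    echo "$host:$port unreachable"
    return 1
}
```

Run it from one node against a port that is known to have a listener on the other, for example `check_tcp_port bro128 22`, and then against a port opened only for the test. An "unreachable" result can also simply mean nothing is listening on that port. An iptables DROP rule usually shows up as a timeout, which matches the "Connection timed out" errors in this thread, whereas a REJECT rule typically produces an immediate refusal.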
Turns out some iptables rules (typical on our clusters) were act
1) When openmpi programs run across multiple nodes they hang
rather quickly as shown in the mpi_test example below. Note
that I am assuming the initial topology error message is a
separate issue since single node openmpi jobs run just fine.
[roberpj@bro127:~/samples/mpi_test]
/opt/sharcnet/op
***
[bru34:06039]
--
mpirun noticed that process rank 0 with PID 30651 on node bru25 exited on
signal 11 (Segmentation fault).
--
On Fri, 27 Jun 2008, Doug
Using 1.3a1r18423 built against BLCR 0.7.1 but cannot get
ompi-restart to start processes on machinefile hosts. No
useful information shown in /var/log/messages on master or
intended hosts. Disabling prelinking doesn't help either.
That is, all processes start on the host which ompi-restart
is execute
Hi, I am trying to use the latest release of v1.3 to test with BLCR;
however, I just noticed that sometime after 1.3a1r18423 the standard
MPICH sample code (cpi.c) stopped working on our rel4-based Myrinet
GM clusters, which raises some concern.
Please find attached: gm_board_info.out, ompi_info--a
Thanks. Adding FCFLAGS="-mismatch -w" allowed openmpi-1.1a9r10177
to build this time and I am able to run simple test problems on the
cluster. However, I am unable to run example problems that come with
the Nag Parallel library which we also have in addition to the Nag
f95 compiler. So I just
istake); the fix should show up
in the snapshot tarballs tonight.
-- Forwarded message --
List-Post: users@lists.open-mpi.org
Date: Thu, 1 Jun 2006 16:45:50 -0400 (EDT)
From: Doug Roberts
To: us...@open-mpi.org
Subject: openmpi-1.1a9r10157 Fails to build with Nag f95 Compiler
Hi, I'm using the NAGWare Fortran 95 compiler Release 5.0(414), but make
fails as shown in the snippet below. I've attached the config.log,
config.out and make.out files. The system is a dual processor Opteron
server running a 2.6 x86_64 Linux kernel and has a Myrinet MX-based
interconnect which