Re: [OMPI users] Connection timed out with multiple nodes

2014-01-31 Thread Ralph Castain
The only relevant parts are from the application procs - orterun and the orted don't participate in this exchange and never see the BTLs anyway. It looks like there is just something blocking data transfer across eth2 for some reason. I'm afraid I have no idea why - can you run a standard (i.e.,
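
(A minimal sketch, not from the thread: one way to check raw TCP reachability across eth2 outside of Open MPI, using netcat; 10.1.1.2 stands in for the receiving node's eth2 address and the port is arbitrary.)

    # on the receiving node (listen syntax varies slightly between netcat variants)
    nc -l 5000
    # on the sending node, target the receiver's eth2 address
    echo hello | nc 10.1.1.2 5000

If nothing arrives, the blockage is below the MPI layer (firewall, routing, or switch configuration on eth2).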

Re: [OMPI users] Compiling OpenMPI with PGI pgc++

2014-01-31 Thread Reuti
Hi, On 31.01.2014 at 18:59, Jiri Kraus wrote: > Thanks for taking a look. I just learned from PGI that this is a known bug > that will be fixed in the 14.2 release (February 2014). Will `pgc++` then link to `gcc` or `pgcc`? If I understand it correctly, it should be a feature of `pgc++` to be link co
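
(For context, a minimal sketch of building Open MPI with the PGI tool chain; the install prefix is a placeholder and the exact compiler driver names depend on the PGI release.)

    ./configure CC=pgcc CXX=pgc++ FC=pgfortran --prefix=/opt/openmpi-pgi
    make all install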

Re: [OMPI users] writev error: Bad address

2014-01-31 Thread Reuti
On 31.01.2014 at 22:08, Ross Boylan wrote: > I am getting the following error, amidst many successful message sends: > [n10][[50048,1],1][../../../../../../ompi/mca/btl/tcp/btl_tcp_frag.c:118:mca_btl_tcp_frag_send] > mca_btl_tcp_frag_send: writev error (0x7f6155970038, 578659815) > Bad a

Re: [OMPI users] Connection timed out with multiple nodes

2014-01-31 Thread Doug Roberts
It's the failure on readv that's the source of the trouble. What happens if you only if_include eth2? Does it work then? Still hangs, details follow ... tx! o Using only eth2 with verbosity gives: [roberpj@bro127:~/samples/mpi_test] /opt/sharcnet/openmpi/1.6.5/intel-debug/bin/mpirun -np 2 -
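
(A sketch of the general form of such a run, with the hostfile and executable names as placeholders: restrict the job to the TCP and self BTLs, pin the TCP BTL to eth2, and raise the BTL verbosity.)

    mpirun -np 2 --hostfile hosts \
        --mca btl tcp,self \
        --mca btl_tcp_if_include eth2 \
        --mca btl_base_verbose 30 \
        ./mpi_test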

[OMPI users] writev error: Bad address

2014-01-31 Thread Ross Boylan
I am getting the following error, amidst many successful message sends: [n10][[50048,1],1][../../../../../../ompi/mca/btl/tcp/btl_tcp_frag.c:118:mca_btl_tcp_frag_send] mca_btl_tcp_frag_send: writev error (0x7f6155970038, 578659815) Bad address(1) Any ideas about what is going on or what
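
(Not from the thread: writev() fails with EFAULT, reported as "Bad address", when part of the buffer it is handed is not valid mapped memory. One common application-level cause is freeing or reusing a buffer before a nonblocking send has completed; below is a minimal C sketch of the pattern to check for, with all names illustrative.)

    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            double *buf = malloc(1000 * sizeof(double));
            MPI_Request req;
            MPI_Isend(buf, 1000, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
            /* Freeing or overwriting buf here, before the request completes,
             * can surface later as a writev "Bad address" error inside the
             * TCP BTL.  The buffer must stay valid until MPI_Wait returns. */
            MPI_Wait(&req, MPI_STATUS_IGNORE);
            free(buf);
        } else if (rank == 1) {
            double buf[1000];
            MPI_Recv(buf, 1000, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }

        MPI_Finalize();
        return 0;
    }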

Re: [OMPI users] openmpi 1.7.4rc1 and f08 interface

2014-01-31 Thread Åke Sandgren
On 01/28/2014 08:26 PM, Jeff Squyres (jsquyres) wrote: Ok, will do. Yesterday, I put in a temporary behavioral test in configure that will exclude ekopath 5.0 in 1.7.4. We'll remove this behavioral test once OMPI fixes the bug correctly (for 1.7.5). I'm not 100% sure yet (my F2k3 spec is at

Re: [OMPI users] Compiling OpenMPI with PGI pgc++

2014-01-31 Thread Jiri Kraus
Hi, Thanks for taking a look. I just learned from PGI that this is a known bug that will be fixed in the 14.2 release (February 2014). Thanks Jiri > -Original Message- > Date: Wed, 29 Jan 2014 18:12:46 + > From: "Jeff Squyres (jsquyres)" > To: Open MPI Users > Subject: Re: [OMP

Re: [OMPI users] Can't build openmpi-1.6.5 with latest FCA 2.5 release.

2014-01-31 Thread Mike Dubman
Hi, Can it be that libibmad/libibumad installed on your system belongs to previous mofed installation? Thanks M. On Jan 31, 2014 2:02 AM, "Brock Palen" wrote: > I grabbed the latest FCA release from Mellanox's website. We have been > building against FCA 2.5 for a while, but it never worked righ
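
(One way to check where the installed libibmad/libibumad actually come from, assuming an RPM-based system; library paths may differ.)

    # where the dynamic linker resolves the libraries
    ldconfig -p | grep -E 'libibmad|libibumad'
    # which package owns them
    rpm -qf /usr/lib64/libibmad.so* /usr/lib64/libibumad.so*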

Re: [OMPI users] Implementation of TCP v/s OpenIB (Eager and Rendezvous) protocols

2014-01-31 Thread Siddhartha Jana
Sorry for the typo: ** I was hoping to understand the impact of OpenMPI's implementation of these protocols using traditional TCP. This is the paper I was referring to: Woodall, et al., "High Performance RDMA Protocols in HPC". On 31 January 2014 00:43, Siddhartha Jana wrote: > Good evening >

[OMPI users] Implementation of TCP v/s OpenIB (Eager and Rendezvous) protocols

2014-01-31 Thread Siddhartha Jana
Good evening Is there any documentation describing the difference in MPI-level implementation of the eager and rendezvous protocols in the OpenIB BTL versus the TCP BTL? I am aware of only the following paper. While this presents an excellent overview of how RDMA capabilities of modern interconnects
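
(Not a substitute for such a document, but the eager/rendezvous crossover points of the two BTLs can at least be inspected directly; a sketch, noting that newer releases may need an extra --level option to show all parameters.)

    ompi_info --param btl tcp    | grep -i eager
    ompi_info --param btl openib | grep -i eager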