Re: [OMPI users] Connection timed out on TCP

2014-04-29 Thread Jeff Squyres (jsquyres)
On Apr 29, 2014, at 4:28 PM, Vince Grimes wrote: > I realize it is no longer in the history of replies for this message, but the > reason I am trying to use tcp instead of Infiniband is because: > > We are using an in-house program called ScalIT that performs operations on > very large sparse

Re: [OMPI users] Connection timed out on TCP

2014-04-29 Thread Vince Grimes
M, users-requ...@open-mpi.org wrote: -- Message: 2 Date: Mon, 28 Apr 2014 22:07:08 + From: "Jeff Squyres (jsquyres)" To: Open MPI Users Subject: Re: [OMPI users] Connection timed out on TCP Message-ID: Content-Type: text/plain; charset="us-ascii" In principle, ther

Re: [OMPI users] Connection timed out on TCP

2014-04-28 Thread Jeff Squyres (jsquyres)
> Lubbock, TX 79409-1061 > > (806) 834-0813 (voice); (806) 742-1289 (fax) > > On 04/25/2014 04:22 PM, users-requ...@open-mpi.org wrote: > >> Message: 3 >> Date: Fri, 25 Apr 2014 14:56:47 -0500 >> From: Vince Grimes >> To: >> Subject: [OMPI us

Re: [OMPI users] Connection timed out on TCP

2014-04-28 Thread Vince Grimes
Apr 2014 14:56:47 -0500 From: Vince Grimes To: Subject: [OMPI users] Connection timed out on TCP Message-ID: <535abdff.1020...@ttu.edu> Content-Type: text/plain; charset="ISO-8859-1"; format=flowed There is no firewall on this subnet as it is the internal Ethernet for the cluster.

[OMPI users] Connection timed out on TCP

2014-04-25 Thread Vince Grimes
From: Ralph Castain To: Open MPI Users Subject: Re: [OMPI users] Connection timed out on TCP and notify question Message-ID: <11462b85-83ca-4b3d-86e5-eddd9bc87...@open-mpi.org> Content-Type: text/plain; charset=us-ascii Sounds like either a routing problem or a firewall. Are there multiple NI

Re: [OMPI users] Connection timed out on TCP and notify question

2014-04-24 Thread Ralph Castain
Sounds like either a routing problem or a firewall. Are there multiple NICs on these nodes? Looking at the quoted NIC in your error message, is that the correct subnet we should be using? Have you checked to ensure no firewalls exist on that subnet between the nodes? On Apr 24, 2014, at 8:41 A
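
The routing-versus-firewall question raised above can be narrowed down with a plain TCP connection test between two nodes, independent of MPI. The following is a minimal sketch, not something taken from the thread: the function name tcp_reachable, the host name, and the port are hypothetical, and it assumes a listener has been started by hand on the target node, since the ports Open MPI picks for its TCP BTL are not known in advance.

```python
import socket


def tcp_reachable(host, port, timeout=5.0):
    """Try one TCP connection to host:port.

    Returns (True, None) if the connection is established within
    `timeout` seconds, otherwise (False, reason_string).
    Hypothetical helper for illustration; not part of Open MPI.
    """
    try:
        # create_connection resolves the name and connects with a timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:
        # OSError covers timeouts, "connection refused", and
        # "no route to host"; the message tells them apart.
        return False, str(exc)


# Example call (host and port are placeholders, adjust to the cluster):
# ok, reason = tcp_reachable("compute-6-16.local", 5000)
# print(ok, reason)
```

A "connection refused" result generally means the packet reached the other node and nothing was listening on that port, whereas a timeout is more consistent with traffic being dropped or sent out the wrong interface.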

[OMPI users] Connection timed out on TCP and notify question

2014-04-24 Thread Vince Grimes
Dear all: In the ongoing investigation into why a particular in-house program is not working in parallel over multiple nodes using Open MPI, running with "--mca btl self,sm,tcp", I have been running into the following error: [compute-6-15.local][[8185,1],0][btl_tcp_endpoint.c:653:mca_btl_tcp_