Terry Dontje wrote:
Date: Tue, 29 Jul 2008 14:19:14 -0400
From: "Alexander Shabarshin" <ashabars...@developonbox.com>
Subject: Re: [OMPI users] Communitcation between OpenMPI and ClusterTools
To: <us...@open-mpi.org>
Message-ID: <00b701c8f1a7$9c24f7c0$c8afcea7@Shabarshin>
Hello
>>> > One idea that comes to mind is whether the two nodes are on the same
>>> > subnet. If they are not on the same subnet, I think there is a bug in
>>> > which the TCP BTL will recuse itself from communications between the
>>> > two nodes.
>> You are right: the subnets are different, but the routes are set up
>> correctly and everything (ping, ssh, etc.) works OK between them.
> But it isn't a routing problem; it's a question of how the TCP BTL in
> Open MPI decides which interface the nodes can communicate over
> (completely out of the hands of the TCP stack and lower).
Do you know when this will be fixed in the official Open MPI release?
Is a patch available, or something similar?
Well, this problem is captured in ticket 972
(https://svn.open-mpi.org/trac/ompi/ticket/972). There is a question
as to whether this ticket has actually been fixed (that is, whether the
code was actually put back). Sun's experience with the trunk, the 1.3
branch, and the CT8 EA2 release seems to be that you can now run jobs
across subnets, but we (Sun) are not completely
I guess I should have ended with "mumble..mumble" :-)
Now for the rest of the sentence:
... sure whether the support is truly in there or whether we just got
lucky with how our setup was configured.
--td
FWIW, it looks like that code has had a lot of changes in it between
1.2 and 1.3.
--td