One last note to close this out. After some discussion on the developers list, it was pointed out that this problem was fixed by new code in the trunk and the 1.3 branch. So my statement below about the trunk, 1.3 and CT8 EA2 supporting nodes on different subnets can be made stronger: we really do expect this to work.

--td
Terry Dontje wrote:
Terry Dontje wrote:

Date: Tue, 29 Jul 2008 14:19:14 -0400
From: "Alexander Shabarshin" <ashabars...@developonbox.com>
Subject: Re: [OMPI users] Communitcation between OpenMPI and
    ClusterTools
To: <us...@open-mpi.org>
Message-ID: <00b701c8f1a7$9c24f7c0$c8afcea7@Shabarshin>
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
    reply-type=response

Hello

>>> > One idea comes to mind is whether the two nodes are on the same
>>> > subnet? If they are not on the same subnet I think there is a bug in
>>> > which the TCP BTL will recuse itself from communications between the
>>> > two nodes.

>> you are right - subnets are different, but routes set up correctly and
>> everything like ping, ssh etc. are working OK between them

> But it isn't a routing problem but how the tcp btl in Open MPI decides
> which interface the nodes can communicate with (completely out of the
> hands of the TCP stack and lower).

Do you know when it can be fixed in official OpenMPI?
Is a patch available or something?

Well, this problem is captured in ticket 972 (https://svn.open-mpi.org/trac/ompi/ticket/972). There is a question as to whether this ticket has been fixed or not (that is, whether the code was actually put back). Sun's experience with the trunk, the 1.3 branch and the CT8 EA2 release seems to be that you can now run jobs across subnets, but we (Sun) are not completely

I guess I should have ended with "mumble..mumble" :-)
Now for the rest of the sentence:

... sure whether the support is truly in there or we just got lucky in how our setup was configured.

--td
FWIW, it looks like that code has had a lot of changes in it between 1.2 and 1.3.

--td



