On Sep 18, 2015, at 7:26 PM, Gilles Gouaillardet 
<gilles.gouaillar...@gmail.com> wrote:
> 
> I built a similar environment with master and private IPs, and that does not
> work.
> My understanding is that each task has two TCP BTLs (one per interface),
> and there is currently no mechanism to tell that a node is unreachable
> via a given BTL.
> (A BTL picks the "best" interface for each node, but it never picks zero
> interfaces.)

Mmm... yes.  I'll bet you're right.

> In order to support this, we should add extra checks to ensure the best
> interface is reachable
> (that could be achieved "statically" by parsing the routing tables, or
> "dynamically" by "pinging" the remote interface).

The usNIC BTL does this by doing ARP lookups using libnl (or libnl3).
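
Here is a minimal sketch of the "static" option, under stated assumptions. The routing table is a small hard-coded array, and the type and function names (route_entry_t, route_lookup, ipv4) are hypothetical. They are not Open MPI or kernel APIs. A longest-prefix match decides which interface, if any, routes to a given remote IPv4 address. On Linux the real IPv4 table could be read from /proc/net/route or queried over netlink. A default route (prefix length 0) matches every address, so a check like this only says that a route exists, not that the peer will answer. The "dynamic" probe covers that gap.

```c
#include <stddef.h>
#include <stdint.h>
#include <arpa/inet.h>

/* Hypothetical, simplified routing-table entry.
 * Addresses are IPv4 in host byte order. */
typedef struct {
    uint32_t    dest;    /* destination network, e.g. 10.0.0.0 */
    int         prefix;  /* prefix length, 0..32 */
    const char *ifname;  /* outgoing interface for this route */
} route_entry_t;

/* Example table: two private networks and no default route. */
static const route_entry_t example_routes[] = {
    { 0xC0A80100u, 24, "eth0" },  /* 192.168.1.0/24 */
    { 0x0A000000u,  8, "eth1" },  /* 10.0.0.0/8     */
};

/* Netmask for a prefix length; avoids the undefined 32-bit shift for /0. */
static uint32_t prefix_mask(int prefix)
{
    return prefix == 0 ? 0u : 0xFFFFFFFFu << (32 - prefix);
}

/* Longest-prefix match: return the interface of the most specific
 * route covering 'target', or NULL if no route matches. */
static const char *route_lookup(const route_entry_t *table, size_t n,
                                uint32_t target)
{
    const char *best_if = NULL;
    int best_prefix = -1;

    for (size_t i = 0; i < n; ++i) {
        uint32_t mask = prefix_mask(table[i].prefix);
        if ((target & mask) == (table[i].dest & mask) &&
            table[i].prefix > best_prefix) {
            best_prefix = table[i].prefix;
            best_if = table[i].ifname;
        }
    }
    return best_if;
}

/* Convert a dotted-quad string to host byte order (0 on parse failure). */
static uint32_t ipv4(const char *s)
{
    struct in_addr a;
    if (inet_pton(AF_INET, s, &a) != 1) {
        return 0;
    }
    return ntohl(a.s_addr);
}
```

With this table, route_lookup(example_routes, 2, ipv4("10.1.2.3")) picks "eth1". A peer at 172.16.0.1 gets NULL, because nothing routes to it.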

> On master, there is a reachable framework. Could/should the TCP BTL simply
> use it?

This is the framework that Ralph and I discussed -- he was going to take some 
of the ideas that the usnic BTL uses and put it in common functionality for 
exactly this kind of thing (e.g., so that the TCP BTL could use it).  I haven't 
looked at it, though, and don't know what the current status of it is.  Ralph?

There's basically one access function in reachable.h:

----
/* Given a list of local interfaces and a list of remote
 * interfaces, return the interface that is the "best"
 * for connecting to the remote process.
 *
 * local_if: list of local opal_if_t interfaces
 * remote_if: list of opal_if_t interfaces for the remote
 *            process
 *
 * return value: pointer to opal_if_t on local_if that is
 *               the "best" option for connecting. NULL
 *               indicates that the remote process cannot
 *               be reached on any interface
 */
typedef opal_if_t*
(*opal_reachable_base_module_reachable_fn_t)(opal_list_t *local_if,
                                             opal_list_t *remote_if);

----

That sounds just about perfect for getting rid of heuristics in the TCP BTL and 
replacing them with "yes, you can actually reach A from B."
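
The contract in that header can be shown with a small sketch. opal_if_t and opal_list_t are Open MPI internals, so the code below substitutes a hypothetical simple_if_t linked list. It also uses plain subnet matching as a stand-in policy for choosing the "best" interface. That policy is an assumption for illustration, not something any reachable component is known to do. The point is the return value. The function returns a pointer to a local interface when one can reach the peer, and NULL when none can. The current TCP BTL heuristics never give that NULL answer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for opal_if_t entries on an opal_list_t. */
typedef struct simple_if {
    const char       *name;
    uint32_t          addr;    /* IPv4 address, host byte order */
    int               prefix;  /* netmask length, 0..32 */
    struct simple_if *next;    /* next interface in the list */
} simple_if_t;

/* Same shape as opal_reachable_base_module_reachable_fn_t,
 * but over the simplified list type. */
typedef simple_if_t *(*simple_reachable_fn_t)(simple_if_t *local_if,
                                              simple_if_t *remote_if);

/* Netmask for a prefix length; avoids the undefined 32-bit shift for /0. */
static uint32_t netmask(int prefix)
{
    return prefix == 0 ? 0u : 0xFFFFFFFFu << (32 - prefix);
}

/* Return the first local interface whose subnet contains one of the
 * remote interfaces, or NULL if the peer is unreachable on all of them. */
static simple_if_t *subnet_reachable(simple_if_t *local_if,
                                     simple_if_t *remote_if)
{
    for (simple_if_t *l = local_if; l != NULL; l = l->next) {
        uint32_t mask = netmask(l->prefix);
        for (simple_if_t *r = remote_if; r != NULL; r = r->next) {
            if ((l->addr & mask) == (r->addr & mask)) {
                return l;
            }
        }
    }
    return NULL;
}

/* Hypothetical caller: only set up a connection when an interface exists. */
static int should_connect(simple_reachable_fn_t reachable,
                          simple_if_t *local_if, simple_if_t *remote_if)
{
    return reachable(local_if, remote_if) != NULL;
}
```

A TCP BTL built along these lines would skip a peer when the function returns NULL. It would not guess an interface that may not work.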

Do you want to take a crack at a PR for master for using the reachable 
framework in the TCP BTL?

-- 
Jeff Squyres
jsquy...@cisco.com
