Just to brainstorm on this a little: the two clusters will have different "mapper IDs", which can be learned via the attached code snippet. As long as fma is the mapper (as opposed to the older, deprecated "gm_mapper" or "mx_mapper"), Myrinet topology rules ensure that NIC 0, port 0 is all you need to examine. All nodes with the same mapper can then be considered "on the same fabric".

Except, of course, when you have two fabrics A and B with many nodes each but only one node in common: then all nodes will report the same mapper ID, but A and B are effectively two disjoint fabrics. This is rare, but I have seen it once.
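
To make the "same mapper means same fabric" rule concrete, here is a rough sketch of how collected mapper IDs could be grouped. It assumes each node's 6-byte mapper MAC has already been gathered somehow; struct node_mapper, same_mapper() and group_by_mapper() are names I made up for illustration and are not part of MX. It inherits the caveat above: two fabrics joined by a single shared node would still be reported as one.

```c
#include <assert.h>
#include <string.h>

#define MAPPER_MAC_LEN 6  /* the mapper ID is a 6-byte MAC address */

/* Hypothetical record: one node and the mapper MAC it reported. */
struct node_mapper {
    const char   *hostname;
    unsigned char mapper_mac[MAPPER_MAC_LEN];
};

/* Return 1 if both nodes reported the same mapper, 0 otherwise. */
int same_mapper(const struct node_mapper *a, const struct node_mapper *b)
{
    assert(a != NULL && b != NULL);
    return memcmp(a->mapper_mac, b->mapper_mac, MAPPER_MAC_LEN) == 0;
}

/*
 * Give every node a fabric index (0, 1, 2, ...) in fabric_of[], where
 * nodes sharing a mapper MAC share an index.  Returns the number of
 * distinct mapper IDs seen.  Quadratic, which is fine for a sketch.
 */
int group_by_mapper(const struct node_mapper *nodes, int n, int *fabric_of)
{
    int nfabrics = 0;

    for (int i = 0; i < n; i++) {
        fabric_of[i] = -1;
        /* Reuse the index of any earlier node with the same mapper. */
        for (int j = 0; j < i; j++) {
            if (same_mapper(&nodes[i], &nodes[j])) {
                fabric_of[i] = fabric_of[j];
                break;
            }
        }
        /* No match: this node starts a new fabric. */
        if (fabric_of[i] < 0)
            fabric_of[i] = nfabrics++;
    }
    return nfabrics;
}
```

Two nodes would then be treated as mutually reachable over MX only if their fabric_of[] entries match.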

Perhaps a more general solution is for the MX MTL to look in the MX peer table for a requested peer (or simply try mx_connect() and notice that it fails), report "cannot reach" back up the chain, and have higher-level code retry with a different medium on a per-peer basis? This would be independent of IB or MX or ...
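
Roughly what I have in mind for the per-peer retry, as a sketch only. Nothing here is real Open MPI or MX code: struct transport, pick_transport() and the two fake_*_reach() probes are invented for illustration, and the MX probe just pretends that peers 0 through 3 are in the peer table. In real code that probe would be the peer table lookup or the failed mx_connect().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical probe: returns 1 if this medium can reach the peer. */
typedef int (*reach_fn)(int peer);

/* Hypothetical description of one medium (MX, IB, TCP, ...). */
struct transport {
    const char *name;
    reach_fn    can_reach;
};

/* Stand-in for an MX probe: pretend only peers 0..3 are on our fabric. */
int fake_mx_reach(int peer)
{
    return peer >= 0 && peer < 4;
}

/* Stand-in for a TCP probe: pretend every peer is reachable. */
int fake_tcp_reach(int peer)
{
    (void)peer;
    return 1;
}

/*
 * Walk the media in priority order and return the first one that says
 * it can reach this peer, or NULL if none can.  The decision is made
 * per peer, so different peers may end up on different media.
 */
const struct transport *pick_transport(const struct transport *list,
                                       size_t n, int peer)
{
    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (list[i].can_reach(peer))
            return &list[i];
    }
    return NULL;
}
```

With a priority list of { "mx", fake_mx_reach } followed by { "tcp", fake_tcp_reach }, peer 1 would get MX and peer 7 would fall through to TCP.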

===================================
#include <stdio.h>
#include <stdlib.h>
#include "myriexpress.h"
#include "mx_io.h"

int main(void)
{
 mx_return_t ret;
 mx_endpt_handle_t h;
 mx_mapper_state_t ms;
 int board = 0;                /* whichever board you want */

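 ret = mx_init();
 if (ret != MX_SUCCESS) {
   fprintf(stderr, "mx_init failed: %s\n", mx_strerror(ret));
   exit(1);
 }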
 ret = mx_open_board(board, &h);
 if (ret != MX_SUCCESS) {
   fprintf(stderr, "Unable to open board %d\n", board);
   exit(1);
 }

 /* Ask for the mapper state as seen from port 0 of this board. */
 ms.board_number = board;
 ms.iport = 0;
 ret = mx__get_mapper_state(h, &ms);
 if (ret != MX_SUCCESS) {
   fprintf(stderr, "get_mapper_state failed for board %d: %s\n",
       board, mx_strerror(ret));
   exit(1);
 }

 printf("mapper = %2.2x:%2.2x:%2.2x:%2.2x:%2.2x:%2.2x\n",
        ms.mapper_mac[0] & 0xff, ms.mapper_mac[1] & 0xff,
        ms.mapper_mac[2] & 0xff, ms.mapper_mac[3] & 0xff,
        ms.mapper_mac[4] & 0xff, ms.mapper_mac[5] & 0xff);
 exit(0);
}

