Hi Pasha, Yevgeny,

>> My educated guess is that for some reason there is no direct connection
>> path between lid-2 and lid-4. To prove it we have to look at the OpenSM
>> routing information.
> If you don't get a response, or you get info for a device different from
> what you would expect, then the two ports are not part of the same
> subnet, and APM is expected to fail.
> Otherwise - it's probably a bug.

I've tried your suggestions and the details are below. I am now testing
with a trivial MPI application that just does an MPI_Send/MPI_Recv and
then sleeps for a while (attached). There is much less output to weed
through now!

When I unplug a cable from Port 1, the LID associated with Port 2 is
still reachable with smpquery. So it looks like there should be a valid
path to migrate to on the same subnet.

I am using 2 hosts in this output:

sulu:  This is the host where I unplug the cable from Port 1. The cable
       on Port 2 is connected all the time. LIDs 4 and 5.
bones: On this host I leave cables connected to both Ports all the time.
       LIDs 2 and 3.

A) Before I start, sulu shows that both Ports are up and active using
   LIDs 4 and 5:

sulu> ibstatus
Infiniband device 'mlx4_0' port 1 status:
        default gid:     fe80:0000:0000:0000:0002:c903:0033:6fe1
        base lid:        0x4
        sm lid:          0x6
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            56 Gb/sec (4X FDR)
        link_layer:      InfiniBand

Infiniband device 'mlx4_0' port 2 status:
        default gid:     fe80:0000:0000:0000:0002:c903:0033:6fe2
        base lid:        0x5
        sm lid:          0x6
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            56 Gb/sec (4X FDR)
        link_layer:      InfiniBand

B) The other host, bones, is able to get to LIDs 4 and 5 OK:

bones> smpquery --Ca mlx4_0 --Port 1 NodeInfo 4
# Node info: Lid 4
BaseVers:........................1
ClassVers:.......................1
NodeType:........................Channel Adapter
NumPorts:........................2
SystemGuid:......................0x0002c90300336fe3
Guid:............................0x0002c90300336fe0
PortGuid:........................0x0002c90300336fe1
PartCap:.........................128
DevId:...........................0x1003
Revision:........................0x00000000
LocalPort:.......................1
VendorId:........................0x0002c9

bones> smpquery --Ca mlx4_0 --Port 1 NodeInfo 5
# Node info: Lid 5
BaseVers:........................1
ClassVers:.......................1
NodeType:........................Channel Adapter
NumPorts:........................2
SystemGuid:......................0x0002c90300336fe3
Guid:............................0x0002c90300336fe0
PortGuid:........................0x0002c90300336fe2
PartCap:.........................128
DevId:...........................0x1003
Revision:........................0x00000000
LocalPort:.......................2
VendorId:........................0x0002c9

C) I start the MPI program. See attached file for output.

D) During Iteration 3, I unplugged the cable on Port 1 of sulu.

   - I get the expected network error event message.
   - sulu shows that Port 1 is down and Port 2 is active as expected.
   - bones is still able to get to LID 5 on Port 2 of sulu as expected.
   - The MPI application hangs and then terminates instead of running
     via LID 5.
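In case it helps narrow down where migration breaks: my understanding is
that for the connection to keep running via LID 5, each RC QP needs an
alternate path loaded and migration re-armed at the verbs level. Below is a
minimal sketch of what I think that looks like with plain libibverbs. It is
only my reading of the verbs API, not necessarily what the MPI library does
internally; the LIDs, port numbers, and the load_and_arm_alt_path name are
just illustrative values for this setup, and error handling is omitted:

/* Sketch only: load an alternate path (sulu Port 2 / LID 5 -> peer LID 3)
 * on an RC QP and ask the HCA to re-arm path migration.  Typically done
 * as an RTS->RTS modify after the connection is up. */
#include <string.h>
#include <stdint.h>
#include <infiniband/verbs.h>

static int load_and_arm_alt_path(struct ibv_qp *qp,
                                 uint16_t remote_alt_lid, /* e.g. 0x3 on bones */
                                 uint8_t local_alt_port)  /* e.g. 2 on sulu */
{
    struct ibv_qp_attr attr;

    memset(&attr, 0, sizeof(attr));

    /* Describe the alternate path: send from our second port to the
     * peer's second LID.  SL and pkey index assumed equal to the
     * primary path here. */
    attr.alt_ah_attr.is_global     = 0;
    attr.alt_ah_attr.dlid          = remote_alt_lid;
    attr.alt_ah_attr.sl            = 0;
    attr.alt_ah_attr.src_path_bits = 0;
    attr.alt_ah_attr.port_num      = local_alt_port;
    attr.alt_port_num              = local_alt_port;
    attr.alt_pkey_index            = 0;
    attr.alt_timeout               = 14;

    /* Ask the HCA to arm migration onto that path. */
    attr.path_mig_state = IBV_MIG_REARM;

    return ibv_modify_qp(qp, &attr,
                         IBV_QP_ALT_PATH | IBV_QP_PATH_MIG_STATE);
}

The ibstatus and smpquery output after pulling the cable is below.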
sulu> ibstatus
Infiniband device 'mlx4_0' port 1 status:
        default gid:     fe80:0000:0000:0000:0002:c903:0033:6fe1
        base lid:        0x4
        sm lid:          0x6
        state:           1: DOWN
        phys state:      2: Polling
        rate:            40 Gb/sec (4X QDR)
        link_layer:      InfiniBand

Infiniband device 'mlx4_0' port 2 status:
        default gid:     fe80:0000:0000:0000:0002:c903:0033:6fe2
        base lid:        0x5
        sm lid:          0x6
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            56 Gb/sec (4X FDR)
        link_layer:      InfiniBand

bones> smpquery --Ca mlx4_0 --Port 1 NodeInfo 4
ibwarn: [11192] mad_rpc: _do_madrpc failed; dport (Lid 4)
smpquery: iberror: failed: operation NodeInfo: node info query failed

bones> smpquery --Ca mlx4_0 --Port 1 NodeInfo 5
# Node info: Lid 5
BaseVers:........................1
ClassVers:.......................1
NodeType:........................Channel Adapter
NumPorts:........................2
SystemGuid:......................0x0002c90300336fe3
Guid:............................0x0002c90300336fe0
PortGuid:........................0x0002c90300336fe2
PartCap:.........................128
DevId:...........................0x1003
Revision:........................0x00000000
LocalPort:.......................2
VendorId:........................0x0002c9

Thanks,
-Jeremy
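P.S. Regarding the "network error event message" in step D: at the verbs
level I would expect the port-down and any path-migration notifications to
show up as asynchronous events, roughly as in the sketch below. Again this
is just an illustration with plain libibverbs (the watch_async_events
helper is made up), independent of how the MPI stack actually reports them:

/* Sketch: watch for port and path-migration async events on an open
 * device context.  Blocking loop; error handling trimmed. */
#include <stdio.h>
#include <infiniband/verbs.h>

static void watch_async_events(struct ibv_context *ctx)
{
    struct ibv_async_event ev;

    while (ibv_get_async_event(ctx, &ev) == 0) {
        switch (ev.event_type) {
        case IBV_EVENT_PORT_ERR:      /* cable pulled / port went down */
            printf("port %d went down\n", ev.element.port_num);
            break;
        case IBV_EVENT_PATH_MIG:      /* QP migrated to the alternate path */
            printf("QP 0x%x migrated to alternate path\n",
                   ev.element.qp->qp_num);
            break;
        case IBV_EVENT_PATH_MIG_ERR:  /* migration attempt failed */
            printf("QP 0x%x path migration failed\n",
                   ev.element.qp->qp_num);
            break;
        default:
            break;
        }
        ibv_ack_async_event(&ev);
    }
}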
#include <stdio.h>
#include <mpi.h>
#include <unistd.h>

#define BUFSIZE (1024*1024)

int main(int argc, char **argv)
{
    int rank = 0;
    int tag = 0;
    MPI_Status recv_status;
    char buffer[BUFSIZE];
    int iteration = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Rank 0 sends a 1 MB message to rank 1 each iteration, then both
     * ranks sleep so there is time to pull the cable mid-run. */
    while (1) {
        if (rank == 0) {
            printf("Iteration %d\n", ++iteration);
            fflush(stdout);
            if (MPI_Send(buffer, BUFSIZE, MPI_CHAR, 1, tag,
                         MPI_COMM_WORLD) != MPI_SUCCESS) {
                printf("ERROR: MPI_Send failed\n");
            }
        } else {
            if (MPI_Recv(buffer, BUFSIZE, MPI_CHAR, 0, tag,
                         MPI_COMM_WORLD, &recv_status) != MPI_SUCCESS) {
                printf("ERROR: MPI_Recv failed\n");
            }
        }

        MPI_Barrier(MPI_COMM_WORLD);

#if 1
        if (rank == 0) {
            printf("Sleeping...");
            fflush(stdout);
        }
        sleep(5);
        if (rank == 0) {
            printf("Done\n");
            fflush(stdout);
        }
#endif
    }

    MPI_Finalize();
    printf("[%d] Program Completed!\n", rank);
    return 0;
}
(Attachment: mpirun.out, binary data)