Hi Pasha, Yevgeny,

>> My educated guess is that for some reason there is no direct connection path
>> between lid-2 and lid-4. To prove it we have to look at the OpenSM routing
>> information.

> If you don't get a response, or you get info for
> a device different from what you would expect,
> then the two ports are not part of the same
> subnet, and APM is expected to fail.
> Otherwise - it's probably a bug.

I've tried your suggestions and the details are below.  I am now
testing with a trivial MPI application that just does an
MPI_Send/MPI_Recv and then sleeps for a while (attached).  There is
much less output to weed through now!

When I unplug the cable from Port 1, the LID associated with Port 2 is
still reachable with smpquery.  So it looks like there should be a
valid path on the same subnet to migrate to.

I am using two hosts in this output:
sulu:   This is the host where I unplug the cable from Port 1.  The
        cable on Port 2 is connected all the time.  LIDs 4 and 5.
bones:  On this host I leave cables connected to both Ports all the
        time.  LIDs 2 and 3.

A) Before I start, sulu shows that both Ports are up and active using
LIDs 4 and 5:
sulu> ibstatus
Infiniband device 'mlx4_0' port 1 status:
        default gid:     fe80:0000:0000:0000:0002:c903:0033:6fe1
        base lid:        0x4
        sm lid:          0x6
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            56 Gb/sec (4X FDR)
        link_layer:      InfiniBand

Infiniband device 'mlx4_0' port 2 status:
        default gid:     fe80:0000:0000:0000:0002:c903:0033:6fe2
        base lid:        0x5
        sm lid:          0x6
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            56 Gb/sec (4X FDR)
        link_layer:      InfiniBand

B) The other host, bones, is able to get to LIDs 4 and 5 OK:
bones> smpquery --Ca mlx4_0 --Port 1 NodeInfo 4
# Node info: Lid 4
BaseVers:........................1
ClassVers:.......................1
NodeType:........................Channel Adapter
NumPorts:........................2
SystemGuid:......................0x0002c90300336fe3
Guid:............................0x0002c90300336fe0
PortGuid:........................0x0002c90300336fe1
PartCap:.........................128
DevId:...........................0x1003
Revision:........................0x00000000
LocalPort:.......................1
VendorId:........................0x0002c9

bones> smpquery --Ca mlx4_0 --Port 1 NodeInfo 5
# Node info: Lid 5
BaseVers:........................1
ClassVers:.......................1
NodeType:........................Channel Adapter
NumPorts:........................2
SystemGuid:......................0x0002c90300336fe3
Guid:............................0x0002c90300336fe0
PortGuid:........................0x0002c90300336fe2
PartCap:.........................128
DevId:...........................0x1003
Revision:........................0x00000000
LocalPort:.......................2
VendorId:........................0x0002c9

C) I start the MPI program.  See the attached mpirun.out for its output.

D) During Iteration 3, I unplugged the cable on Port 1 of sulu.
- I get the expected network error event message.
- sulu shows that Port 1 is down and Port 2 is active as expected.
- bones is still able to get to LID 5 on Port 2 of sulu as expected.
- The MPI application hangs and then terminates instead of migrating to
  LID 5 (see the sketch after the smpquery output below).

sulu> ibstatus
Infiniband device 'mlx4_0' port 1 status:
        default gid:     fe80:0000:0000:0000:0002:c903:0033:6fe1
        base lid:        0x4
        sm lid:          0x6
        state:           1: DOWN
        phys state:      2: Polling
        rate:            40 Gb/sec (4X QDR)
        link_layer:      InfiniBand

Infiniband device 'mlx4_0' port 2 status:
        default gid:     fe80:0000:0000:0000:0002:c903:0033:6fe2
        base lid:        0x5
        sm lid:          0x6
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            56 Gb/sec (4X FDR)
        link_layer:      InfiniBand

bones> smpquery --Ca mlx4_0 --Port 1 NodeInfo 4
ibwarn: [11192] mad_rpc: _do_madrpc failed; dport (Lid 4)
smpquery: iberror: failed: operation NodeInfo: node info query failed

bones> smpquery --Ca mlx4_0 --Port 1 NodeInfo 5
# Node info: Lid 5
BaseVers:........................1
ClassVers:.......................1
NodeType:........................Channel Adapter
NumPorts:........................2
SystemGuid:......................0x0002c90300336fe3
Guid:............................0x0002c90300336fe0
PortGuid:........................0x0002c90300336fe2
PartCap:.........................128
DevId:...........................0x1003
Revision:........................0x00000000
LocalPort:.......................2
VendorId:........................0x0002c9
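
In case it helps narrow this down: below is a minimal sketch of how I
think the QP path migration state could be checked with plain libibverbs.
The "qp" handle here is hypothetical (the real QPs live inside the MPI
library and I don't know how to get at them), so this only shows what I
would look for, not something I have actually run.

#include <stdio.h>
#include <infiniband/verbs.h>

/* Print the path migration state of an RC QP.  "qp" is assumed to be a
   QP that was supposed to have an alternate path (APM) configured. */
static void print_path_mig_state(struct ibv_qp *qp)
{
   struct ibv_qp_attr attr;
   struct ibv_qp_init_attr init_attr;

   /* IBV_QP_PATH_MIG_STATE asks ibv_query_qp() to fill in
      attr.path_mig_state. */
   if (ibv_query_qp(qp, &attr, IBV_QP_PATH_MIG_STATE, &init_attr)) {
      perror("ibv_query_qp");
      return;
   }

   switch (attr.path_mig_state) {
   case IBV_MIG_MIGRATED:
      printf("QP 0x%x: MIGRATED (initial state, or a migration already "
             "happened)\n", qp->qp_num);
      break;
   case IBV_MIG_REARM:
      printf("QP 0x%x: REARM (alternate path is being loaded)\n",
             qp->qp_num);
      break;
   case IBV_MIG_ARMED:
      printf("QP 0x%x: ARMED (alternate path loaded, ready to fail over)\n",
             qp->qp_num);
      break;
   }
}

My (possibly wrong) understanding is that if the QPs never reach the
ARMED state, there is no alternate path loaded for the hardware to
migrate to, which would match the hang I see instead of a failover to
LID 5.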

Thanks,

-Jeremy
#include <stdio.h>
#include <mpi.h>
#include <unistd.h>

#define BUFSIZE (1024*1024)

int main(int argc, char **argv)
{
   int rank = 0;
   int tag = 0;
   MPI_Status recv_status;
   char buffer[BUFSIZE];
   int iteration = 0;

   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);

   /* Rank 0 repeatedly sends a 1 MB buffer to rank 1; the loop runs
      until the job is killed from outside. */
   while (1) {
      if (rank == 0) {
         printf("Iteration %d\n", ++iteration);
         fflush(stdout);
         if (MPI_Send(buffer, BUFSIZE, MPI_CHAR, 1, tag, MPI_COMM_WORLD) !=
             MPI_SUCCESS) {
            printf("ERROR: MPI_Send failed\n");
         }
      }
      else {
         if (MPI_Recv(buffer, BUFSIZE, MPI_CHAR, 0, tag, MPI_COMM_WORLD,
                      &recv_status) != MPI_SUCCESS) {
            printf("ERROR: MPI_Recv failed\n");
         }
      }
      MPI_Barrier(MPI_COMM_WORLD);

#if 1
      /* Sleep between iterations so there is time to unplug the cable
         mid-run. */
      if (rank == 0) {
         printf("Sleeping...");
         fflush(stdout);
      }

      sleep(5);

      if (rank == 0) {
         printf("Done\n");
         fflush(stdout);
      }
#endif
   }

   /* Not reached; the while (1) loop above never exits. */
   MPI_Finalize();

   printf("[%d] Program Completed!\n", rank);
   return 0;
}

Attachment: mpirun.out