Hi John!

I'm now experimenting with a head node and a single compute node; the rest of the cluster is switched off.
> can you run :
>
> ibhosts

# ibhosts
Ca : 0x7cfe900300bddec0 ports 1 "MT25408 ConnectX Mellanox Technologies"
Ca : 0xe41d2d030050caf0 ports 1 "MT25408 ConnectX Mellanox Technologies"

> ibstat

# ibstat
CA 'mlx4_0'
    CA type: MT4099
    Number of ports: 1
    Firmware version: 2.35.5100
    Hardware version: 0
    Node GUID: 0xe41d2d030050caf0
    System image GUID: 0xe41d2d030050caf3
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 56
        Base lid: 1
        LMC: 0
        SM lid: 3
        Capability mask: 0x0251486a
        Port GUID: 0xe41d2d030050caf1
        Link layer: InfiniBand

> ibdiagnet

# ibdiagnet
# cat ibdiagnet.log
-W- Topology file is not specified.
    Reports regarding cluster links will use direct routes.
-I- Using port 1 as the local port.
-I- Discovering ... 3 nodes (1 Switches & 2 CA-s) discovered.

-I---------------------------------------------------
-I- Bad Guids/LIDs Info
-I---------------------------------------------------
-I- No bad Guids were found

-I---------------------------------------------------
-I- Links With Logical State = INIT
-I---------------------------------------------------
-I- No bad Links (with logical state = INIT) were found

-I---------------------------------------------------
-I- General Device Info
-I---------------------------------------------------

-I---------------------------------------------------
-I- PM Counters Info
-I---------------------------------------------------
-I- No illegal PM counters values were found

-I---------------------------------------------------
-I- Fabric Partitions Report (see ibdiagnet.pkey for a full hosts list)
-I---------------------------------------------------
-I- PKey:0x7fff Hosts:2 full:2 limited:0

-I---------------------------------------------------
-I- IPoIB Subnets Check
-I---------------------------------------------------
-I- Subnet: IPv4 PKey:0x7fff QKey:0x00000b1b MTU:2048Byte rate:10Gbps SL:0x00
-W- Suboptimal rate for group. Lowest member rate:40Gbps > group-rate:10Gbps

-I---------------------------------------------------
-I- Bad Links Info
-I- No bad link were found
-I---------------------------------------------------

-I- Done. Run time was 2 seconds.

> Lord help me for being so naive, but do you have a subnet manager running?

It seems, yes (I even have standby):

# service --status-all | grep opensm
 [ + ]  opensm

# cat ibdiagnet.sm
ibdiagnet fabric SM report

  SM - master
    MT25408/P1 lid=0x0003 guid=0x7cfe900300bddec1 dev=4099 priority:0

  SM - standby
    The Local Device : MT25408/P1 lid=0x0001 guid=0xe41d2d030050caf1 dev=4099 priority:0

Best regards,
Sergei.
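
As a cross-check on the outputs above: the SM lid reported by ibstat (3) should match the lid of the master SM in ibdiagnet.sm (0x0003), and the port should be Active/LinkUp. Below is a minimal Python sketch of that comparison. It assumes the two outputs have been pasted in as strings (trimmed here to the relevant lines), and the helper names `ibstat_fields` and `master_sm_lid` are hypothetical, not part of any InfiniBand tool.

```python
import re

# Trimmed copies of the outputs quoted in this message.
IBSTAT = """\
CA 'mlx4_0'
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 56
        Base lid: 1
        SM lid: 3
"""

IBDIAGNET_SM = """\
SM - master
    MT25408/P1 lid=0x0003 guid=0x7cfe900300bddec1 dev=4099 priority:0
SM - standby
    The Local Device : MT25408/P1 lid=0x0001 guid=0xe41d2d030050caf1 dev=4099 priority:0
"""


def ibstat_fields(text):
    """Collect 'Key: value' lines of ibstat output into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep:  # skip lines without a 'Key: value' shape, e.g. "Port 1:"
            fields[key] = value
    return fields


def master_sm_lid(text):
    """Return the master SM's LID from an ibdiagnet.sm report, or None (hypothetical helper)."""
    match = re.search(r"SM - master\s+\S+ lid=(0x[0-9a-fA-F]+)", text)
    return int(match.group(1), 16) if match else None


fields = ibstat_fields(IBSTAT)
port_up = fields["State"] == "Active" and fields["Physical state"] == "LinkUp"
sm_matches = int(fields["SM lid"]) == master_sm_lid(IBDIAGNET_SM)
print("port up:", port_up, "| SM lid matches master:", sm_matches)
```

This only confirms that the local port sees the master subnet manager; it says nothing about the "Suboptimal rate for group" warning in ibdiagnet.log.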
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users