Hi John!

I'm experimenting now with a head node and a single compute node; the
rest of the cluster is switched off.

> can you run:
>
> ibhosts
>

# ibhosts
Ca      : 0x7cfe900300bddec0 ports 1 "MT25408 ConnectX Mellanox Technologies"
Ca      : 0xe41d2d030050caf0 ports 1 "MT25408 ConnectX Mellanox Technologies"


>
> ibstat
>

# ibstat
CA 'mlx4_0'
        CA type: MT4099
        Number of ports: 1
        Firmware version: 2.35.5100
        Hardware version: 0
        Node GUID: 0xe41d2d030050caf0
        System image GUID: 0xe41d2d030050caf3
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 56
                Base lid: 1
                LMC: 0
                SM lid: 3
                Capability mask: 0x0251486a
                Port GUID: 0xe41d2d030050caf1
                Link layer: InfiniBand
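
For checking more nodes later, the fields that matter here (State, Physical state, SM lid) can be pulled out of the ibstat text with a few lines of Python. This is only a rough sketch: parse_ibstat and port_is_healthy are hypothetical helper names, the sample is an excerpt of the output above, and it assumes one CA with one port, and that an SM lid of 0 means no subnet manager is known to the port.

```python
import re

# Excerpt of the ibstat output shown above.
IBSTAT_SAMPLE = """\
CA 'mlx4_0'
        CA type: MT4099
        Number of ports: 1
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 56
                Base lid: 1
                LMC: 0
                SM lid: 3
                Link layer: InfiniBand
"""


def parse_ibstat(text):
    """Collect 'Key: value' lines from ibstat output into a dict.

    Hypothetical helper; assumes a single CA with a single port,
    so later keys never overwrite earlier ones.
    """
    fields = {}
    for line in text.splitlines():
        # Lines without a value after the colon (e.g. "Port 1:") are skipped.
        match = re.match(r"\s*([^:]+):\s*(\S.*)$", line)
        if match:
            fields[match.group(1).strip()] = match.group(2).strip()
    return fields


def port_is_healthy(fields):
    """True if the port is Active, LinkUp, and reports a non-zero SM lid.

    Assumption: an SM lid of 0 means no subnet manager is known.
    """
    return (
        fields.get("State") == "Active"
        and fields.get("Physical state") == "LinkUp"
        and int(fields.get("SM lid", "0")) != 0
    )


if __name__ == "__main__":
    info = parse_ibstat(IBSTAT_SAMPLE)
    print(info["Rate"], info["SM lid"], port_is_healthy(info))
```

On a live node the text could come from running ibstat through subprocess instead of the pasted sample.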


>
> ibdiagnet
>
# ibdiagnet
# cat ibdiagnet.log
-W- Topology file is not specified.
    Reports regarding cluster links will use direct routes.
-I- Using port 1 as the local port.
-I- Discovering ... 3 nodes (1 Switches & 2 CA-s) discovered.


-I---------------------------------------------------
-I- Bad Guids/LIDs Info
-I---------------------------------------------------
-I- No bad Guids were found

-I---------------------------------------------------
-I- Links With Logical State = INIT
-I---------------------------------------------------
-I- No bad Links (with logical state = INIT) were found

-I---------------------------------------------------
-I- General Device Info
-I---------------------------------------------------

-I---------------------------------------------------
-I- PM Counters Info
-I---------------------------------------------------
-I- No illegal PM counters values were found

-I---------------------------------------------------
-I- Fabric Partitions Report (see ibdiagnet.pkey for a full hosts list)
-I---------------------------------------------------
-I-    PKey:0x7fff Hosts:2 full:2 limited:0

-I---------------------------------------------------
-I- IPoIB Subnets Check
-I---------------------------------------------------
-I- Subnet: IPv4 PKey:0x7fff QKey:0x00000b1b MTU:2048Byte rate:10Gbps SL:0x00
-W- Suboptimal rate for group. Lowest member rate:40Gbps > group-rate:10Gbps

-I---------------------------------------------------
-I- Bad Links Info
-I- No bad link were found
-I---------------------------------------------------

-I- Done. Run time was 2 seconds.


>
> Lord help me for being so naive, but do you have a subnet manager running?
>

It seems so, yes (I even have a standby):

# service --status-all | grep opensm
 [ + ]  opensm

# cat ibdiagnet.sm

ibdiagnet fabric SM report

  SM - master
    MT25408/P1 lid=0x0003 guid=0x7cfe900300bddec1 dev=4099 priority:0

  SM - standby
    The Local Device : MT25408/P1 lid=0x0001 guid=0xe41d2d030050caf1 dev=4099 priority:0
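
As a sanity check, the master LID in this report (0x0003) should match the "SM lid: 3" that ibstat shows on the local port. A minimal Python sketch of that comparison follows; sm_lids is a hypothetical helper, the report text is copied from above, and the parsing assumes the layout stays as shown (an "SM - <state>" header followed by lines containing lid=0x...).

```python
import re

# Text copied from the ibdiagnet.sm report above.
SM_REPORT = """\
  SM - master
    MT25408/P1 lid=0x0003 guid=0x7cfe900300bddec1 dev=4099 priority:0

  SM - standby
    The Local Device : MT25408/P1 lid=0x0001 guid=0xe41d2d030050caf1 dev=4099 priority:0
"""


def sm_lids(report):
    """Map each SM state ('master', 'standby', ...) to a list of integer LIDs.

    Hypothetical helper; assumes the report layout shown above.
    """
    result = {}
    state = None
    for line in report.splitlines():
        header = re.match(r"\s*SM - (\w+)", line)
        if header:
            state = header.group(1)
            result.setdefault(state, [])
            continue
        # \b keeps the pattern anchored on the standalone "lid=" field.
        lid = re.search(r"\blid=(0x[0-9a-fA-F]+)", line)
        if lid and state is not None:
            result[state].append(int(lid.group(1), 16))
    return result


if __name__ == "__main__":
    lids = sm_lids(SM_REPORT)
    ibstat_sm_lid = 3  # "SM lid: 3" from the ibstat output
    print(lids, ibstat_sm_lid in lids.get("master", []))
```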

Best regards,
Sergei.