Hi: A customer is attempting to run our Open MPI 1.4.2-based application
on a cluster of machines running RHEL4 with the standard OFED stack. lspci
identifies the HCAs as:

03:01.0 PCI bridge: Mellanox Technologies MT23108 PCI Bridge (rev a1)
04:00.0 InfiniBand: Mellanox Technologies MT23108 InfiniHost (rev a1)

ibv_devinfo says that one port on the HCAs is active but the other is
down:

hca_id: mthca0
        fw_ver:                         3.0.2
        node_guid:                      0006:6a00:9800:4c78
        sys_image_guid:                 0006:6a00:9800:4c78
        vendor_id:                      0x066a
        vendor_part_id:                 23108
        hw_ver:                         0xA1
        phys_port_cnt:                  2
                port:   1
                        state:                  active (4)
                        max_mtu:                2048 (4)
                        active_mtu:             2048 (4)
                        sm_lid:                 1
                        port_lid:               26
                        port_lmc:               0x00

                port:   2
                        state:                  down (1)
                        max_mtu:                2048 (4)
                        active_mtu:             512 (2)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
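
For checking the same thing across many nodes, the port states can be pulled
out of saved ibv_devinfo output with a short script. This is only a minimal
sketch: it assumes the text layout shown above, the function name
port_states is hypothetical, and SAMPLE is a trimmed copy of the listing
above rather than new data.

```python
import re

# Trimmed copy of the ibv_devinfo listing above (assumed layout).
SAMPLE = """\
hca_id: mthca0
        phys_port_cnt:                  2
                port:   1
                        state:                  active (4)
                port:   2
                        state:                  down (1)
"""


def port_states(devinfo_text):
    """Return {(hca_id, port_number): state} parsed from ibv_devinfo text.

    Hypothetical helper: it relies only on the "hca_id:", "port:" and
    "state:" lines appearing in that order, as in the listing above.
    """
    states = {}
    hca = None
    port = None
    for line in devinfo_text.splitlines():
        m = re.match(r"\s*hca_id:\s*(\S+)", line)
        if m:
            # New device section: forget the previous port number.
            hca, port = m.group(1), None
            continue
        m = re.match(r"\s*port:\s*(\d+)", line)
        if m:
            port = int(m.group(1))
            continue
        m = re.match(r"\s*state:\s*(\w+)", line)
        if m and hca is not None and port is not None:
            states[(hca, port)] = m.group(1)
    return states


if __name__ == "__main__":
    for (hca, port), state in sorted(port_states(SAMPLE).items()):
        print(f"{hca} port {port}: {state}")
```

The anchored patterns skip lines such as phys_port_cnt and port_lid, so only
the per-port "port:" headers start a new entry.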


When the OMPI application is run, it prints the error message:

--------------------------------------------------------------------
The OpenFabrics (openib) BTL failed to initialize while trying to
create an internal queue.  This typically indicates a failed
OpenFabrics installation, faulty hardware, or that Open MPI is
attempting to use a feature that is not supported on your hardware
(i.e., is a shared receive queue specified in the
btl_openib_receive_queues MCA parameter with a device that does not
support it?).  The failure occured here:

  Local host:  machine001.lan
  OMPI source: /software/openmpi-1.4.2/ompi/mca/btl/openib/btl_openib.c:250
  Function:    ibv_create_srq()
  Error:       Invalid argument (errno=22)
  Device:      mthca0

You may need to consult with your system administrator to get this
problem fixed.
--------------------------------------------------------------------

The full log of a run with "btl_openib_verbose 1" is attached. My
application appears to run to completion, but I can't tell whether it is
simply falling back to TCP instead of using the IB hardware.

I would appreciate any suggestions on how to proceed to fix this error.

Thanks,
Allen

-- 
Allen Barnett
Transpire, Inc
E-Mail: al...@transpireinc.com

Attachment: openib.listing.gz
Description: GNU Zip compressed data