Maybe the distribution tarball at

https://download.open-mpi.org/release/open-mpi/v3.1/openmpi-3.1.3.tar.gz

did not get refreshed after the fix in

https://github.com/bosilca/ompi/commit/b902cd5eb765ada57f06c75048509d0716953549

was implemented? I downloaded the tarball from open-mpi.org today, 22
Dec, compiled it, and I still get the warnings.

ibv_exp_query_device: invalid comp_mask !!! (comp_mask = 0xd8000000002 valid_mask = 0x1)
[bn01][[37143,17005],0][btl_openib_component.c:1670:init_one_device] error obtaining device attributes for mlx4_0 errno says Invalid argument
ibv_exp_query_device: invalid comp_mask !!! (comp_mask = 0xd8100000002 valid_mask = 0x1)
[bn01][[37143,17005],1][btl_openib_component.c:1670:init_one_device] error obtaining device attributes for mlx4_0 errno says Invalid argument
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.

  Local host:   bn01
  Local device: mlx4_0
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.

  Local host:   bn01
  Local device: mlx4_0
--------------------------------------------------------------------------

It looks like Howard merged the fix on Dec 4, but the date listed for
the 3.1.3 tarball on the open-mpi.org site is in Oct.

The relevant lines in opal/mca/btl/openib/btl_openib_component.c from
the tarball are shown further down.  They are missing the

    memset(&device->ib_exp_dev_attr, 0, sizeof(device->ib_exp_dev_attr));

that should have been inserted at 1667.

1666 #if HAVE_DECL_IBV_EXP_QUERY_DEVICE
1667     device->ib_exp_dev_attr.comp_mask = IBV_EXP_DEVICE_ATTR_RESERVED - 1;
1668     if(ibv_exp_query_device(device->ib_dev_context, &device->ib_exp_dev_attr)){
1669         BTL_ERROR(("error obtaining device attributes for %s errno says %s",
1670                     ibv_get_device_name(device->ib_dev), strerror(errno)));
1671         goto error;
1672     }
1673 #endif
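
To illustrate why the missing memset could matter, here is a minimal sketch that does not use the real verbs library. Every name in it (mock_dev_attr, mock_nested_caps, mock_query_device, the MOCK_* constants) is hypothetical. It also assumes that the rejected comp_mask belongs to a field other than the top-level one that line 1667 sets, which the valid_mask = 0x1 in the log seems to suggest but which I have not confirmed against the driver source.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT the real libibverbs types. */
#define MOCK_ATTR_RESERVED  (1ULL << 4)  /* plays the role of IBV_EXP_DEVICE_ATTR_RESERVED */
#define MOCK_NESTED_VALID   0x1ULL       /* plays the role of "valid_mask = 0x1" */

struct mock_nested_caps {
    uint64_t comp_mask;                  /* assumed: a second mask the caller never sets */
};

struct mock_dev_attr {
    uint64_t comp_mask;                  /* the mask set on line 1667 */
    struct mock_nested_caps nested;
};

/* Hypothetical query: returns EINVAL if either mask carries unknown bits. */
static int mock_query_device(struct mock_dev_attr *attr)
{
    if (attr->comp_mask & ~(MOCK_ATTR_RESERVED - 1))
        return EINVAL;
    if (attr->nested.comp_mask & ~MOCK_NESTED_VALID)
        return EINVAL;
    return 0;
}

/* Pattern in the tarball: only the top-level mask is assigned,
 * so whatever bytes were already in the struct stay there. */
static int query_without_memset(struct mock_dev_attr *attr)
{
    attr->comp_mask = MOCK_ATTR_RESERVED - 1;
    return mock_query_device(attr);
}

/* Pattern after the fix: zero the whole struct, then set the mask. */
static int query_with_memset(struct mock_dev_attr *attr)
{
    memset(attr, 0, sizeof(*attr));
    attr->comp_mask = MOCK_ATTR_RESERVED - 1;
    return mock_query_device(attr);
}
```

If the struct holds leftover bytes when it is passed in, query_without_memset() fails with EINVAL in this mock, while query_with_memset() succeeds because the nested mask has been cleared first.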

I added a comment to the GitHub issue, but it was closed and I am not
sure that will be noticed.  Sorry for the double-posting if that was
sufficient.

-- bennet