Jeff,

Thanks so much for the help and the phone call.

We have built 1.10.7 on CentOS 7 without needing to explicitly disable either usnic or verbs-usnic. Everything configured and built correctly out of the box with rdma-core-devel-13-7.el7.x86_64.

We will now hopefully be able to run our regression tests with no outcome changes due to the patch level increment. Depending on that result, we will decide whether to update to a newer OpenMPI.

Thanks again for your help and to others on this list who pointed the way clear.

All the best and thanks for all the development too!


On 02/28/2018 03:20 PM, Jeff Squyres (jsquyres) wrote:
I took advantage of the fact that Bill's phone number is in his signature and 
gave him a call (gasp! Talk to someone from the interwebs -- what craziness is 
that?!).

The real issue here is that Open MPI's use of verbs in v1.10.2 pre-dates the 
use of the rdma-core packaging.  Various header file and other changes were 
made in the transition from libibverbs to rdma-core, and Open MPI v1.10.2 simply 
doesn't handle them correctly.  Later versions in the v1.10.x series fix all 
these issues such that both pre-rdma-core libibverbs and post-rdma-core 
libibverbs are handled properly.
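
One quick way to see which side of that transition a given system is on is to look at how its verbs.h declares struct ibv_device. The following is only a rough sketch, not anything Open MPI's configure actually does: the function name ibv_device_ops_member is made up for illustration, and it assumes the pre-rdma-core header declares a member named "ops" (which is what the failing common_verbs_usnic_fake.c initializes), while the rdma-core header has only the renamed "_ops" placeholder shown in the excerpt quoted further down this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI or rdma-core).
# Scans a verbs.h-style header and reports which "ops" member
# struct ibv_device declares:
#   ops      -> assumed pre-rdma-core layout (what v1.10.2 expects)
#   _ops     -> rdma-core layout (placeholder member only)
#   unknown  -> struct or member not found
ibv_device_ops_member() {
    awk '
        # Enter the struct body; "struct _ibv_device_ops {" does not match.
        /^[ \t]*struct[ \t]+ibv_device[ \t]*[{]/ { in_struct = 1; next }
        # Leave the struct body at its closing brace.
        in_struct && /^[ \t]*[}]/ { in_struct = 0 }
        # Record the member name seen inside the struct body.
        in_struct && /[ \t]_ops;/ { found = "_ops" }
        in_struct && /[ \t]ops;/  { found = "ops" }
        END { print (found == "" ? "unknown" : found) }
    ' "$1"
}
```

For example, running "ibv_device_ops_member /usr/include/infiniband/verbs.h" on the CentOS 7 system described in this thread would be expected to report _ops, given the header excerpt quoted below.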

Compounding this, another bug in v1.10.2 made it impossible to fully disable 
the common/usnic component.  Sad panda.

Unfortunately, the only way to move forward is to either apply a patch to 
v1.10.2 (in this case, just to disable all the usNIC stuff, since this user is 
not using usNIC at all), move forward to v1.10.7, or move forward to the latest 
Open MPI (both v3.0.1 and v3.1.0 are literally imminently about to be released).



On Feb 28, 2018, at 2:41 PM, William T Jones <w.t.jo...@nasa.gov> wrote:

Thanks for the suggestions.


On 02/28/2018 12:10 PM, Jeff Squyres (jsquyres) wrote:
Oops; it looks like there's 2 chunks of usNIC code in the 1.10.x code base, and 
--without-usnic only disables one of them.
I do believe we fixed that in a later 1.10.x release -- I am guessing you don't 
want to upgrade to v3.0.x for compatibility/testing reasons, but do you think 
you could move to a later v1.10.x release (which should be just bug fixes 
compared to 1.10.2)?
If you can upgrade to v1.10.7, this should work:
./configure --without-usnic --without-verbs-usnic ...

This works.  But will be tough because it will require lots of re-validation 
work with the codes that depend on OpenMPI even though it is only a patch level 
increment.

Sidenote: it's actually also possible that v1.10.7 will either correctly ignore 
or correctly compile usNIC on your system (without any extra command line 
options) -- we may have fixed that bug by v1.10.7; I honestly don't remember 
offhand.
If you can't move to v1.10.7, I think you should be able to use the following 
with v1.10.2 to disable the BTL usNIC component and the verbs_usnic common 
component:
./configure --without-usnic --enable-mca-no-build=common-verbs_usnic ...

Sadly this does not work.  Linker fails with undefined reference to 
`ompi_common_verbs_usnic_register_fake_drivers'.

On Feb 28, 2018, at 11:55 AM, William T Jones <w.t.jo...@nasa.gov> wrote:

Unfortunately, that does not work.

% ./configure --enable-static \
              --with-tm=/usr/local/pkgs/PBSPro_64 \
              --enable-mpi-thread-multiple \
              --with-verbs=/usr \
              --without-usnic \
              --enable-mpi-cxx \
              FC=ifort \
              F77=ifort \
              CC=icc \
              CXX=icpc \
              CFLAGS="-O3 -ip" \
              FCFLAGS="-O3 -ip" \
              LIBS="-lcrypto -lpthread"

...
Making all in mca/common/verbs_usnic
make[2]: Entering directory `/misc/home2/wtjones1/GIT/fun3d/misc/module-builder/k/openmpi/openmpi-1.10.2/ompi/mca/common/verbs_usnic'
  CC       libmca_common_verbs_usnic_la-common_verbs_usnic_fake.lo
common_verbs_usnic_fake.c(72): error: struct "ibv_device" has no field "ops"
      .ops = {
       ^

common_verbs_usnic_fake.c(89): warning #266: function "ibv_read_sysfs_file" declared implicitly
      if (ibv_read_sysfs_file(uverbs_sys_path, "device/vendor",
          ^

common_verbs_usnic_fake.c(133): warning #266: function "ibv_register_driver" declared implicitly
          ibv_register_driver("usnic_verbs", fake_driver_init);
          ^

compilation aborted for common_verbs_usnic_fake.c (code 2)


On 02/28/2018 11:10 AM, r...@open-mpi.org wrote:
Not unless you have a USNIC card in your machine!
On Feb 28, 2018, at 8:08 AM, William T Jones <w.t.jo...@nasa.gov> wrote:

Thank you!

Will that have any adverse side effects?
Performance penalties?

On 02/28/2018 10:57 AM, r...@open-mpi.org wrote:
Add --without-usnic
On Feb 28, 2018, at 7:50 AM, William T Jones <w.t.jo...@nasa.gov> wrote:

I realize that OpenMPI 1.10.2 is quite old, however, for compatibility I
am attempting to compile it after a system upgrade to CentOS 7.

This system does include infiniband and I have configured as follows
using Intel 2017.2.174 compilers:

% ./configure --enable-static \
              --with-tm=/usr/local/pkgs/PBSPro_64 \
              --enable-mpi-thread-multiple \
              --with-verbs=/usr \
              --enable-mpi-cxx \
              FC=ifort \
              F77=ifort \
              CC=icc \
              CXX=icpc \
              CFLAGS="-O3 -ip" \
              FCFLAGS="-O3 -ip" \
              LIBS="-lcrypto -lpthread"

However, when I compile I get the following error:

  ...
  Making all in mca/common/verbs_usnic
  make[2]: Entering directory `/usr/src/openmpi-1.10.2/ompi/mca/common/verbs_usnic'
    CC       libmca_common_verbs_usnic_la-common_verbs_usnic_fake.lo
  common_verbs_usnic_fake.c(72): error: struct "ibv_device" has no field "ops"
        .ops = {
         ^

  common_verbs_usnic_fake.c(89): warning #266: function "ibv_read_sysfs_file" declared implicitly
        if (ibv_read_sysfs_file(uverbs_sys_path, "device/vendor",
            ^

  common_verbs_usnic_fake.c(133): warning #266: function "ibv_register_driver" declared implicitly
            ibv_register_driver("usnic_verbs", fake_driver_init);
            ^

  compilation aborted for common_verbs_usnic_fake.c (code 2)


Unfortunately, my /usr/include/infiniband/verbs.h file defines the
"ibv_device" structure but does not include "ops" member.  Instead the
structure is defined as follows:

  /* Obsolete, never used, do not touch */
  struct _ibv_device_ops {
          struct ibv_context *    (*_dummy1)(struct ibv_device *device,
                                             int cmd_fd);
          void                    (*_dummy2)(struct ibv_context *context);
  };

  enum {
          IBV_SYSFS_NAME_MAX      = 64,
          IBV_SYSFS_PATH_MAX      = 256
  };

  struct ibv_device {
          struct _ibv_device_ops  _ops;
          enum ibv_node_type      node_type;
          enum ibv_transport_type transport_type;
          /* Name of underlying kernel IB device, eg "mthca0" */
          char                    name[IBV_SYSFS_NAME_MAX];
          /* Name of uverbs device, eg "uverbs0" */
          char                    dev_name[IBV_SYSFS_NAME_MAX];
          /* Path to infiniband_verbs class device in sysfs */
          char                    dev_path[IBV_SYSFS_PATH_MAX];
          /* Path to infiniband class device in sysfs */
          char                    ibdev_path[IBV_SYSFS_PATH_MAX];
  };


OpenMPI was previously compiled successfully under CentOS 6 and every
indication is that the /usr/include/infiniband/verbs.h was defined
similarly (again without the "ops" member).

Is it possible that there is a configure option that prevents this source from 
being included in the build?

Any help is appreciated,


--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

    Bill Jones                                       w.t.jo...@nasa.gov
    Mail Stop 128                     Computational AeroSciences Branch
    15 Langley Boulevard                           Research Directorate
    NASA Langley Research Center               Building 1268, Room 1044
    Hampton, VA  23681-2199                       Phone +1 757 864-5318
                                                    Fax +1 757 864-8816
                                             http://fun3d.larc.nasa.gov
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


