Jeff,
Thanks so much for the help and the phone call.
We have built 1.10.7 on CentOS 7 without needing to explicitly turn off
either usnic or verbs-usnic. Everything configured and built
correctly out of the box with rdma-core-devel-13-7.el7.x86_64.
We will now hopefully be able to run our regression tests with no
outcome changes due to the patch level increment. Depending on that
result we will make a decision to update to a newer OpenMPI.
Thanks again for your help and to others on this list who pointed the
way clear.
All the best and thanks for all the development too!
On 02/28/2018 03:20 PM, Jeff Squyres (jsquyres) wrote:
I took advantage of the fact that Bill's phone number is in his signature and
gave him a call (gasp! Talk to someone from the interwebs -- what craziness is
that?!).
The real issue here is that Open MPI's use of verbs in v1.10.2 pre-dates the
use of the rdma-core packaging. Various header file and other changes were
made in the transition from libibverbs to rdma-core, and Open MPI v1.10.2 simply
doesn't handle them correctly. Later versions in the v1.10.x series fix all
these issues such that both pre-rdma-core libibverbs and post-rdma-core
libibverbs are handled properly.
Compounding this was another bug in v1.10.2: it wasn't possible to fully
disable the common/usnic component. Sad panda.
Unfortunately, the only way to move forward is to either apply a patch to
v1.10.2 (in this case, just to disable all the usNIC stuff, since this user is
not using usNIC at all), move forward to v1.10.7, or move forward to the latest
Open MPI (both v3.0.1 and v3.1.0 are literally about to be released).
On Feb 28, 2018, at 2:41 PM, William T Jones <w.t.jo...@nasa.gov> wrote:
Thanks for the suggestions.
On 02/28/2018 12:10 PM, Jeff Squyres (jsquyres) wrote:
Oops; it looks like there's 2 chunks of usNIC code in the 1.10.x code base, and
--without-usnic only disables one of them.
I do believe we fixed that in a later 1.10.x release -- I am guessing you don't
want to upgrade to v3.0.x for compatibility/testing reasons, but do you think
you could move to a later v1.10.x release (which should be just bug fixes
compared to 1.10.2)?
If you can upgrade to v1.10.7, this should work:
./configure --without-usnic --without-verbs-usnic ...
This works, but it will be tough because it will require lots of re-validation
work with the codes that depend on OpenMPI, even though it is only a patch level
increment.
Sidenote: it's actually also possible that v1.10.7 will either correctly ignore
or correctly compile usNIC on your system (without any extra command line
options) -- we may have fixed that bug by v1.10.7; I honestly don't remember
offhand.
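Which of the two happened can be checked after the build. What follows is only a minimal sketch: `has_usnic` is a made-up helper name, and it assumes the component listing it reads on stdin (for example the output of `ompi_info`) mentions each built component by name somewhere on a line.

```shell
# Hypothetical filter (name is an assumption, not part of Open MPI):
# reads a component listing on stdin and reports whether any line
# mentions usnic, case-insensitively.
has_usnic() {
    if grep -qi 'usnic'; then
        echo "usnic present"
    else
        echo "usnic absent"
    fi
}
```

With an installed build on the PATH, `ompi_info | has_usnic` would then say whether any usNIC component made it into that build.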
If you can't move to v1.10.7, I think you should be able to use the following
with v1.10.2 to disable the BTL usNIC component and the verbs_usnic common
component:
./configure --without-usnic --enable-mca-no-build=common-verbs_usnic ...
Sadly this does not work. Linker fails with undefined reference to
`ompi_common_verbs_usnic_register_fake_drivers'.
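An undefined reference like this suggests that some code outside the disabled component still calls into it. A quick way to look for such callers is to search the source tree for the symbol. This is a minimal sketch; the helper name `find_symbol_refs` is made up for illustration, and it relies on a grep that supports `-r` (as GNU grep does).

```shell
# Hypothetical helper (name is an assumption): list files under a source
# tree that mention a given symbol. Prints one path per line, sorted;
# prints nothing when no file matches.
find_symbol_refs() {
    symbol="$1"
    tree="$2"
    # -r recurse, -l file names only, -F fixed string, -w whole word
    grep -rlFw -- "$symbol" "$tree" 2>/dev/null | sort
}
```

Run from the directory holding the unpacked source, this could be called as `find_symbol_refs ompi_common_verbs_usnic_register_fake_drivers openmpi-1.10.2/ompi`. Any file it lists outside `mca/common/verbs_usnic` is a candidate for the caller that still needs the component.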
On Feb 28, 2018, at 11:55 AM, William T Jones <w.t.jo...@nasa.gov> wrote:
Unfortunately, that does not work.
% ./configure --enable-static \
--with-tm=/usr/local/pkgs/PBSPro_64 \
--enable-mpi-thread-multiple \
--with-verbs=/usr \
--without-usnic \
--enable-mpi-cxx \
FC=ifort \
F77=ifort \
CC=icc \
CXX=icpc \
CFLAGS="-O3 -ip" \
FCFLAGS="-O3 -ip" \
LIBS="-lcrypto -lpthread"
...
Making all in mca/common/verbs_usnic
make[2]: Entering directory
`/misc/home2/wtjones1/GIT/fun3d/misc/module-builder/k/openmpi/openmpi-1.10.2/ompi/mca/common/verbs_usnic'
CC libmca_common_verbs_usnic_la-common_verbs_usnic_fake.lo
common_verbs_usnic_fake.c(72): error: struct "ibv_device" has no field "ops"
.ops = {
^
common_verbs_usnic_fake.c(89): warning #266: function "ibv_read_sysfs_file"
declared implicitly
if (ibv_read_sysfs_file(uverbs_sys_path, "device/vendor",
^
common_verbs_usnic_fake.c(133): warning #266: function "ibv_register_driver"
declared implicitly
ibv_register_driver("usnic_verbs", fake_driver_init);
^
compilation aborted for common_verbs_usnic_fake.c (code 2)
On 02/28/2018 11:10 AM, r...@open-mpi.org wrote:
Not unless you have a USNIC card in your machine!
On Feb 28, 2018, at 8:08 AM, William T Jones <w.t.jo...@nasa.gov> wrote:
Thank you!
Will that have any adverse side effects?
Performance penalties?
On 02/28/2018 10:57 AM, r...@open-mpi.org wrote:
Add --without-usnic
On Feb 28, 2018, at 7:50 AM, William T Jones <w.t.jo...@nasa.gov> wrote:
I realize that OpenMPI 1.10.2 is quite old; however, for compatibility I
am attempting to compile it after a system upgrade to CentOS 7.
This system does include infiniband and I have configured as follows
using Intel 2017.2.174 compilers:
% ./configure --enable-static \
--with-tm=/usr/local/pkgs/PBSPro_64 \
--enable-mpi-thread-multiple \
--with-verbs=/usr \
--enable-mpi-cxx \
FC=ifort \
F77=ifort \
CC=icc \
CXX=icpc \
CFLAGS="-O3 -ip" \
FCFLAGS="-O3 -ip" \
LIBS="-lcrypto -lpthread"
However, when I compile I get the following error:
...
Making all in mca/common/verbs_usnic
make[2]: Entering directory
`/usr/src/openmpi-1.10.2/ompi/mca/common/verbs_usnic'
CC libmca_common_verbs_usnic_la-common_verbs_usnic_fake.lo
common_verbs_usnic_fake.c(72): error: struct "ibv_device" has no field
"ops"
.ops = {
^
common_verbs_usnic_fake.c(89): warning #266: function
"ibv_read_sysfs_file" declared implicitly
if (ibv_read_sysfs_file(uverbs_sys_path, "device/vendor",
^
common_verbs_usnic_fake.c(133): warning #266: function
"ibv_register_driver" declared implicitly
ibv_register_driver("usnic_verbs", fake_driver_init);
^
compilation aborted for common_verbs_usnic_fake.c (code 2)
Unfortunately, my /usr/include/infiniband/verbs.h file defines the
"ibv_device" structure but does not include an "ops" member. Instead the
structure is defined as follows:
/* Obsolete, never used, do not touch */
struct _ibv_device_ops {
    struct ibv_context * (*_dummy1)(struct ibv_device *device,
                                    int cmd_fd);
    void (*_dummy2)(struct ibv_context *context);
};

enum {
    IBV_SYSFS_NAME_MAX = 64,
    IBV_SYSFS_PATH_MAX = 256
};

struct ibv_device {
    struct _ibv_device_ops _ops;
    enum ibv_node_type node_type;
    enum ibv_transport_type transport_type;
    /* Name of underlying kernel IB device, eg "mthca0" */
    char name[IBV_SYSFS_NAME_MAX];
    /* Name of uverbs device, eg "uverbs0" */
    char dev_name[IBV_SYSFS_NAME_MAX];
    /* Path to infiniband_verbs class device in sysfs */
    char dev_path[IBV_SYSFS_PATH_MAX];
    /* Path to infiniband class device in sysfs */
    char ibdev_path[IBV_SYSFS_PATH_MAX];
};
OpenMPI was previously compiled successfully under CentOS 6 and every
indication is that the /usr/include/infiniband/verbs.h was defined
similarly (again without the "ops" member).
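One way to confirm which layout a given verbs.h uses is to grep it for the member declaration. This is a minimal sketch: the helper name `verbs_header_style` is made up, the pattern for the renamed member is taken from the excerpt above, and the pattern for the older layout (`struct ibv_device_ops ops;`) is an assumption about pre-rdma-core headers.

```shell
# Hypothetical helper (name is an assumption): report how a verbs.h
# header declares the ops member of struct ibv_device.
verbs_header_style() {
    header="$1"
    if [ ! -r "$header" ]; then
        echo "missing"
    elif grep -Eq 'struct[[:space:]]+_ibv_device_ops[[:space:]]+_ops;' "$header"; then
        # Renamed member, as in the excerpt above: code that
        # initializes ".ops = {" cannot compile against this header.
        echo "renamed-ops"
    elif grep -Eq 'struct[[:space:]]+ibv_device_ops[[:space:]]+ops;' "$header"; then
        # Assumed older layout that still exposes an "ops" member.
        echo "legacy-ops"
    else
        echo "unknown"
    fi
}
```

Calling `verbs_header_style /usr/include/infiniband/verbs.h` on both the CentOS 6 and the CentOS 7 system would show whether the two headers really do declare the member the same way.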
Is it possible that there is a configure option that prevents this source from
being included in the build?
Any help is appreciated,
--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Bill Jones w.t.jo...@nasa.gov
Mail Stop 128 Computational AeroSciences Branch
15 Langley Boulevard Research Directorate
NASA Langley Research Center Building 1268, Room 1044
Hampton, VA 23681-2199 Phone +1 757 864-5318
Fax +1 757 864-8816
http://fun3d.larc.nasa.gov
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users