Brice,

unless you want to enable/disable NVML at runtime, and assuming we do not need NVML in Open MPI,
IMHO the easiest workaround is to update

https://github.com/open-mpi/ompi/blob/master/opal/mca/hwloc/hwloc1113/configure.m4

and add the one-liner

enable_nvml=no


A better option could be to update https://github.com/open-mpi/ompi/blob/master/opal/mca/hwloc/configure.m4

and pass the --enable-nvml option from Open MPI down to hwloc.


Cheers,


Gilles




On 10/24/2016 4:45 PM, Brice Goglin wrote:
FWIW, I am still open to implementing something to work around this in hwloc.
It could be a shell variable such as HWLOC_DISABLE_NVML=yes for all our major
configured dependencies.

Brice



On 24/10/2016 02:12, Gilles Gouaillardet wrote:
Justin,


iirc, NVML is only used by hwloc (i.e. not by CUDA) and there is no
real benefit to having it.

as a workaround, you can

export enable_nvml=no

and then configure && make install
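
Spelled out, the workaround might look like the sketch below. It relies on the usual autoconf behaviour that --disable-nvml simply sets enable_nvml=no, so exporting the variable beforehand should have the same effect on the embedded hwloc. The configure arguments are copied from the original report and are only an assumption about your setup.

```shell
# Sketch of the workaround: pre-seed the autoconf variable that
# --disable-nvml would set, so the embedded hwloc skips NVML detection.
export enable_nvml=no

# Only attempt the build from the top of an Open MPI source tree.
# PREFIXPATH and CUDA_HOME are the variables used in the original report.
if [ -x ./configure ]; then
    ./configure --prefix="$PREFIXPATH" --with-cuda="$CUDA_HOME" \
        && make && make install
fi
```

If the link error persists after this, the variable was probably not exported in the same shell that ran configure.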

Cheers,

Gilles

On 10/20/2016 12:49 AM, Jeff Squyres (jsquyres) wrote:
Justin --

Fair point.  Can you work with Sylvain Jeaugey (at Nvidia) to submit
a pull request for this functionality?

Thanks.


On Oct 18, 2016, at 2:26 PM, Justin Luitjens <jluitj...@nvidia.com> wrote:

After looking into this a bit more, it appears that the issue is that I am
building on a head node which does not have the driver installed.
Building on a back-end node resolves this issue.  In CUDA 8.0 the NVML
stubs can be found in the toolkit at the following path:

${CUDA_HOME}/lib64/stubs

For 8.0 I'd suggest updating the configure/make scripts to look
for NVML there and link in the stubs.  This way the build depends
only on the toolkit, not on the driver being installed.
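
A rough sketch of what such a check could look like in shell, purely as an illustration. find_nvml_stubs is a made-up helper, not something that exists in Open MPI or hwloc, and it assumes the stub library is named libnvidia-ml.so.

```shell
# Hypothetical helper (not part of Open MPI or hwloc): print the NVML stub
# directory under a CUDA toolkit root if the stub library is present there.
find_nvml_stubs() {
    stub_dir="$1/lib64/stubs"
    # Assumption: the stub library carries the usual NVML name.
    if [ -f "$stub_dir/libnvidia-ml.so" ]; then
        printf '%s\n' "$stub_dir"
    else
        return 1
    fi
}

# Usage: prepend the stub directory to the linker search path before
# running configure, but only when CUDA_HOME is set and the stubs exist.
if [ -n "${CUDA_HOME:-}" ] && stubs=$(find_nvml_stubs "$CUDA_HOME"); then
    export LDFLAGS="-L$stubs ${LDFLAGS:-}"
fi
```

The stubs only satisfy the linker at build time; at run time the real library from the driver is still needed on the compute nodes.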

Thanks,
Justin

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Justin Luitjens
Sent: Tuesday, October 18, 2016 9:53 AM
To: users@lists.open-mpi.org
Subject: [OMPI users] Problem building OpenMPI with CUDA 8.0

I have the release version of CUDA 8.0 installed and am trying to
build OpenMPI.

Here is my configure and build line:

./configure --prefix=$PREFIXPATH --with-cuda=$CUDA_HOME --with-tm= --with-openib= && make && sudo make install

Where CUDA_HOME points to the CUDA install path.

When I run the above command it builds for quite a while but
eventually errors out with this:

make[2]: Entering directory `/home/jluitjens/Perforce/jluitjens_dtlogin_p4sw/sw/devrel/DevtechCompute/Internal/Tools/dtlogin/scripts/mpi/openmpi-1.10.1-gcc5.0_2014_11-cuda8.0/opal/tools/wrappers'
    CCLD     opal_wrapper
../../../opal/.libs/libopen-pal.so: undefined reference to `nvmlInit_v2'
../../../opal/.libs/libopen-pal.so: undefined reference to `nvmlDeviceGetHandleByIndex_v2'
../../../opal/.libs/libopen-pal.so: undefined reference to `nvmlDeviceGetCount_v2'

Any idea what I might need to change to get around this error?

Thanks,
Justin
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users