Hi,

I have successfully built openmpi-3.0.0 from source with CUDA 8.0.61.2 and 7.5.18 on CentOS-7 x86_64 (default system GNU compilers). Building the same openmpi-3.0.0 with CUDA 9 on the same CentOS-7 machine fails with this error:

make[2]: Leaving directory `/c7/home/tru/build/openmpi-3.0.0/build-cuda-9.0.176_384.81/opal/mca/shmem/sysv'
Making all in tools/wrappers
make[2]: Entering directory `/c7/home/tru/build/openmpi-3.0.0/build-cuda-9.0.176_384.81/opal/tools/wrappers'
  CCLD     opal_wrapper
../../../opal/.libs/libopen-pal.so: undefined reference to `nvmlDeviceGetPciInfo_v3'
collect2: error: ld returned 1 exit status
make[2]: *** [opal_wrapper] Error 1
make[2]: Leaving directory `/c7/home/tru/build/openmpi-3.0.0/build-cuda-9.0.176_384.81/opal/tools/wrappers'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/c7/home/tru/build/openmpi-3.0.0/build-cuda-9.0.176_384.81/opal'
make: *** [all-recursive] Error 1

<Additional information (failing builder)>

[tru@manolito build-cuda-9.0.176_384.81]$ grep -r nvmlDeviceGetPciInfo_v3 $CUDA_INSTALL_PATH
Binary file /c7/shared/cuda/9.0.176_384.81/lib64/stubs/libnvidia-ml.so matches
/c7/shared/cuda/9.0.176_384.81/include/nvml.h:#define nvmlDeviceGetPciInfo nvmlDeviceGetPciInfo_v3

The desktop has a legacy card whose driver does not support CUDA 9. I would not expect that to cause a link error like this, but maybe it does?

[tru@manolito build-cuda-9.0.176_384.81]$ nvidia-smi
Wed Oct 11 08:42:33 2017
+------------------------------------------------------+
| NVIDIA-SMI 340.102     Driver Version: 340.102       |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce 8600 GT     Off  | 0000:01:00.0     N/A |                  N/A |
|  0%   72C    P0    N/A /  N/A |      3MiB /   511MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0            Not Supported                                               |
+-----------------------------------------------------------------------------+

[tru@manolito build-cuda-9.0.176_384.81]$ deviceQuery
deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL

[tru@manolito build-cuda-9.0.176_384.81]$ deviceQueryDrv
deviceQueryDrv Starting...

CUDA Device Query (Driver API) statically linked version
Detected 1 CUDA Capable device(s)

Device 0: "GeForce 8600 GT"
  CUDA Driver Version:                            6.5
  CUDA Capability Major/Minor version number:     1.1
  Total amount of global memory:                  511 MBytes (536150016 bytes)
MapSMtoCores for SM 1.1 is undefined.  Default to use 64 Cores/SM
MapSMtoCores for SM 1.1 is undefined.  Default to use 64 Cores/SM
  ( 4) Multiprocessors, ( 64) CUDA Cores/MP:      256 CUDA Cores
  GPU Max Clock rate:                             1188 MHz (1.19 GHz)
  Memory Clock rate:                              700 Mhz
  Memory Bus Width:                               128-bit
  Max Texture Dimension Sizes                     1D=(8192) 2D=(65536, 32768) 3D=(2048, 2048, 2048)
  Maximum Layered 1D Texture Size, (num) layers   1D=(8192), 512 layers
  Maximum Layered 2D Texture Size, (num) layers   2D=(8192, 8192), 512 layers
  Total amount of constant memory:                65536 bytes
  Total amount of shared memory per block:        16384 bytes
  Total number of registers available per block:  8192
  Warp size:                                      32
  Maximum number of threads per multiprocessor:   768
  Maximum number of threads per block:            512
  Max dimension size of a thread block (x,y,z):   (512, 512, 64)
  Max dimension size of a grid size (x,y,z):      (65535, 65535, 1)
  Texture alignment:                              256 bytes
  Maximum memory pitch:                           2147483647 bytes
  Concurrent copy and kernel execution:           Yes with 1 copy engine(s)
  Run time limit on kernels:                      No
  Integrated GPU sharing Host Memory:             No
  Support host page-locked memory mapping:        Yes
  Concurrent kernel execution:                    No
  Alignment requirement for Surfaces:             Yes
  Device has ECC support:                         Disabled
  Device supports Unified Addressing (UVA):       No
cuDeviceGetAttribute returned 1
-> CUDA_ERROR_INVALID_VALUE

The NVIDIA driver (340.102) only supports CUDA 6.5, yet building against CUDA 7.5 and 8.0 on this machine was not an issue.

</Additional information (failing builder)>

If I switch to a newer machine (same OS, only a different card and NVIDIA driver), the build goes through and the checks pass!

Bottom line: for CUDA 9 (only?), one might need to build on the target machine rather than on a legacy one. Of course, YMMV.

Cheers

Tru

<Additional info (successful builder)>

[tru@borma build-cuda-9.0.176_384.81]$ deviceQueryDrv
deviceQueryDrv Starting...

CUDA Device Query (Driver API) statically linked version
Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1080 Ti"
  CUDA Driver Version:                            9.0
  CUDA Capability Major/Minor version number:     6.1
  Total amount of global memory:                  11172 MBytes (11714691072 bytes)
  (28) Multiprocessors, (128) CUDA Cores/MP:      3584 CUDA Cores
  GPU Max Clock rate:                             1582 MHz (1.58 GHz)
  Memory Clock rate:                              5505 Mhz
  Memory Bus Width:                               352-bit
  L2 Cache Size:                                  2883584 bytes
  Max Texture Dimension Sizes                     1D=(131072) 2D=(131072, 65536) 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers   1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers   2D=(32768, 32768), 2048 layers
  Total amount of constant memory:                65536 bytes
  Total amount of shared memory per block:        49152 bytes
  Total number of registers available per block:  65536
  Warp size:                                      32
  Maximum number of threads per multiprocessor:   2048
  Maximum number of threads per block:            1024
  Max dimension size of a thread block (x,y,z):   (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z):      (2147483647, 65535, 65535)
  Texture alignment:                              512 bytes
  Maximum memory pitch:                           2147483647 bytes
  Concurrent copy and kernel execution:           Yes with 2 copy engine(s)
  Run time limit on kernels:                      Yes
  Integrated GPU sharing Host Memory:             No
  Support host page-locked memory mapping:        Yes
  Concurrent kernel execution:                    Yes
  Alignment requirement for Surfaces:             Yes
  Device has ECC support:                         Disabled
  Device supports Unified Addressing (UVA):       Yes
  Supports Cooperative Kernel Launch:             Yes
  Supports MultiDevice Co-op Kernel Launch:       Yes
  Device PCI Domain ID / Bus ID / location ID:    0 / 6 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
Result = PASS

</Additional info (successful builder)>

-- 
Dr Tru Huynh | mailto:t...@pasteur.fr | tel/fax +33 1 45 68 87 37/19
https://research.pasteur.fr/en/team/structural-bioinformatics/
Institut Pasteur, 25-28 rue du Docteur Roux, 75724 Paris CEDEX 15 France
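
One way to test the "maybe?" above is to ask which libnvidia-ml the build machine can actually load and whether it exports the symbol from the link error. What follows is only a minimal sketch, not a confirmed diagnosis: the helper name exports_symbol is made up for illustration, the stub path is the one from the grep output, and "libnvidia-ml.so.1" assumes the installed driver's library is on the default loader path of the machine being checked.

```python
import ctypes

# Symbol named in the "undefined reference" link error.
SYMBOL = "nvmlDeviceGetPciInfo_v3"


def exports_symbol(library, symbol):
    """Return True if `library` can be loaded and exports `symbol`.

    `library` is a path or soname handed to the dynamic loader;
    None means "the symbols already visible in this process".
    Returns False both when the library cannot be loaded and when
    the symbol is absent.
    """
    try:
        handle = ctypes.CDLL(library)
    except OSError:
        return False
    # ctypes raises AttributeError for a missing symbol, so hasattr works.
    return hasattr(handle, symbol)


if __name__ == "__main__":
    candidates = [
        # Stub shipped with the CUDA 9 toolkit (path taken from the post).
        "/c7/shared/cuda/9.0.176_384.81/lib64/stubs/libnvidia-ml.so",
        # ASSUMPTION: the installed driver's NVML library, found by soname.
        "libnvidia-ml.so.1",
    ]
    for lib in candidates:
        print(lib, exports_symbol(lib, SYMBOL))
```

If the stub reports True and the driver's library reports False on the legacy machine, that would be consistent with the link step picking up the older driver library instead of the CUDA 9 stub.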