Hi,
I am currently configuring a GPU cluster. Each node has 8 K20 GPUs on two sockets, with 4 PCIe buses (2 K20s per bus, 4 K20s per socket) and a single QDR InfiniBand card. We have the latest NVIDIA drivers and CUDA 6.0.

I am wondering if someone could tell me whether the default MCA parameters are optimal for CUDA. More precisely, I am interested in GPUDirect RDMA (GDR) and CUDA IPC. It seems from the parameters (see below) that both are available, although GDR is disabled by default. However, my notes from GTC14 mention the btl_openib_have_driver_gdr parameter, which I do not see at all.

So, I guess my questions are:
1) Why is GDR disabled by default when it is available?
2) Is the absence of btl_openib_have_driver_gdr an indicator of something missing?
3) Are the default parameters, especially the RDMA limits and such, optimal for our configuration?
4) Do I want to enable or disable IPC by default? (My notes state that bandwidth is much better with MPS than with IPC.)

Thanks,

Here is what I get from
ompi_info --all | grep cuda

[mboisson@login-gpu01 ~]$ ompi_info --all | grep cuda
[login-gpu01.calculquebec.ca:11486] mca: base: components_register: registering filem components
[login-gpu01.calculquebec.ca:11486] mca: base: components_register: found loaded component raw
[login-gpu01.calculquebec.ca:11486] mca: base: components_register: component raw register function successful
[login-gpu01.calculquebec.ca:11486] mca: base: components_register: registering snapc components
                  Prefix: /software-gpu/mpi/openmpi/1.8.1_gcc4.8_cuda6.0.37
             Exec_prefix: /software-gpu/mpi/openmpi/1.8.1_gcc4.8_cuda6.0.37
                  Bindir: /software-gpu/mpi/openmpi/1.8.1_gcc4.8_cuda6.0.37/bin
                 Sbindir: /software-gpu/mpi/openmpi/1.8.1_gcc4.8_cuda6.0.37/sbin
                  Libdir: /software-gpu/mpi/openmpi/1.8.1_gcc4.8_cuda6.0.37/lib
                  Incdir: /software-gpu/mpi/openmpi/1.8.1_gcc4.8_cuda6.0.37/include
                  Mandir: /software-gpu/mpi/openmpi/1.8.1_gcc4.8_cuda6.0.37/share/man
               Pkglibdir: /software-gpu/mpi/openmpi/1.8.1_gcc4.8_cuda6.0.37/lib/openmpi
              Libexecdir: /software-gpu/mpi/openmpi/1.8.1_gcc4.8_cuda6.0.37/libexec
             Datarootdir: /software-gpu/mpi/openmpi/1.8.1_gcc4.8_cuda6.0.37/share
                 Datadir: /software-gpu/mpi/openmpi/1.8.1_gcc4.8_cuda6.0.37/share
              Sysconfdir: /software-gpu/mpi/openmpi/1.8.1_gcc4.8_cuda6.0.37/etc
          Sharedstatedir: /software-gpu/mpi/openmpi/1.8.1_gcc4.8_cuda6.0.37/com
           Localstatedir: /software-gpu/mpi/openmpi/1.8.1_gcc4.8_cuda6.0.37/var
                 Infodir: /software-gpu/mpi/openmpi/1.8.1_gcc4.8_cuda6.0.37/share/info
              Pkgdatadir: /software-gpu/mpi/openmpi/1.8.1_gcc4.8_cuda6.0.37/share/openmpi
               Pkglibdir: /software-gpu/mpi/openmpi/1.8.1_gcc4.8_cuda6.0.37/lib/openmpi
           Pkgincludedir: /software-gpu/mpi/openmpi/1.8.1_gcc4.8_cuda6.0.37/include/openmpi
                 MCA mca: parameter "mca_param_files" (current value: "/home/mboisson/.openmpi/mca-params.conf:/software-gpu/mpi/openmpi/1.8.1_gcc4.8_cuda6.0.37/etc/openmpi-mca-params.conf", data source: default, level: 2 user/detail, type: string, deprecated, synonym of: mca_base_param_files)
                 MCA mca: parameter "mca_component_path" (current value: "/software-gpu/mpi/openmpi/1.8.1_gcc4.8_cuda6.0.37/lib/openmpi:/home/mboisson/.openmpi/components", data source: default, level: 9 dev/all, type: string, deprecated, synonym of: mca_base_component_path)
                 MCA mca: parameter "mca_base_param_files" (current value: "/home/mboisson/.openmpi/mca-params.conf:/software-gpu/mpi/openmpi/1.8.1_gcc4.8_cuda6.0.37/etc/openmpi-mca-params.conf", data source: default, level: 2 user/detail, type: string, synonyms: mca_param_files)
                 MCA mca: informational "mca_base_override_param_file" (current value: "/software-gpu/mpi/openmpi/1.8.1_gcc4.8_cuda6.0.37/etc/openmpi-mca-params-override.conf", data source: default, level: 2 user/detail, type: string)
                 MCA mca: parameter "mca_base_param_file_path" (current value: "/software-gpu/mpi/openmpi/1.8.1_gcc4.8_cuda6.0.37/share/openmpi/amca-param-sets:/home/mboisson", data source: default, level: 3 user/all, type: string)
                 MCA mca: parameter "mca_base_component_path" (current value: "/software-gpu/mpi/openmpi/1.8.1_gcc4.8_cuda6.0.37/lib/openmpi:/home/mboisson/.openmpi/components", data source: default, level: 9 dev/all, type: string, synonyms: mca_component_path)
                MCA orte: parameter "orte_default_hostfile" (current value: "/software-gpu/mpi/openmpi/1.8.1_gcc4.8_cuda6.0.37/etc/openmpi-default-hostfile", data source: default, level: 9 dev/all, type: string)
                 MCA mpi: informational "mpi_built_with_cuda_support" (current value: "true", data source: default, level: 4 tuner/basic, type: bool)
                 MCA mpi: parameter "mpi_cuda_support" (current value: "true", data source: default, level: 4 tuner/basic, type: bool)
                 MCA btl: parameter "btl_self_cuda_eager_limit" (current value: "0", data source: default, level: 5 tuner/detail, type: size_t)
                 MCA btl: parameter "btl_self_cuda_rdma_limit" (current value: "18446744073709551615", data source: default, level: 5 tuner/detail, type: size_t)
                 MCA btl: parameter "btl_smcuda_free_list_num" (current value: "8", data source: default, level: 5 tuner/detail, type: int)
                 MCA btl: parameter "btl_smcuda_free_list_max" (current value: "-1", data source: default, level: 5 tuner/detail, type: int)
                 MCA btl: parameter "btl_smcuda_free_list_inc" (current value: "64", data source: default, level: 5 tuner/detail, type: int)
                 MCA btl: parameter "btl_smcuda_max_procs" (current value: "-1", data source: default, level: 5 tuner/detail, type: int)
                 MCA btl: parameter "btl_smcuda_fifo_size" (current value: "4096", data source: default, level: 4 tuner/basic, type: unsigned)
                 MCA btl: parameter "btl_smcuda_num_fifos" (current value: "1", data source: default, level: 4 tuner/basic, type: int)
                 MCA btl: parameter "btl_smcuda_fifo_lazy_free" (current value: "120", data source: default, level: 5 tuner/detail, type: unsigned)
                 MCA btl: parameter "btl_smcuda_sm_extra_procs" (current value: "0", data source: default, level: 9 dev/all, type: int)
                 MCA btl: parameter "btl_smcuda_use_cuda_ipc" (current value: "1", data source: default, level: 4 tuner/basic, type: int)
                 MCA btl: parameter "btl_smcuda_use_cuda_ipc_same_gpu" (current value: "1", data source: default, level: 4 tuner/basic, type: int)
                 MCA btl: parameter "btl_smcuda_cuda_ipc_verbose" (current value: "0", data source: default, level: 4 tuner/basic, type: int)
                 MCA btl: parameter "btl_smcuda_exclusivity" (current value: "65537", data source: default, level: 7 dev/basic, type: unsigned)
                 MCA btl: parameter "btl_smcuda_flags" (current value: "1", data source: default, level: 5 tuner/detail, type: unsigned)
                 MCA btl: parameter "btl_smcuda_rndv_eager_limit" (current value: "4096", data source: default, level: 4 tuner/basic, type: size_t)
                 MCA btl: parameter "btl_smcuda_eager_limit" (current value: "4096", data source: default, level: 4 tuner/basic, type: size_t)
                 MCA btl: parameter "btl_smcuda_cuda_eager_limit" (current value: "0", data source: default, level: 5 tuner/detail, type: size_t)
                 MCA btl: parameter "btl_smcuda_cuda_rdma_limit" (current value: "18446744073709551615", data source: default, level: 5 tuner/detail, type: size_t)
                 MCA btl: parameter "btl_smcuda_max_send_size" (current value: "32768", data source: default, level: 4 tuner/basic, type: size_t)
                 MCA btl: parameter "btl_sm_cuda_eager_limit" (current value: "0", data source: default, level: 5 tuner/detail, type: size_t)
                 MCA btl: parameter "btl_sm_cuda_rdma_limit" (current value: "18446744073709551615", data source: default, level: 5 tuner/detail, type: size_t)
                 MCA btl: parameter "btl_tcp_cuda_eager_limit" (current value: "0", data source: default, level: 5 tuner/detail, type: size_t)
                 MCA btl: parameter "btl_tcp_cuda_rdma_limit" (current value: "18446744073709551615", data source: default, level: 5 tuner/detail, type: size_t)
                 MCA btl: parameter "btl_openib_device_param_files" (current value: "/software-gpu/mpi/openmpi/1.8.1_gcc4.8_cuda6.0.37/share/openmpi/mca-btl-openib-device-params.ini", data source: default, level: 9 dev/all, type: string, synonyms: btl_openib_hca_param_files)
                 MCA btl: parameter "btl_openib_hca_param_files" (current value: "/software-gpu/mpi/openmpi/1.8.1_gcc4.8_cuda6.0.37/share/openmpi/mca-btl-openib-device-params.ini", data source: default, level: 9 dev/all, type: string, deprecated, synonym of: btl_openib_device_param_files)
                 MCA btl: parameter "btl_openib_cuda_async_send" (current value: "true", data source: default, level: 9 dev/all, type: bool)
                 MCA btl: parameter "btl_openib_cuda_async_recv" (current value: "true", data source: default, level: 9 dev/all, type: bool)
                 MCA btl: informational "btl_openib_have_cuda_gdr" (current value: "true", data source: default, level: 5 tuner/detail, type: bool)
                 MCA btl: parameter "btl_openib_want_cuda_gdr" (current value: "false", data source: default, level: 9 dev/all, type: bool)
                 MCA btl: parameter "btl_openib_cuda_eager_limit" (current value: "0", data source: default, level: 5 tuner/detail, type: size_t)
                 MCA btl: parameter "btl_openib_cuda_rdma_limit" (current value: "18446744073709551615", data source: default, level: 5 tuner/detail, type: size_t)
                 MCA btl: parameter "btl_vader_cuda_eager_limit" (current value: "0", data source: default, level: 5 tuner/detail, type: size_t)
                 MCA btl: parameter "btl_vader_cuda_rdma_limit" (current value: "18446744073709551615", data source: default, level: 5 tuner/detail, type: size_t)
                MCA coll: parameter "coll_ml_config_file" (current value: "/software-gpu/mpi/openmpi/1.8.1_gcc4.8_cuda6.0.37/share/openmpi/mca-coll-ml.config", data source: default, level: 9 dev/all, type: string)
                  MCA io: informational "io_romio_complete_configure_params" (current value: "--with-file-system=nfs+lustre FROM_OMPI=yes CC='/software6/compilers/gcc/4.8/bin/gcc -std=gnu99' CFLAGS='-O3 -DNDEBUG -finline-functions -fno-strict-aliasing -pthread' CPPFLAGS=' -I/software-gpu/src/openmpi-1.8.1/opal/mca/hwloc/hwloc172/hwloc/include -I/software-gpu/src/openmpi-1.8.1/opal/mca/event/libevent2021/libevent -I/software-gpu/src/openmpi-1.8.1/opal/mca/event/libevent2021/libevent/include' FFLAGS='' LDFLAGS=' ' --enable-shared --enable-static --with-file-system=nfs+lustre --prefix=/software-gpu/mpi/openmpi/1.8.1_gcc4.8_cuda6.0.37 --disable-aio", data source: default, level: 9 dev/all, type: string)
[login-gpu01.calculquebec.ca:11486] mca: base: close: unloading component Q
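
Because the install prefix itself contains "cuda", the grep above matches almost every line, so the GDR and IPC entries are easy to miss. A minimal sketch of a filter that pulls out just those entries is given below. It assumes the entry layout shown in the output above (MCA <framework>: parameter|informational "<name>" (current value: "<value>", ...). The helper names parse_mca and gdr_ipc_summary are hypothetical, and the sample text is copied from the output above rather than read from a live ompi_info run.

```python
import re

# One ompi_info entry, as it appears in the output above, e.g.
#   MCA btl: parameter "btl_openib_want_cuda_gdr" (current value: "false", ...
# \s+ is used between tokens so that entries wrapped across lines still match.
ENTRY = re.compile(
    r'MCA\s+(\w+):\s+(?:parameter|informational)\s+"([^"]+)"'
    r'\s+\(current\s+value:\s*"([^"]*)"'
)


def parse_mca(text):
    """Return {parameter_name: current_value} for every MCA entry in text."""
    return {name: value for _framework, name, value in ENTRY.findall(text)}


def gdr_ipc_summary(params):
    """Keep only the entries whose name mentions GDR or CUDA IPC."""
    return {k: v for k, v in params.items() if "gdr" in k or "cuda_ipc" in k}


# Sample lines copied from the ompi_info output above (not a live run).
sample = '''
MCA btl: parameter "btl_smcuda_use_cuda_ipc" (current value: "1", data source: default, level: 4 tuner/basic, type: int)
MCA btl: informational "btl_openib_have_cuda_gdr" (current value: "true", data source: default, level: 5 tuner/detail, type: bool)
MCA btl: parameter "btl_openib_want_cuda_gdr" (current value: "false", data source: default, level: 9 dev/all, type: bool)
MCA btl: parameter "btl_openib_cuda_eager_limit" (current value: "0", data source: default, level: 5 tuner/detail, type: size_t)
'''

summary = gdr_ipc_summary(parse_mca(sample))
for name, value in sorted(summary.items()):
    print(f"{name} = {value}")
```

With the full output saved to a file, the same two functions could be applied to the file contents instead of the sample string. A parameter missing from the resulting dictionary, such as btl_openib_have_driver_gdr, means only that this build did not report it.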


--
---------------------------------
Maxime Boissonneault
Analyste de calcul - Calcul Québec, Université Laval
Ph. D. en physique
