-cuda_initialize 0 does not make any difference. Actually this issue has 
nothing to do with PetscInitialize(). I tried to call cudaFree(0) before 
PetscInitialize(), and it still took 7.5 seconds.

Hong

On Feb 10, 2020, at 10:44 AM, Zhang, Junchao 
<jczh...@mcs.anl.gov> wrote:

As I mentioned, have you tried -cuda_initialize 0? Also, PetscCUDAInitialize 
contains
ierr = PetscCUBLASInitializeHandle();CHKERRQ(ierr);
ierr = PetscCUSOLVERDnInitializeHandle();CHKERRQ(ierr);
Have you tried commenting them out and testing again?
--Junchao Zhang


On Sat, Feb 8, 2020 at 5:22 PM Zhang, Hong via petsc-dev 
<petsc-dev@mcs.anl.gov> wrote:


On Feb 8, 2020, at 5:03 PM, Matthew Knepley 
<knep...@gmail.com> wrote:

On Sat, Feb 8, 2020 at 4:34 PM Zhang, Hong via petsc-dev 
<petsc-dev@mcs.anl.gov> wrote:
I did some further investigation. The overhead persists for both the PETSc 
shared library and the static library. The previous example does not call any 
PETSc function, yet the first CUDA function becomes very slow when the 
executable is linked to the petsc so. This indicates that the slowdown occurs 
if the symbol (cudaFree) is searched through the petsc so, but does not occur 
if the symbol is found directly in the CUDA runtime lib.

So the issue has nothing to do with the dynamic linker. The following example 
can be used to easily reproduce the problem (cudaFree(0) always takes ~7.5 
seconds).

1) This should go to OLCF admin as Jeff suggests

I had sent this to the OLCF admin before the discussion started here. Thomas 
Papatheodore has followed up. I am trying to help him reproduce the problem on 
Summit.


2) Just to make sure I understand, a static executable with this code is still 
slow on the cudaFree(), since CUDA is a shared library by default.

I prepared the code as a minimal example to reproduce the problem. It would be 
fair to say that any code using PETSc (with CUDA enabled, built statically or 
dynamically) on Summit suffers a 7.5-second overhead on the first CUDA function 
call (either in the user code or inside PETSc).

Thanks,
Hong


I think we should try:

  a) Forcing a full static link, if possible

  b) Asking OLCF about link resolution order

It sounds like a similar thing I have seen in the past where link resolution 
order can exponentially increase load time.

  Thanks,

     Matt

bash-4.2$ cat ex_simple_petsc.c
#include <time.h>
#include <cuda_runtime.h>
#include <stdio.h>
#include <petscmat.h>

int main(int argc,char **args)
{
  clock_t start,s1,s2,s3;
  double  cputime;
  double  *init,tmp[100] = {0};
  PetscErrorCode ierr=0;

  ierr = PetscInitialize(&argc,&args,(char*)0,NULL);if (ierr) return ierr;
  start = clock();
  cudaFree(0);
  s1 = clock();
  cudaMalloc((void **)&init,100*sizeof(double));
  s2 = clock();
  cudaMemcpy(init,tmp,100*sizeof(double),cudaMemcpyHostToDevice);
  s3 = clock();
  printf("free time =%lf malloc time =%lf copy time =%lf\n",
         ((double) (s1 - start)) / CLOCKS_PER_SEC,
         ((double) (s2 - s1)) / CLOCKS_PER_SEC,
         ((double) (s3 - s2)) / CLOCKS_PER_SEC);
  ierr = PetscFinalize();
  return ierr;
}

Hong

On Feb 7, 2020, at 3:09 PM, Zhang, Hong 
<hongzh...@anl.gov> wrote:

Note that the overhead was triggered by the first call to a CUDA function. So 
it seems that the first CUDA function triggered loading the petsc so (if the 
petsc so is linked), which is slow on the Summit file system.

Hong

On Feb 7, 2020, at 2:54 PM, Zhang, Hong via petsc-dev 
<petsc-dev@mcs.anl.gov> wrote:

Linking any other shared library does not slow down the execution. The PETSc 
shared library is the only one causing trouble.

Here is the ldd output for two different versions. For the first version, I 
removed -lpetsc and it ran very fast. The second (slow) version was linked to 
the petsc so.

bash-4.2$ ldd ex_simple
        linux-vdso64.so.1 =>  (0x0000200000050000)
        liblapack.so.0 => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/liblapack.so.0
 (0x0000200000070000)
        libblas.so.0 => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libblas.so.0
 (0x00002000009b0000)
        libhdf5hl_fortran.so.100 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5hl_fortran.so.100
 (0x0000200000e80000)
        libhdf5_fortran.so.100 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5_fortran.so.100
 (0x0000200000ed0000)
        libhdf5_hl.so.100 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5_hl.so.100
 (0x0000200000f50000)
        libhdf5.so.103 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5.so.103
 (0x0000200000fb0000)
        libX11.so.6 => /usr/lib64/libX11.so.6 (0x00002000015e0000)
        libcufft.so.10 => /sw/summit/cuda/10.1.168/lib64/libcufft.so.10 
(0x0000200001770000)
        libcublas.so.10 => /sw/summit/cuda/10.1.168/lib64/libcublas.so.10 
(0x0000200009b00000)
        libcudart.so.10.1 => /sw/summit/cuda/10.1.168/lib64/libcudart.so.10.1 
(0x000020000d950000)
        libcusparse.so.10 => /sw/summit/cuda/10.1.168/lib64/libcusparse.so.10 
(0x000020000d9f0000)
        libcusolver.so.10 => /sw/summit/cuda/10.1.168/lib64/libcusolver.so.10 
(0x0000200012f50000)
        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x000020001dc40000)
        libdl.so.2 => /usr/lib64/libdl.so.2 (0x000020001ddd0000)
        libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x000020001de00000)
        libmpiprofilesupport.so.3 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpiprofilesupport.so.3
 (0x000020001de40000)
        libmpi_ibm_usempi.so => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpi_ibm_usempi.so
 (0x000020001de70000)
        libmpi_ibm_mpifh.so.3 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpi_ibm_mpifh.so.3
 (0x000020001dea0000)
        libmpi_ibm.so.3 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpi_ibm.so.3
 (0x000020001df40000)
        libpgf90rtl.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf90rtl.so
 (0x000020001e0b0000)
        libpgf90.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf90.so
 (0x000020001e0f0000)
        libpgf90_rpm1.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf90_rpm1.so
 (0x000020001e6a0000)
        libpgf902.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf902.so
 (0x000020001e6d0000)
        libpgftnrtl.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgftnrtl.so
 (0x000020001e700000)
        libatomic.so.1 => /usr/lib64/libatomic.so.1 (0x000020001e730000)
        libpgkomp.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgkomp.so
 (0x000020001e760000)
        libomp.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libomp.so
 (0x000020001e790000)
        libomptarget.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libomptarget.so
 (0x000020001e880000)
        libpgmath.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgmath.so
 (0x000020001e8b0000)
        libpgc.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgc.so
 (0x000020001e9d0000)
        librt.so.1 => /usr/lib64/librt.so.1 (0x000020001eb40000)
        libm.so.6 => /usr/lib64/libm.so.6 (0x000020001eb70000)
        libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x000020001ec60000)
        libc.so.6 => /usr/lib64/libc.so.6 (0x000020001eca0000)
        libz.so.1 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/zlib-1.2.11-2htm7ws4hgrthi5tyjnqxtjxgpfklxsc/lib/libz.so.1
 (0x000020001ee90000)
        libxcb.so.1 => /usr/lib64/libxcb.so.1 (0x000020001eef0000)
        /lib64/ld64.so.2 (0x0000200000000000)
        libcublasLt.so.10 => /sw/summit/cuda/10.1.168/lib64/libcublasLt.so.10 
(0x000020001ef40000)
        libutil.so.1 => /usr/lib64/libutil.so.1 (0x0000200020e50000)
        libhwloc_ompi.so.15 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libhwloc_ompi.so.15
 (0x0000200020e80000)
        libevent-2.1.so.6 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libevent-2.1.so.6
 (0x0000200020ef0000)
        libevent_pthreads-2.1.so.6 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libevent_pthreads-2.1.so.6
 (0x0000200020f70000)
        libopen-rte.so.3 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libopen-rte.so.3
 (0x0000200020fa0000)
        libopen-pal.so.3 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libopen-pal.so.3
 (0x00002000210b0000)
        libXau.so.6 => /usr/lib64/libXau.so.6 (0x00002000211a0000)


bash-4.2$ ldd ex_simple_slow
        linux-vdso64.so.1 =>  (0x0000200000050000)
        libpetsc.so.3.012 => 
/autofs/nccs-svm1_home1/hongzh/Projects/petsc/arch-olcf-summit-sell-opt/lib/libpetsc.so.3.012
 (0x0000200000070000)
        liblapack.so.0 => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/liblapack.so.0
 (0x0000200002be0000)
        libblas.so.0 => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libblas.so.0
 (0x0000200003520000)
        libhdf5hl_fortran.so.100 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5hl_fortran.so.100
 (0x00002000039f0000)
        libhdf5_fortran.so.100 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5_fortran.so.100
 (0x0000200003a40000)
        libhdf5_hl.so.100 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5_hl.so.100
 (0x0000200003ac0000)
        libhdf5.so.103 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/hdf5-1.10.3-pgiul2yf4auv7krecd72t6vupd7e3qgn/lib/libhdf5.so.103
 (0x0000200003b20000)
        libX11.so.6 => /usr/lib64/libX11.so.6 (0x0000200004150000)
        libcufft.so.10 => /sw/summit/cuda/10.1.168/lib64/libcufft.so.10 
(0x00002000042e0000)
        libcublas.so.10 => /sw/summit/cuda/10.1.168/lib64/libcublas.so.10 
(0x000020000c670000)
        libcudart.so.10.1 => /sw/summit/cuda/10.1.168/lib64/libcudart.so.10.1 
(0x00002000104c0000)
        libcusparse.so.10 => /sw/summit/cuda/10.1.168/lib64/libcusparse.so.10 
(0x0000200010560000)
        libcusolver.so.10 => /sw/summit/cuda/10.1.168/lib64/libcusolver.so.10 
(0x0000200015ac0000)
        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00002000207b0000)
        libdl.so.2 => /usr/lib64/libdl.so.2 (0x0000200020940000)
        libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x0000200020970000)
        libmpiprofilesupport.so.3 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpiprofilesupport.so.3
 (0x00002000209b0000)
        libmpi_ibm_usempi.so => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpi_ibm_usempi.so
 (0x00002000209e0000)
        libmpi_ibm_mpifh.so.3 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpi_ibm_mpifh.so.3
 (0x0000200020a10000)
        libmpi_ibm.so.3 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libmpi_ibm.so.3
 (0x0000200020ab0000)
        libpgf90rtl.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf90rtl.so
 (0x0000200020c20000)
        libpgf90.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf90.so
 (0x0000200020c60000)
        libpgf90_rpm1.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf90_rpm1.so
 (0x0000200021210000)
        libpgf902.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgf902.so
 (0x0000200021240000)
        libpgftnrtl.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgftnrtl.so
 (0x0000200021270000)
        libatomic.so.1 => /usr/lib64/libatomic.so.1 (0x00002000212a0000)
        libpgkomp.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgkomp.so
 (0x00002000212d0000)
        libomp.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libomp.so
 (0x0000200021300000)
        libomptarget.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libomptarget.so
 (0x00002000213f0000)
        libpgmath.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgmath.so
 (0x0000200021420000)
        libpgc.so => 
/autofs/nccs-svm1_sw/summit/.swci/0-core/opt/spack/20180914/linux-rhel7-ppc64le/gcc-4.8.5/pgi-19.4-6acz4xyqjlpoaonjiiqjme2aknrfnzoy/linuxpower/19.4/lib/libpgc.so
 (0x0000200021540000)
        librt.so.1 => /usr/lib64/librt.so.1 (0x00002000216b0000)
        libm.so.6 => /usr/lib64/libm.so.6 (0x00002000216e0000)
        libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x00002000217d0000)
        libc.so.6 => /usr/lib64/libc.so.6 (0x0000200021810000)
        libz.so.1 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/zlib-1.2.11-2htm7ws4hgrthi5tyjnqxtjxgpfklxsc/lib/libz.so.1
 (0x0000200021a10000)
        libxcb.so.1 => /usr/lib64/libxcb.so.1 (0x0000200021a60000)
        /lib64/ld64.so.2 (0x0000200000000000)
        libcublasLt.so.10 => /sw/summit/cuda/10.1.168/lib64/libcublasLt.so.10 
(0x0000200021ab0000)
        libutil.so.1 => /usr/lib64/libutil.so.1 (0x00002000239c0000)
        libhwloc_ompi.so.15 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libhwloc_ompi.so.15
 (0x00002000239f0000)
        libevent-2.1.so.6 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libevent-2.1.so.6
 (0x0000200023a60000)
        libevent_pthreads-2.1.so.6 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libevent_pthreads-2.1.so.6
 (0x0000200023ae0000)
        libopen-rte.so.3 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libopen-rte.so.3
 (0x0000200023b10000)
        libopen-pal.so.3 => 
/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/pgi-19.4/spectrum-mpi-10.3.0.1-20190611-4ymaahbai7ehhw4rves5jjiwon2laz3a/lib/libopen-pal.so.3
 (0x0000200023c20000)
        libXau.so.6 => /usr/lib64/libXau.so.6 (0x0000200023d10000)


On Feb 7, 2020, at 2:31 PM, Smith, Barry F. 
<bsm...@mcs.anl.gov> wrote:


 ldd -o on the executable of both linkings of your code.

 My guess is that without PETSc it is linking the static version of the needed 
libraries and with PETSc the shared. And, in typical fashion, the shared 
libraries are off on some super slow file system so take a long time to be 
loaded and linked in on demand.

  Still a performance bug in Summit.

  Barry


On Feb 7, 2020, at 12:23 PM, Zhang, Hong via petsc-dev 
<petsc-dev@mcs.anl.gov> wrote:

Hi all,

I previously noticed that the first call to a CUDA function such as 
cudaMalloc or cudaFree in PETSc takes a long time (7.5 seconds) on Summit. 
I then prepared the simple example attached below to help OLCF reproduce the 
problem. It turned out that the problem was caused by PETSc. The 7.5-second 
overhead can be observed only when the PETSc lib is linked. If I do not link 
PETSc, it runs normally. Does anyone have any idea why this happens and how to 
fix it?

Hong (Mr.)

bash-4.2$ cat ex_simple.c
#include <time.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc,char **args)
{
clock_t start,s1,s2,s3;
double  cputime;
double   *init,tmp[100] = {0};

start = clock();
cudaFree(0);
s1 = clock();
cudaMalloc((void **)&init,100*sizeof(double));
s2 = clock();
cudaMemcpy(init,tmp,100*sizeof(double),cudaMemcpyHostToDevice);
s3 = clock();
printf("free time =%lf malloc time =%lf copy time =%lf\n",
       ((double) (s1 - start)) / CLOCKS_PER_SEC,
       ((double) (s2 - s1)) / CLOCKS_PER_SEC,
       ((double) (s3 - s2)) / CLOCKS_PER_SEC);

return 0;
}

--
What most experimenters take for granted before they begin their experiments is 
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/