On Fri, Feb 7, 2020 at 1:23 PM Zhang, Hong via petsc-dev <petsc-dev@mcs.anl.gov> wrote:

> Hi all,
>
> Previously I have noticed that the first call to a CUDA function such as
> cudaMalloc and cudaFree in PETSc takes a long time (7.5 seconds) on Summit.
> Then I prepared a simple example as attached to help OLCF reproduce the
> problem. It turned out that the problem was caused by PETSc. The
> 7.5-second overhead can be observed only when the PETSc lib is linked. If I
> do not link PETSc, it runs normally. Does anyone have any idea why this
> happens and how to fix it?
>

Hong, this sounds like a screwed up dynamic linker. Can you try this with a
statically linked executable?

  Thanks,

     Matt

> Hong (Mr.)
>
> bash-4.2$ cat ex_simple.c
> #include <time.h>
> #include <cuda_runtime.h>
> #include <stdio.h>
>
> int main(int argc,char **args)
> {
>   clock_t start,s1,s2,s3;
>   double cputime;
>   double *init,tmp[100] = {0};
>
>   start = clock();
>   cudaFree(0);
>   s1 = clock();
>   cudaMalloc((void **)&init,100*sizeof(double));
>   s2 = clock();
>   cudaMemcpy(init,tmp,100*sizeof(double),cudaMemcpyHostToDevice);
>   s3 = clock();
>   printf("free time =%lf malloc time =%lf copy time =%lf\n",
>          ((double) (s1 - start)) / CLOCKS_PER_SEC,
>          ((double) (s2 - s1)) / CLOCKS_PER_SEC,
>          ((double) (s3 - s2)) / CLOCKS_PER_SEC);
>
>   return 0;
> }
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
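
One note on the measurement in the reproducer: clock() reports CPU time used by the process, which can differ from elapsed time if a call spends its time blocked (for example while shared libraries are being loaded). A minimal sketch of timing a call both ways is below. The helper names wall_seconds, report_call, and stand_in_call are hypothetical and not part of the original example, and the stand-in workload only sleeps; in the reproducer it would be replaced by a small wrapper around cudaFree(0).

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: elapsed (wall-clock) seconds from a monotonic clock. */
static double wall_seconds(void)
{
  struct timespec ts;
  int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
  assert(rc == 0);
  (void)rc;
  return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical helper: time one call with both clocks, print both,
   and return the wall-clock duration in seconds. */
static double report_call(const char *label, void (*fn)(void))
{
  clock_t c0 = clock();
  double  w0 = wall_seconds();
  fn();
  double  w1 = wall_seconds();
  clock_t c1 = clock();

  printf("%s: wall = %f s, cpu = %f s\n", label, w1 - w0,
         (double)(c1 - c0) / CLOCKS_PER_SEC);
  return w1 - w0;
}

/* Stand-in workload (assumption): sleeps about 50 ms, so wall time is
   visible while CPU time stays near zero. */
static void stand_in_call(void)
{
  struct timespec req = {0, 50 * 1000 * 1000};
  nanosleep(&req, NULL);
}
```

Calling report_call("stand-in", stand_in_call) from main prints both timings on one line. If the wall and CPU numbers for the first CUDA call differ a lot, that would suggest the delay is spent waiting rather than computing.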