I don't think this is right. We want the device initialized by PETSc; we just don't want cuBLAS and cuSOLVER initialized, so that we can see how much memory initializing the BLAS and solver libraries takes.
So I think you need to comment out the calls in cupminterface.hpp like cublasCreate and cusolverDnCreate. Urgh, I hate C++ where huge chunks of real code are in header files.

> On Jan 7, 2022, at 11:34 AM, Jacob Faibussowitsch wrote:
>
> Hit send too early…
>
> If you don't want to comment out, you can also run with the "-device_enable lazy"
> option. Normally this is the default behavior, but if -log_view or
> -log_summary is provided this defaults to "-device_enable eager". See
> src/sys/objects/device/interface/device.cxx:398
>
> Best regards,
>
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
>
>> On Jan 7, 2022, at 11:29, Jacob Faibussowitsch wrote:
>>
>>> You need to go into the PetscInitialize() routine, find where it loads
>>> cuBLAS and cuSOLVER, comment out those lines, and then run with -log_view
>>
>> Comment out
>>
>> #if (PetscDefined(HAVE_CUDA) || PetscDefined(HAVE_HIP) || PetscDefined(HAVE_SYCL))
>> ierr = PetscDeviceInitializeFromOptions_Internal(PETSC_COMM_WORLD);CHKERRQ(ierr);
>> #endif
>>
>> At src/sys/objects/pinit.c:956
>>
>> Best regards,
>>
>> Jacob Faibussowitsch
>> (Jacob Fai - booss - oh - vitch)
>>
>>> On Jan 7, 2022, at 11:24, Barry Smith wrote:
>>>
>>> Without -log_view it does not load any cuBLAS/cuSOLVER immediately; with
>>> -log_view it loads all that stuff at startup. You need to go into the
>>> PetscInitialize() routine, find where it loads cuBLAS and cuSOLVER,
>>> comment out those lines, and then run with -log_view
>>>
>>>> On Jan 7, 2022, at 11:14 AM, Zhang, Hong via petsc-dev wrote:
>>>>
>>>> When PETSc is initialized, it takes about 2GB of CUDA memory. This is way
>>>> too much for doing nothing. A test script is attached to reproduce the issue.
>>>> If I remove the first line, "import torch", PETSc consumes about 0.73GB,
>>>> which is still significant. Does anyone have any idea about this behavior?
>>>>
>>>> Thanks,
>>>> Hong
>>>>
>>>> hongzhang@gpu02:/gpfs/jlse-fs0/users/hongzhang/Projects/pnode/examples (caidao22/update-examples)$ python3 test.py
>>>> CUDA memory before PETSc 0.000GB
>>>> CUDA memory after PETSc 0.004GB
>>>> hongzhang@gpu02:/gpfs/jlse-fs0/users/hongzhang/Projects/pnode/examples (caidao22/update-examples)$ python3 test.py -log_view :0.txt
>>>> CUDA memory before PETSc 0.000GB
>>>> CUDA memory after PETSc 1.936GB
>>>>
>>>> import torch
>>>> import sys
>>>> import os
>>>>
>>>> import nvidia_smi
>>>> nvidia_smi.nvmlInit()
>>>> handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)
>>>> info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
>>>> print('CUDA memory before PETSc %.3fGB' % (info.used/1e9))
>>>>
>>>> petsc4py_path = os.path.join(os.environ['PETSC_DIR'],os.environ['PETSC_ARCH'],'lib')
>>>> sys.path.append(petsc4py_path)
>>>> import petsc4py
>>>> petsc4py.init(sys.argv)
>>>> handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)
>>>> info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
>>>> print('CUDA memory after PETSc %.3fGB' % (info.used/1e9))
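The before/after measurement in Hong's script can be factored into a small helper, so that the same probe brackets exactly one step, such as the petsc4py.init(sys.argv) call, run once with and once without -log_view. This is a minimal sketch, not code from the thread: `measure_gpu_memory` is a hypothetical name, and the memory reader is passed in as a function so the helper itself runs without a GPU. The commented-out lines show one assumed way to wire it to the same nvidia_smi calls the script already uses.

```python
from typing import Callable, Tuple


def measure_gpu_memory(read_used_bytes: Callable[[], int],
                       step: Callable[[], object]) -> Tuple[float, float, float]:
    """Run step() and return (before, after, delta) device memory use in GB.

    read_used_bytes is any zero-argument function returning the bytes
    currently in use on the device. GB means 1e9 bytes, as in the test script.
    """
    before = read_used_bytes()
    step()
    after = read_used_bytes()
    return before / 1e9, after / 1e9, (after - before) / 1e9


# Assumed wiring to NVML through the nvidia_smi module used in the script.
# It needs an NVIDIA GPU plus PETSc/petsc4py, so it is left commented out:
#
# import sys
# import nvidia_smi
# import petsc4py
# nvidia_smi.nvmlInit()
# handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)
# read = lambda: nvidia_smi.nvmlDeviceGetMemoryInfo(handle).used
# before, after, delta = measure_gpu_memory(read, lambda: petsc4py.init(sys.argv))
# print('PETSc init used %.3fGB (%.3fGB -> %.3fGB)' % (delta, before, after))
```

NVML's `used` figure is device-wide, so allocations by other processes on the same GPU appear in both readings; the delta is attributable to the bracketed step only if nothing else allocates in between.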

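Jacob's description of the default can be written down as a small decision rule, which also shows why Hong's two runs differ: only the second passes -log_view. The sketch below is a hypothetical Python model of that rule, not PETSc's implementation (the real logic is in the C++ file Jacob points to). It assumes that an explicit -device_enable value always wins and that the last occurrence on the command line takes effect.

```python
from typing import Optional, Sequence


def resolve_device_enable(argv: Sequence[str]) -> str:
    """Hypothetical model of how the -device_enable default is chosen.

    An explicit "-device_enable <mode>" wins (last occurrence, by assumption);
    otherwise the mode is "lazy", unless -log_view or -log_summary is present,
    in which case it is "eager".
    """
    explicit: Optional[str] = None
    for i, arg in enumerate(argv):
        if arg == "-device_enable" and i + 1 < len(argv):
            explicit = argv[i + 1]
    if explicit is not None:
        return explicit
    logging_requested = any(arg in ("-log_view", "-log_summary") for arg in argv)
    return "eager" if logging_requested else "lazy"


# Hong's two runs, plus the workaround Jacob suggests.
print(resolve_device_enable(["test.py"]))                          # lazy
print(resolve_device_enable(["test.py", "-log_view", ":0.txt"]))   # eager
print(resolve_device_enable(["test.py", "-log_view", ":0.txt",
                             "-device_enable", "lazy"]))           # lazy
```

Under this model, adding -device_enable lazy to the -log_view run restores the lazy default while keeping the profiling output, which is the workaround Jacob suggests.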