> I don't think this is right. We want the device initialized by PETSc; we
> just don't want the cuBLAS and cuSOLVER stuff initialized, so that we can see
> how much memory initializing the BLAS and solvers takes.

This is how it has always been: PetscDevice adopted the same initialization strategy as the previous code. Eager initialization initializes __everything__, since initializing cuBLAS and cuSOLVER takes (or at least historically took) eons.

> So I think you need to comment things in cupminterface.hpp like
> cublasCreate and cusolverDnCreate.
>
> Urgh, I hate C++ where huge chunks of real code are in header files.

Not quite. You will need to comment out

ierr = __initialize(dctx->device->deviceId,dci);CHKERRQ(ierr);

in src/sys/objects/device/impls/cupm/cupmcontext.hpp:202

Best regards,

Jacob Faibussowitsch
(Jacob Fai - booss - oh - vitch)

> On Jan 7, 2022, at 11:53, Barry Smith wrote:
>
>
> I don't think this is right. We want the device initialized by PETSc; we
> just don't want the cuBLAS and cuSOLVER stuff initialized, so that we can see
> how much memory initializing the BLAS and solvers takes.
>
> So I think you need to comment things in cupminterface.hpp like
> cublasCreate and cusolverDnCreate.
>
> Urgh, I hate C++ where huge chunks of real code are in header files.
>
>
>
>> On Jan 7, 2022, at 11:34 AM, Jacob Faibussowitsch wrote:
>>
>> Hit send too early…
>>
>> If you don't want to comment out, you can also run with the "-device_enable
>> lazy" option. Normally this is the default behavior, but if -log_view or
>> -log_summary is provided, this defaults to "-device_enable eager".
>> See src/sys/objects/device/interface/device.cxx:398
>>
>> Best regards,
>>
>> Jacob Faibussowitsch
>> (Jacob Fai - booss - oh - vitch)
>>
>>> On Jan 7, 2022, at 11:29, Jacob Faibussowitsch wrote:
>>>
>>>> You need to go into the PetscInitialize() routine, find where it loads
>>>> cuBLAS and cuSOLVER, comment out those lines, then run with -log_view.
>>>
>>> Comment out
>>>
>>> #if (PetscDefined(HAVE_CUDA) || PetscDefined(HAVE_HIP) || PetscDefined(HAVE_SYCL))
>>> ierr = PetscDeviceInitializeFromOptions_Internal(PETSC_COMM_WORLD);CHKERRQ(ierr);
>>> #endif
>>>
>>> at src/sys/objects/pinit.c:956
>>>
>>> Best regards,
>>>
>>> Jacob Faibussowitsch
>>> (Jacob Fai - booss - oh - vitch)
>>>
>>>> On Jan 7, 2022, at 11:24, Barry Smith wrote:
>>>>
>>>>
>>>> Without -log_view it does not load any cuBLAS/cuSOLVER immediately; with
>>>> -log_view it loads all that stuff at startup. You need to go into the
>>>> PetscInitialize() routine, find where it loads cuBLAS and cuSOLVER,
>>>> comment out those lines, then run with -log_view.
>>>>
>>>>
>>>>> On Jan 7, 2022, at 11:14 AM, Zhang, Hong via petsc-dev wrote:
>>>>>
>>>>> When PETSc is initialized, it takes about 2GB of CUDA memory. This is way
>>>>> too much for doing nothing. A test script is attached to reproduce the
>>>>> issue. If I remove the first line, "import torch", PETSc consumes about
>>>>> 0.73GB, which is still significant. Does anyone have any idea about this
>>>>> behavior?
>>>>>
>>>>> Thanks,
>>>>> Hong
>>>>>
>>>>> hongzhang@gpu02:/gpfs/jlse-fs0/users/hongzhang/Projects/pnode/examples (caidao22/update-examples)$ python3 test.py
>>>>> CUDA memory before PETSc 0.000GB
>>>>> CUDA memory after PETSc 0.004GB
>>>>> hongzhang@gpu02:/gpfs/jlse-fs0/users/hongzhang/Projects/pnode/examples (caidao22/update-examples)$ python3 test.py -log_view :0.txt
>>>>> CUDA memory before PETSc 0.000GB
>>>>> CUDA memory after PETSc 1.936GB
>>>>>
>>>>> import torch
>>>>> import sys
>>>>> import os
>>>>>
>>>>> import nvidia_smi
>>>>> nvidia_smi.nvmlInit()
>>>>> handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)
>>>>> info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
>>>>> print('CUDA memory before PETSc %.3fGB' % (info.used/1e9))
>>>>>
>>>>> petsc4py_path = os.path.join(os.environ['PETSC_DIR'],os.environ['PETSC_ARCH'],'lib')
>>>>> sys.path.append(petsc4py_path)
>>>>> import petsc4py
>>>>> petsc4py.init(sys.argv)
>>>>> handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)
>>>>> info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
>>>>> print('CUDA memory after PETSc %.3fGB' % (info.used/1e9))
>>>>>
>>>>
>>>
>>
>
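
As an alternative to commenting out source lines, the reproducer above could keep -log_view but force lazy device initialization by appending "-device_enable lazy" to the arguments it hands to petsc4py.init(). The following is a minimal sketch, not a tested recipe. It assumes petsc4py is importable (for example after the PETSC_DIR/PETSC_ARCH sys.path setup from test.py) and that the same nvidia_smi NVML wrapper is installed. The helper names with_device_enable, gpu_memory_used_gb and measure_petsc_init are hypothetical, and only the two -device_enable values discussed in this thread ("lazy" and "eager") are accepted; other PETSc versions may behave differently.

```python
def with_device_enable(argv, mode):
    """Return a copy of argv that requests exactly one PETSc -device_enable mode.

    Only "lazy" and "eager" are accepted because those are the values discussed
    in this thread. Any existing "-device_enable <value>" pair is dropped and the
    requested setting is appended at the end; argv itself is not modified.
    """
    if mode not in ("lazy", "eager"):
        raise ValueError("mode must be 'lazy' or 'eager'")
    out = []
    skip_next = False
    for arg in argv:
        if skip_next:
            skip_next = False  # drop the value that followed -device_enable
            continue
        if arg == "-device_enable":
            skip_next = True
            continue
        out.append(arg)
    return out + ["-device_enable", mode]


def gpu_memory_used_gb(index=0):
    """Used memory on GPU `index` in GB (1e9 bytes), measured as in test.py.

    NVML reports device-wide usage, so memory held by other processes on the
    same GPU is included in the number.
    """
    import nvidia_smi  # same NVML wrapper the original reproducer uses

    nvidia_smi.nvmlInit()
    handle = nvidia_smi.nvmlDeviceGetHandleByIndex(index)
    return nvidia_smi.nvmlDeviceGetMemoryInfo(handle).used / 1e9


def measure_petsc_init(argv):
    """Initialize PETSc once with argv and return (before_gb, after_gb).

    petsc4py can only be initialized once per process, so eager and lazy
    initialization have to be compared in two separate runs.
    """
    before = gpu_memory_used_gb()
    import petsc4py

    petsc4py.init(argv)
    after = gpu_memory_used_gb()
    return before, after
```

In test.py, the line petsc4py.init(sys.argv) would then become something like before, after = measure_petsc_init(with_device_enable(sys.argv, "lazy")), run once as python3 test.py -log_view :0.txt and once more with "eager" to see how much of the jump comes from eager initialization.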
