I don't think this is right. We want the device initialized by PETSc; we
just don't want cuBLAS and cuSOLVER initialized, so that we can see how
much memory initializing the BLAS and solvers takes.

  So I think you need to comment out things in cupminterface.hpp, such as
the cublasCreate and cusolverDnCreate calls.

  Urgh, I hate C++ code where huge chunks of real code live in header files.



> On Jan 7, 2022, at 11:34 AM, Jacob Faibussowitsch <[email protected]> wrote:
> 
> Hit send too early…
> 
> If you don't want to comment anything out, you can also run with the
> "-device_enable lazy" option. Normally lazy is the default behavior, but if
> -log_view or -log_summary is provided, it defaults to "-device_enable eager".
> See src/sys/objects/device/interface/device.cxx:398
> 
> Best regards,
> 
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
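A toy model of the default Jacob describes may help when reading the numbers below. It is a sketch, not PETSc's logic in device.cxx: the function name `device_enable_mode` is hypothetical, and it assumes only what the message states (lazy by default, eager when -log_view or -log_summary is given, and an explicit -device_enable value wins).

```python
def device_enable_mode(argv):
    """Hypothetical model (not PETSc source) of the -device_enable default.

    An explicit "-device_enable <mode>" takes precedence; otherwise the mode is
    "eager" when -log_view or -log_summary appears, and "lazy" in all other cases.
    """
    if "-device_enable" in argv:
        i = argv.index("-device_enable")
        # Assumption: the mode is the next command-line token.
        if i + 1 >= len(argv):
            raise ValueError("-device_enable needs a value")
        return argv[i + 1]
    if "-log_view" in argv or "-log_summary" in argv:
        return "eager"
    return "lazy"
```

Under this model, the reproducer could be run as `python3 test.py -log_view :0.txt -device_enable lazy` to keep the profiling output while the device is still initialized lazily.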
> 
>> On Jan 7, 2022, at 11:29, Jacob Faibussowitsch <[email protected]> wrote:
>> 
>>> You need to go into the PetscInitialize() routine, find where it loads
>>> cuBLAS and cuSOLVER, comment out those lines, then run with -log_view
>> 
>> Comment out
>> 
>> #if (PetscDefined(HAVE_CUDA) || PetscDefined(HAVE_HIP) || PetscDefined(HAVE_SYCL))
>>   ierr = PetscDeviceInitializeFromOptions_Internal(PETSC_COMM_WORLD);CHKERRQ(ierr);
>> #endif
>> 
>> At src/sys/objects/pinit.c:956
>> 
>>> On Jan 7, 2022, at 11:24, Barry Smith <[email protected]> wrote:
>>> 
>>> 
>>> Without -log_view, it does not load any cuBLAS/cuSOLVER immediately; with
>>> -log_view, it loads all that stuff at startup. You need to go into the
>>> PetscInitialize() routine, find where it loads cuBLAS and cuSOLVER, comment
>>> out those lines, then run with -log_view.
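The lazy-versus-eager distinction Barry draws can be illustrated with a small Python analogy. This is not how PETSc is written; `LibraryHandle` and its `create` callback are hypothetical stand-ins for a cuBLAS or cuSOLVER handle whose creation costs device memory.

```python
class LibraryHandle:
    """Hypothetical stand-in for an expensive library context (e.g. a BLAS handle).

    `create` is a zero-argument callable that performs the costly setup.
    With eager=True the setup runs at construction ("startup"); otherwise it
    is deferred until the first call to get().
    """

    def __init__(self, create, eager=False):
        self._create = create
        self._handle = None
        if eager:
            self.get()

    def get(self):
        # Create the handle on first use only; later calls reuse it.
        if self._handle is None:
            self._handle = self._create()
        return self._handle
```

With `eager=True` the cost is paid at construction, which corresponds to the -log_view case; otherwise nothing is allocated until the handle is first used.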
>>> 
>>> 
>>>> On Jan 7, 2022, at 11:14 AM, Zhang, Hong via petsc-dev
>>>> <[email protected]> wrote:
>>>> 
>>>> When PETSc is initialized, it takes about 2 GB of CUDA memory. This is far
>>>> too much for doing nothing. A test script is attached to reproduce the
>>>> issue. If I remove the first line, "import torch", PETSc consumes about
>>>> 0.73 GB, which is still significant. Does anyone know what causes this?
>>>> 
>>>> Thanks,
>>>> Hong
>>>> 
>>>> hongzhang@gpu02:/gpfs/jlse-fs0/users/hongzhang/Projects/pnode/examples (caidao22/update-examples)$ python3 test.py
>>>> CUDA memory before PETSc 0.000GB
>>>> CUDA memory after PETSc 0.004GB
>>>> hongzhang@gpu02:/gpfs/jlse-fs0/users/hongzhang/Projects/pnode/examples (caidao22/update-examples)$ python3 test.py -log_view :0.txt
>>>> CUDA memory before PETSc 0.000GB
>>>> CUDA memory after PETSc 1.936GB
>>>> 
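>>>> import torch  # with torch imported the -log_view run uses ~2GB; without it ~0.73GB
>>>> import sys
>>>> import os
>>>> 
>>>> import nvidia_smi  # NVML Python bindings
>>>> nvidia_smi.nvmlInit()
>>>> # Device-wide memory in use on GPU 0 before PETSc starts
>>>> handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)
>>>> info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
>>>> print('CUDA memory before PETSc %.3fGB' % (info.used/1e9))
>>>> 
>>>> # Use the petsc4py built under $PETSC_DIR/$PETSC_ARCH/lib
>>>> petsc4py_path = os.path.join(os.environ['PETSC_DIR'], os.environ['PETSC_ARCH'], 'lib')
>>>> sys.path.append(petsc4py_path)
>>>> import petsc4py
>>>> petsc4py.init(sys.argv)  # initializes PETSc; options such as -log_view come from argv
>>>> # Query again after PETSc initialization
>>>> handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)
>>>> info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
>>>> print('CUDA memory after PETSc %.3fGB' % (info.used/1e9))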
>>>> 
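To compare runs (with and without -log_view, with and without `import torch`, or with the cublasCreate/cusolverDnCreate calls commented out), the before/after measurement in test.py can be factored into a helper. A sketch, assuming the same `nvidia_smi` bindings and the script's 1 GB = 1e9 bytes convention; `gpu_memory_delta_gb` is a hypothetical name.

```python
def gpu_memory_delta_gb(used_bytes, action):
    """Return the change in used device memory, in GB (1e9 bytes), caused by action().

    `used_bytes` is any zero-argument callable returning the bytes currently in
    use, e.g. lambda: nvidia_smi.nvmlDeviceGetMemoryInfo(handle).used.
    `action` is the zero-argument callable being measured.
    """
    before = used_bytes()
    action()
    after = used_bytes()
    return (after - before) / 1e9
```

In test.py this would be called as `gpu_memory_delta_gb(lambda: nvidia_smi.nvmlDeviceGetMemoryInfo(handle).used, lambda: petsc4py.init(sys.argv))`. NVML reports memory used on the whole device, so other processes sharing the GPU will skew the result.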
>>> 
>> 
> 
