> I don't think this is right. We want the device initialized by PETSc; we 
> just don't want the cublas and cusolver stuff initialized, in order to see how 
> much memory initializing the blas and solvers takes.

This is how it has always been; PetscDevice adopted the same initialization 
strategy as the previous code. Eager initialization initializes __everything__, 
since initializing cublas and cusolver takes (or at least historically took) 
eons.
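
To make the eager/lazy distinction concrete, here is a toy sketch in Python. It 
is not PETSc code: ToyDeviceContext and fake_create are hypothetical names, and 
the "handle" stands in for something expensive such as a cublas or cusolver 
handle. It only shows when the expensive creation runs in each mode.

```python
# Toy model of eager vs. lazy handle creation. This is NOT PETSc code;
# every name here is hypothetical and exists only for illustration.
class ToyDeviceContext:
    def __init__(self, create_handle, eager=False):
        self._create_handle = create_handle  # stand-in for an expensive factory
        self._handle = None
        if eager:
            # Eager: pay the creation cost up front, at construction time.
            self._handle = create_handle()

    @property
    def handle(self):
        # Lazy: create the handle on first access, then reuse it.
        if self._handle is None:
            self._handle = self._create_handle()
        return self._handle


calls = []  # records every time the "expensive" factory runs


def fake_create():
    calls.append("create")
    return object()


lazy = ToyDeviceContext(fake_create, eager=False)
created_before_use = len(calls)   # nothing created yet
first = lazy.handle               # first access triggers creation
second = lazy.handle              # second access reuses the same handle
created_after_use = len(calls)

eager = ToyDeviceContext(fake_create, eager=True)
created_after_eager = len(calls)  # creation happened in the constructor
```

In the lazy case nothing is created until the handle is first touched, which is 
why a run that never uses the blas or solver handles would not pay for them.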

>   So I think you need to comment out things in cupminterface.hpp like 
> cublasCreate and cusolverDnCreate.
> 
>   Urgh, I hate C++ where huge chunks of real code are in header files.

Not quite; you will need to comment out

ierr = __initialize(dctx->device->deviceId,dci);CHKERRQ(ierr);

In src/sys/objects/device/impls/cupm/cupmcontext.hpp:202
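
To compare memory use with and without that line, a before/after measurement 
along the lines of Hong's script can be wrapped in a small helper. This is a 
minimal sketch: measure_delta_gb and nvml_used_bytes are hypothetical helper 
names, and the only nvidia_smi calls used are the ones already in the test 
script quoted below. The reader is passed in as a function so the arithmetic 
can be checked without a GPU.

```python
def used_gb(used_bytes):
    """Convert a byte count to GB using the same 1e9 divisor as the test script."""
    return used_bytes / 1e9


def measure_delta_gb(read_used_bytes, action):
    """Return how many GB of used memory `action` added.

    read_used_bytes: zero-argument callable returning used bytes (hypothetical hook)
    action:          zero-argument callable to run between the two readings
    """
    before = read_used_bytes()
    action()
    after = read_used_bytes()
    return used_gb(after - before)


def nvml_used_bytes(index=0):
    """Read used device memory with the nvidia_smi calls from the test script.

    Imported inside the function so the helpers above work without a GPU.
    """
    import nvidia_smi
    nvidia_smi.nvmlInit()
    handle = nvidia_smi.nvmlDeviceGetHandleByIndex(index)
    return nvidia_smi.nvmlDeviceGetMemoryInfo(handle).used
```

On a GPU machine one would presumably call something like 
measure_delta_gb(nvml_used_bytes, lambda: petsc4py.init(sys.argv)), once with 
-log_view and once more after commenting out the line above. Note that this 
reading is for the whole device, so other processes on the same GPU would 
affect it.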

Best regards,

Jacob Faibussowitsch
(Jacob Fai - booss - oh - vitch)

> On Jan 7, 2022, at 11:53, Barry Smith wrote:
> 
> 
>   I don't think this is right. We want the device initialized by PETSc; we 
> just don't want the cublas and cusolver stuff initialized, in order to see how 
> much memory initializing the blas and solvers takes.
> 
>   So I think you need to comment out things in cupminterface.hpp like 
> cublasCreate and cusolverDnCreate.
> 
>   Urgh, I hate C++ where huge chunks of real code are in header files.
> 
> 
> 
>> On Jan 7, 2022, at 11:34 AM, Jacob Faibussowitsch wrote:
>> 
>> Hit send too early…
>> 
>> If you don't want to comment anything out, you can also run with the 
>> "-device_enable lazy" option. Normally this is the default behavior, but if 
>> -log_view or -log_summary is provided, the default becomes 
>> "-device_enable eager". See src/sys/objects/device/interface/device.cxx:398
>> 
>> Best regards,
>> 
>> Jacob Faibussowitsch
>> (Jacob Fai - booss - oh - vitch)
>> 
>>> On Jan 7, 2022, at 11:29, Jacob Faibussowitsch wrote:
>>> 
>>>> You need to go into the PetscInitialize() routine, find where it loads 
>>>> cublas and cusolver, comment out those lines, and then run with -log_view
>>> 
>>> Comment out
>>> 
>>> #if (PetscDefined(HAVE_CUDA) || PetscDefined(HAVE_HIP) || PetscDefined(HAVE_SYCL))
>>>   ierr = PetscDeviceInitializeFromOptions_Internal(PETSC_COMM_WORLD);CHKERRQ(ierr);
>>> #endif
>>> 
>>> At src/sys/objects/pinit.c:956
>>> 
>>> Best regards,
>>> 
>>> Jacob Faibussowitsch
>>> (Jacob Fai - booss - oh - vitch)
>>> 
>>>> On Jan 7, 2022, at 11:24, Barry Smith wrote:
>>>> 
>>>> 
>>>> Without -log_view it does not load any cuBLAS/cuSOLVER immediately; with 
>>>> -log_view it loads all that stuff at startup. You need to go into the 
>>>> PetscInitialize() routine, find where it loads cublas and cusolver, 
>>>> comment out those lines, and then run with -log_view
>>>> 
>>>> 
>>>>> On Jan 7, 2022, at 11:14 AM, Zhang, Hong via petsc-dev wrote:
>>>>> 
>>>>> When PETSc is initialized, it takes about 2GB of CUDA memory. This is way 
>>>>> too much for doing nothing. A test script is attached to reproduce the 
>>>>> issue. If I remove the first line, "import torch", PETSc consumes about 
>>>>> 0.73GB, which is still significant. Does anyone have any idea about this 
>>>>> behavior?
>>>>> 
>>>>> Thanks,
>>>>> Hong
>>>>> 
>>>>> hongzhang@gpu02:/gpfs/jlse-fs0/users/hongzhang/Projects/pnode/examples (caidao22/update-examples)$ python3 test.py
>>>>> CUDA memory before PETSc 0.000GB
>>>>> CUDA memory after PETSc 0.004GB
>>>>> hongzhang@gpu02:/gpfs/jlse-fs0/users/hongzhang/Projects/pnode/examples (caidao22/update-examples)$ python3 test.py -log_view :0.txt
>>>>> CUDA memory before PETSc 0.000GB
>>>>> CUDA memory after PETSc 1.936GB
>>>>> 
>>>>> import torch
>>>>> import sys
>>>>> import os
>>>>> 
>>>>> import nvidia_smi
>>>>> nvidia_smi.nvmlInit()
>>>>> handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)
>>>>> info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
>>>>> print('CUDA memory before PETSc %.3fGB' % (info.used/1e9))
>>>>> 
>>>>> petsc4py_path = os.path.join(os.environ['PETSC_DIR'],os.environ['PETSC_ARCH'],'lib')
>>>>> sys.path.append(petsc4py_path)
>>>>> import petsc4py
>>>>> petsc4py.init(sys.argv)
>>>>> handle = nvidia_smi.nvmlDeviceGetHandleByIndex(0)
>>>>> info = nvidia_smi.nvmlDeviceGetMemoryInfo(handle)
>>>>> print('CUDA memory after PETSc %.3fGB' % (info.used/1e9))
>>>>> 
>>>> 
>>> 
>> 
> 
