Dear PETSc,
For the last few months I've been struggling with a solver I wrote for a FEM
eigenvalue problem that runs out of memory. I've traced the problem to KSPSolve
+ MUMPS, but I'm stuck on digging deeper.
The reason I suspect KSPSolve/MUMPS is that when I comment out the KSPSolve,
the memory stays constant while running the rest of the algorithm. (Of course,
the algorithm then also converges to a different result.) When changing the KSP
statement to
for(int i = 0; i < 100000000; i++) KSPSolve(A_, vec1_, vec2_);
the memory grows faster than when running the full algorithm. Logging shows
that the program never reaches the terminating i=100M. Measuring the memory
growth with ps (I started debugging before I knew of PETSc's own facilities), I
see the RSS on a single compute node grow by up to 300MB/min for this
artificial case. Real cases grow more like 60MB/min/node, which gets the job
killed for memory exhaustion after about 2-3 days.
Locally (on my Mac) I've been able to reproduce this both with 6 MPI processes
and with a single one. Instrumenting the code to print differences in
PetscMemoryGetCurrentUsage (full code below) shows that the memory increases at
every step at the start, but also keeps increasing at later iterations (small
excerpt from the output):
rank step memory (increase since prev step)
0 6544 current 39469056( 8192)
0 7086 current 39477248( 8192)
0 7735 current 39497728( 20480)
0 9029 current 39501824( 4096)
A similar output is visible in a run with 6 ranks, where there does not seem to
be a pattern as to which rank increases at which step. (Note that I've also
checked PetscMallocGetCurrentUsage, but that stays constant.)
Switching to PETSc's own solver on a single rank does not show a memory
increase after the first solve. Changing the solve to overwrite the input
vector, i.e.
VecCopy(vec2_, vec1_);
KSPSolve(A_, vec1_, vec1_);
results in a few increases after the first solve, but these do not seem to
repeat.
Does anyone have an idea on how to further dig into this problem?
Kind regards,
Lars Corbijn
Instrumentation:
PetscLogDouble lastCurrent, current;
int rank;
MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
for(int i = 0; i < 100000000; ++i) {
    PetscMemoryGetCurrentUsage(&lastCurrent);
    KSPSolve(A_, vec1_, vec2_);
    PetscMemoryGetCurrentUsage(&current);
    if(current != lastCurrent) {
        std::cout << std::setw(2) << rank << " " << std::setw(6) << i
                  << " current " << std::setw(8) << (int) current << std::right
                  << "(" << std::setw(6) << (int)(current - lastCurrent) << ")"
                  << std::endl;
    }
    lastCurrent = current;
}
Matrix details
The matrix A_ in question is created from a complex-valued matrix C_ (type
MATAIJ) using the following code (modulo renames). Theoretically it should be a
Laplacian with phase-shift periodic boundary conditions:
MatHermitianTranspose(C_, MAT_INITIAL_MATRIX, &Y_);
MatProductCreate(C_, Y_, NULL, &A_);
MatProductSetType(A_, MATPRODUCT_AB);
MatProductSetFromOptions(A_);
MatProductSymbolic(A_);
MatProductNumeric(A_);
PETSc arguments: -log_view_memory -log_view :petsc.log -ksp_type preonly
-pc_type lu -pc_factor_mat_solver_type mumps -bv_matmult vecs -memory_view