Dear PETSc devs,

We have noticed a performance regression when using GAMG as the preconditioner to solve the velocity block in a Stokes saddle-point system with variable viscosity, solved on a 3D hexahedral mesh of a spherical shell using Q2-Q1 elements. This compares performance from the beginning of last year (petsc 3.16.4) with a more recent petsc master (from around May this year), in the weak scaling analysis we published in https://doi.org/10.5194/gmd-15-5127-2022. Previously the number of iterations for the velocity block (the inner solve of the Schur complement) started at 40 (https://gmd.copernicus.org/articles/15/5127/2022/gmd-15-5127-2022-f10-web.png), increasing only slowly for larger problems (+ more cores). Now the number of iterations starts at 60 (https://github.com/stephankramer/petsc-scaling/blob/main/after/SPD_Combined_Iterations.png), with the same tolerances, again going up slowly with increasing size; the cost per iteration has also gone up slightly - resulting in a runtime increase of more than 50%.

The main change we can see is that the coarsening has become a lot less aggressive in the first coarsening stage (finest -> next-to-finest) - presumably as a result of the MIS(A^T A) -> MIS(MIS(A)) change? The performance issue might be similar to the one reported in https://lists.mcs.anl.gov/mailman/htdig/petsc-users/2023-April/048366.html ?
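For reference, these are the coarsening-related options we have been looking at (shown without the fieldsplit prefix of the velocity solve; exact option names and availability vary between petsc versions - in particular -pc_gamg_aggressive_square_graph is our assumption for the switch back to the old squared-graph coarsening, so please check against -help output):

```
# strength-of-connection drop tolerance per level
-pc_gamg_threshold 0.01
# number of levels on which aggressive coarsening is applied
# (the old name in petsc 3.16 was -pc_gamg_square_graph)
-pc_gamg_aggressive_coarsening 1
# if present in this petsc version: use the old MIS(A^T A)
# squared-graph coarsening instead of MIS(MIS(A))
-pc_gamg_aggressive_square_graph true
```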

As an example, at "Level 7" (6,389,890 vertices, run on 1536 CPUs) with the older petsc version we had:

          rows=126, cols=126, bs=6
          total: nonzeros=15876, allocated nonzeros=15876
--
          rows=3072, cols=3072, bs=6
          total: nonzeros=3344688, allocated nonzeros=3344688
--
          rows=91152, cols=91152, bs=6
          total: nonzeros=109729584, allocated nonzeros=109729584
--
          rows=2655378, cols=2655378, bs=6
          total: nonzeros=1468980252, allocated nonzeros=1468980252
--
          rows=152175366, cols=152175366, bs=3
          total: nonzeros=29047661586, allocated nonzeros=29047661586

Whereas with the newer version we get:

          rows=420, cols=420, bs=6
          total: nonzeros=176400, allocated nonzeros=176400
--
          rows=6462, cols=6462, bs=6
          total: nonzeros=10891908, allocated nonzeros=10891908
--
          rows=91716, cols=91716, bs=6
          total: nonzeros=81687384, allocated nonzeros=81687384
--
          rows=5419362, cols=5419362, bs=6
          total: nonzeros=3668190588, allocated nonzeros=3668190588
--
          rows=152175366, cols=152175366, bs=3
          total: nonzeros=29047661586, allocated nonzeros=29047661586

So in the first step it now coarsens from 152e6 to 5.4e6 DOFs instead of to 2.7e6 DOFs. Note that we are providing the rigid-body near nullspace, hence the change from bs=3 on the finest level to bs=6 on the coarser levels. We have tried different values of -pc_gamg_threshold, but it does not seem to significantly alter the amount of coarsening in that first step.
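To quantify the difference, the first-step coarsening ratios implied by the row counts quoted above work out as follows (a quick sanity check, numbers taken directly from the logs):

```python
# First-step coarsening ratio (finest rows / first-coarse rows),
# using the row counts from the -ksp_view excerpts above.
fine_rows = 152_175_366      # finest level, bs=3 (same in both runs)
coarse_before = 2_655_378    # first coarse level, petsc 3.16.4 (bs=6)
coarse_after = 5_419_362     # first coarse level, recent master (bs=6)

ratio_before = fine_rows / coarse_before
ratio_after = fine_rows / coarse_after
print(f"before: {ratio_before:.1f}x, after: {ratio_after:.1f}x")
# -> before: 57.3x, after: 28.1x
```

i.e. the first coarsening step is roughly half as strong as before.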

Do you have any suggestions for further things we should try or look at? Any feedback would be much appreciated.

Best wishes
Stephan Kramer

Full logs, including -log_view timings, are available from https://github.com/stephankramer/petsc-scaling/

In particular:

https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat
https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat
https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat
https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat
https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat
https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat
