Dear PETSc devs,

We have noticed a performance regression when using GAMG as the preconditioner to solve the velocity block in a Stokes saddle-point system with variable viscosity, solved on a 3D hexahedral mesh of a spherical shell using Q2-Q1 elements. This compares performance from the beginning of last year (petsc 3.16.4) with a more recent petsc master (from around May this year), in the weak scaling analysis we published in https://doi.org/10.5194/gmd-15-5127-2022. Previously the number of iterations for the velocity block (the inner solve of the Schur complement) started at 40 (https://gmd.copernicus.org/articles/15/5127/2022/gmd-15-5127-2022-f10-web.png), increasing only slowly for larger problems (+ more cores). Now the number of iterations starts at 60 (https://github.com/stephankramer/petsc-scaling/blob/main/after/SPD_Combined_Iterations.png), with the same tolerances, again going up slowly with increasing size; the cost per iteration has also gone up slightly - resulting in a runtime increase of more than 50%.

The main change we can see is that the coarsening has become a lot less aggressive in the first coarsening stage (finest -> next-to-finest) - presumably as a result of the MIS(A^T A) -> MIS(MIS(A)) change? The performance issue might be similar to the one reported in https://lists.mcs.anl.gov/mailman/htdig/petsc-users/2023-April/048366.html ?
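For reference, these are the coarsening-related options we have been looking at (shown without the fieldsplit prefix of the velocity solve; exact option names and availability vary between petsc versions - in particular -pc_gamg_aggressive_square_graph is our assumption for the switch back to the old squared-graph coarsening, so please check against -help output):

```
# strength-of-connection drop tolerance per level
-pc_gamg_threshold 0.01
# number of levels on which aggressive coarsening is applied
# (the old name in petsc 3.16 was -pc_gamg_square_graph)
-pc_gamg_aggressive_coarsening 1
# if present in this petsc version: use the old MIS(A^T A)
# squared-graph coarsening instead of MIS(MIS(A))
-pc_gamg_aggressive_square_graph true
```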

As an example, at "Level 7" (6,389,890 vertices, run on 1536 CPUs) with the older petsc version we had:

          rows=126, cols=126, bs=6
          total: nonzeros=15876, allocated nonzeros=15876
--
          rows=3072, cols=3072, bs=6
          total: nonzeros=3344688, allocated nonzeros=3344688
--
          rows=91152, cols=91152, bs=6
          total: nonzeros=109729584, allocated nonzeros=109729584
--
          rows=2655378, cols=2655378, bs=6
          total: nonzeros=1468980252, allocated nonzeros=1468980252
--
          rows=152175366, cols=152175366, bs=3
          total: nonzeros=29047661586, allocated nonzeros=29047661586

Whereas with the newer version we get:

          rows=420, cols=420, bs=6
          total: nonzeros=176400, allocated nonzeros=176400
--
          rows=6462, cols=6462, bs=6
          total: nonzeros=10891908, allocated nonzeros=10891908
--
          rows=91716, cols=91716, bs=6
          total: nonzeros=81687384, allocated nonzeros=81687384
--
          rows=5419362, cols=5419362, bs=6
          total: nonzeros=3668190588, allocated nonzeros=3668190588
--
          rows=152175366, cols=152175366, bs=3
          total: nonzeros=29047661586, allocated nonzeros=29047661586

So in the first step it now coarsens from 152e6 to 5.4e6 DOFs instead of to 2.7e6 DOFs. Note that we are providing the rigid-body near nullspace, hence the change from bs=3 on the finest level to bs=6 on the coarser levels. We have tried different values of -pc_gamg_threshold, but it does not seem to significantly alter the amount of coarsening in that first step.
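To quantify the difference, the first-step coarsening ratios implied by the row counts quoted above work out as follows (a quick sanity check, numbers taken directly from the logs):

```python
# First-step coarsening ratio (finest rows / first-coarse rows),
# using the row counts from the -ksp_view excerpts above.
fine_rows = 152_175_366      # finest level, bs=3 (same in both runs)
coarse_before = 2_655_378    # first coarse level, petsc 3.16.4 (bs=6)
coarse_after = 5_419_362     # first coarse level, recent master (bs=6)

ratio_before = fine_rows / coarse_before
ratio_after = fine_rows / coarse_after
print(f"before: {ratio_before:.1f}x, after: {ratio_after:.1f}x")
# -> before: 57.3x, after: 28.1x
```

i.e. the first coarsening step is roughly half as strong as before.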

Do you have any suggestions for further things we should try or look at? Any feedback would be much appreciated.

Best wishes
Stephan Kramer

Full logs, including -log_view timings, are available from https://github.com/stephankramer/petsc-scaling/

In particular:

https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat
https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat
https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat
https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat
https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat
https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat
