Hi Mark

Sorry it took a while to report back. We have tried your branch but hit a few issues, some of which we're not entirely sure are related.

First, switching off minimum degree ordering and then switching to the old version of aggressive coarsening, as you suggested, got us back to the coarsening behaviour we had previously. However, we then also observed an even further worsening of the iteration count: it had already gone up by 50% (with the newer main PETSc), but was now more than double that of "old" PETSc. It took us a while to realize this was due to the default smoother changing from Chebyshev+SOR to Chebyshev+Jacobi. Switching this back to the old default as well, we get back to very similar coarsening levels (see below for more details, if of interest) and iteration counts.
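
For completeness, the combination of options we ended up with on your branch to recover the old behaviour is roughly the following (a sketch rather than our exact command line, and the smoother options are just the generic mg_levels ones rather than the full prefixes we use):

     -pc_type gamg
     -pc_gamg_use_minimum_degree_ordering false
     -pc_gamg_use_aggressive_square_graph true
     -mg_levels_ksp_type chebyshev
     -mg_levels_pc_type sor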

So that's all very good news. However, we also started seeing memory errors (double free or corruption) when we switched off the minimum degree ordering. Because this was with an earlier version of your branch, we then rebuilt, hoping this was just an earlier bug that had since been fixed, but then we ran into MPI lock-up issues. We have now figured out that the MPI issues are completely unrelated - some combination of a newer MPI build and Firedrake on our cluster, which also occurs with main branches of everything. So, switching back to an older MPI build, we are now hoping to test your most recent version of adams/gamg-add-old-coarsening with these options and see whether the memory errors are still there. Will let you know.

Best wishes
Stephan Kramer

Coarsening details with various options for Level 6 of the test case:

In our original setup (using "old" PETSc), we had:

          rows=516, cols=516, bs=6
          rows=12660, cols=12660, bs=6
          rows=346974, cols=346974, bs=6
          rows=19169670, cols=19169670, bs=3

Then with the newer main PETSc we had:

          rows=666, cols=666, bs=6
          rows=7740, cols=7740, bs=6
          rows=34902, cols=34902, bs=6
          rows=736578, cols=736578, bs=6
          rows=19169670, cols=19169670, bs=3

Then on your branch with minimum_degree_ordering False:

          rows=504, cols=504, bs=6
          rows=2274, cols=2274, bs=6
          rows=11010, cols=11010, bs=6
          rows=35790, cols=35790, bs=6
          rows=430686, cols=430686, bs=6
          rows=19169670, cols=19169670, bs=3

And with minimum_degree_ordering False and use_aggressive_square_graph True:

          rows=498, cols=498, bs=6
          rows=12672, cols=12672, bs=6
          rows=346974, cols=346974, bs=6
          rows=19169670, cols=19169670, bs=3

So that is indeed pretty much back to what it was before.

On 31/08/2023 23:40, Mark Adams wrote:
Hi Stephan,

This branch is settling down.  adams/gamg-add-old-coarsening
<https://gitlab.com/petsc/petsc/-/commits/adams/gamg-add-old-coarsening>
I made the old, not minimum degree, ordering the default but kept the new
"aggressive" coarsening as the default, so I am hoping that just adding
"-pc_gamg_use_aggressive_square_graph true" to your regression tests will
get you back to where you were before.
Fingers crossed ... let me know if you have any success or not.

Thanks,
Mark


On Tue, Aug 15, 2023 at 1:45 PM Mark Adams <[email protected]> wrote:

Hi Stephan,

I have a branch that you can try: adams/gamg-add-old-coarsening
<https://gitlab.com/petsc/petsc/-/commits/adams/gamg-add-old-coarsening>

Things to test:
* First, verify that nothing unintended changed by reproducing your bad
results with this branch (the defaults are the same)
* Try not using the minimum degree ordering that I suggested, with: -pc_gamg_use_minimum_degree_ordering false
   -- I am eager to see if that is the main problem.
* Go back to what I think is the old method:
  -pc_gamg_use_minimum_degree_ordering false -pc_gamg_use_aggressive_square_graph true

When we get back to where you were, I would like to try to get modern
stuff working.
I did add a -pc_gamg_aggressive_mis_k <2>.
You could do another step of MIS coarsening with -pc_gamg_aggressive_mis_k 3.

Anyway, lots to look at but, alas, AMG does have a lot of parameters.

Thanks,
Mark

On Mon, Aug 14, 2023 at 4:26 PM Mark Adams <[email protected]> wrote:


On Mon, Aug 14, 2023 at 11:03 AM Stephan Kramer <[email protected]>
wrote:

Many thanks for looking into this, Mark
My 3D tests were not that different and I see you lowered the threshold.
Note, you can set the threshold to zero, but your test is running so much differently than mine that there is something else going on.
Note, the new, bad, coarsening rate of 30:1 is what we tend to shoot for in 3D.

So it is not clear what the problem is.  Some questions:

* do you have a picture of this mesh to show me?
It's just a standard hexahedral cubed sphere mesh with the refinement level giving the number of times each of the six sides has been subdivided: so Level_5 means 2^5 x 2^5 squares, which are extruded to 16 layers. So the total number of elements at Level_5 is 6 x 32 x 32 x 16 = 98304 hexes. And everything doubles in all 3 dimensions (so 2^3) going to the next Level.
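Level_6, for example, then has 6 x 64 x 64 x 32 = 786432 hexes.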

I see, and I assume these are pretty stretched elements.


* what do you mean by Q1-Q2 elements?
Q2-Q1, basically Taylor-Hood on hexes, so (tri)quadratic for velocity and (tri)linear for pressure.

I guess you could argue we could/should just do good old geometric multigrid instead. More generally, we do use this solver configuration a lot for tetrahedral Taylor-Hood (P2-P1), in particular also for our adaptive mesh runs - would it be worth seeing if we have the same performance issues with tetrahedral P2-P1?

No, you have a clear reproducer, if not minimal.
The first coarsening is very different.

I am working on this and I see that I added a heuristic for thin bodies
where you order the vertices in greedy algorithms with minimum degree first.
This will tend to pick corners first, then edges, then faces, etc.
That may be the problem. I would like to understand it better (see below).



It would be nice to see if the new and old codes are similar without aggressive coarsening.
This was the intent of the major change in this time frame, as you noticed.
If these jobs are easy to run, could you check that the old and new versions are similar with "-pc_gamg_square_graph 0" (and you only need one time step)?
All you need to do is check that the first coarse grid has about the same number of equations (large).
Unfortunately we're seeing some memory errors when we use this option,
and I'm not entirely clear whether we're just running out of memory and
need to put it on a special queue.

The run with square_graph 0 using new PETSc managed to get through one
solve at level 5, and is giving the following mg levels:

          rows=174, cols=174, bs=6
            total: nonzeros=30276, allocated nonzeros=30276
--
            rows=2106, cols=2106, bs=6
            total: nonzeros=4238532, allocated nonzeros=4238532
--
            rows=21828, cols=21828, bs=6
            total: nonzeros=62588232, allocated nonzeros=62588232
--
            rows=589824, cols=589824, bs=6
            total: nonzeros=1082528928, allocated nonzeros=1082528928
--
            rows=2433222, cols=2433222, bs=3
            total: nonzeros=456526098, allocated nonzeros=456526098

comparing with square_graph 100 with new PETSc

            rows=96, cols=96, bs=6
            total: nonzeros=9216, allocated nonzeros=9216
--
            rows=1440, cols=1440, bs=6
            total: nonzeros=647856, allocated nonzeros=647856
--
            rows=97242, cols=97242, bs=6
            total: nonzeros=65656836, allocated nonzeros=65656836
--
            rows=2433222, cols=2433222, bs=3
            total: nonzeros=456526098, allocated nonzeros=456526098

and old PETSc with square_graph 100

            rows=90, cols=90, bs=6
            total: nonzeros=8100, allocated nonzeros=8100
--
            rows=1872, cols=1872, bs=6
            total: nonzeros=1234080, allocated nonzeros=1234080
--
            rows=47652, cols=47652, bs=6
            total: nonzeros=23343264, allocated nonzeros=23343264
--
            rows=2433222, cols=2433222, bs=3
            total: nonzeros=456526098, allocated nonzeros=456526098
--

Unfortunately old PETSc with square_graph 0 did not complete a single solve before giving the memory error.

OK, thanks for trying.

I am working on this and I will give you a branch to test, but if you can rebuild PETSc, here is a quick test that might fix your problem.
In src/ksp/pc/impls/gamg/agg.c you will see:

     PetscCall(PetscSortIntWithArray(nloc, degree, permute));

If you can comment this out in the new code and compare with the old,
that might fix the problem.
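
That is, the edit would look roughly like this (just a sketch of that one statement; the surrounding aggregation code in agg.c stays as is):

     /* Skip the minimum-degree ordering of the vertices ahead of the greedy
        aggregation, falling back to the natural ordering: */
     /* PetscCall(PetscSortIntWithArray(nloc, degree, permute)); */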

Thanks,
Mark


BTW, I am starting to think I should add the old method back as an
option.
I did not think this change would cause large differences.
Yes, I think that would be much appreciated. Let us know if we can do any testing.

Best wishes
Stephan


Thanks,
Mark




Note that we are providing the rigid body near nullspace,
hence the bs=3 to bs=6.
We have tried different values for the gamg_threshold but it doesn't
really seem to significantly alter the coarsening amount in that first
step.
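
For context, we attach the near nullspace roughly along these lines (a sketch of the standard PETSc calls; in practice this happens through Firedrake, and A and coords here are placeholders for our operator and coordinate vector, assumed to have been created elsewhere):

     Mat          A;        /* assembled operator (placeholder) */
     Vec          coords;   /* nodal coordinates, block size 3 (placeholder) */
     MatNullSpace nearnull;

     /* Build the six rigid body modes (3 translations + 3 rotations) from the
        coordinates and attach them as the near nullspace; this is why the
        coarse GAMG levels end up with bs=6 while the finest level has bs=3. */
     PetscCall(MatNullSpaceCreateRigidBody(coords, &nearnull));
     PetscCall(MatSetNearNullSpace(A, nearnull));
     PetscCall(MatNullSpaceDestroy(&nearnull));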

Do you have any suggestions for further things we should try/look at?
Any feedback would be much appreciated.

Best wishes
Stephan Kramer

Full logs including log_view timings available from
https://github.com/stephankramer/petsc-scaling/

In particular:



https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat

https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat

https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat

https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat

https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat

https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat


