Hi Stephan,

I have a branch that you can try: adams/gamg-add-old-coarsening
<https://gitlab.com/petsc/petsc/-/commits/adams/gamg-add-old-coarsening>
Things to test:

* First, verify that nothing unintended changed by reproducing your bad
  results with this branch (the defaults are the same).
* Try not using the minimum degree ordering that I suggested, with:
  -pc_gamg_use_minimum_degree_ordering false
  -- I am eager to see if that is the main problem.
* Go back to what I think is the old method:
  -pc_gamg_use_minimum_degree_ordering false -pc_gamg_use_aggressive_square_graph true

When we get back to where you were, I would like to try to get the modern
stuff working. I did add a -pc_gamg_aggressive_mis_k <2>, so you could do
another step of MIS coarsening with -pc_gamg_aggressive_mis_k 3.

Anyway, lots to look at, but, alas, AMG does have a lot of parameters.

Thanks,
Mark

On Mon, Aug 14, 2023 at 4:26 PM Mark Adams <[email protected]> wrote:

> On Mon, Aug 14, 2023 at 11:03 AM Stephan Kramer <[email protected]> wrote:
>
>> Many thanks for looking into this, Mark
>>
>> > My 3D tests were not that different, and I see you lowered the threshold.
>> > Note, you can set the threshold to zero, but your test is running so
>> > much differently than mine that there is something else going on.
>> > Note, the new, bad coarsening rate of 30:1 is what we tend to shoot for
>> > in 3D.
>> >
>> > So it is not clear what the problem is. Some questions:
>> >
>> > * do you have a picture of this mesh to show me?
>>
>> It's just a standard hexahedral cubed-sphere mesh, with the refinement
>> level giving the number of times each of the six sides has been
>> subdivided: so Level_5 means 2^5 x 2^5 squares, which is extruded to 16
>> layers. So the total number of elements at Level_5 is 6 x 32 x 32 x 16 =
>> 98304 hexes. And everything doubles in all 3 dimensions (so 2^3) going
>> to the next Level.
>
> I see, and I assume these are pretty stretched elements.
>
>> > * what do you mean by Q1-Q2 elements?
>> Q2-Q1, basically Taylor-Hood on hexes, so (tri)quadratic for velocity
>> and (tri)linear for pressure.
>>
>> I guess you could argue we could/should just do good old geometric
>> multigrid instead. More generally, we use this solver configuration a
>> lot for tetrahedral Taylor-Hood (P2-P1), in particular also for our
>> adaptive mesh runs -- would it be worth seeing whether we have the same
>> performance issues with tetrahedral P2-P1?
>
> No, you have a clear reproducer, if not a minimal one.
> The first coarsening is very different.
>
> I am working on this, and I see that I added a heuristic for thin bodies
> where you order the vertices in greedy algorithms with minimum degree first.
> This will tend to pick corners first, then edges, then faces, etc.
> That may be the problem. I would like to understand it better (see below).
>
>> > It would be nice to see if the new and old codes are similar without
>> > aggressive coarsening.
>> > This was the intended change of the major change in this time frame,
>> > as you noticed.
>> > If these jobs are easy to run, could you check that the old and new
>> > versions are similar with "-pc_gamg_square_graph 0" (and you only need
>> > one time step)?
>> > All you need to do is check that the first coarse grid has about the
>> > same number of equations (large).
>>
>> Unfortunately we're seeing some memory errors when we use this option,
>> and I'm not entirely clear whether we're just running out of memory and
>> need to put it on a special queue.
>> The run with square_graph 0 using new PETSc managed to get through one
>> solve at level 5, and is giving the following mg levels:
>>
>>   rows=174, cols=174, bs=6
>>     total: nonzeros=30276, allocated nonzeros=30276
>>   --
>>   rows=2106, cols=2106, bs=6
>>     total: nonzeros=4238532, allocated nonzeros=4238532
>>   --
>>   rows=21828, cols=21828, bs=6
>>     total: nonzeros=62588232, allocated nonzeros=62588232
>>   --
>>   rows=589824, cols=589824, bs=6
>>     total: nonzeros=1082528928, allocated nonzeros=1082528928
>>   --
>>   rows=2433222, cols=2433222, bs=3
>>     total: nonzeros=456526098, allocated nonzeros=456526098
>>
>> comparing with square_graph 100 with new PETSc:
>>
>>   rows=96, cols=96, bs=6
>>     total: nonzeros=9216, allocated nonzeros=9216
>>   --
>>   rows=1440, cols=1440, bs=6
>>     total: nonzeros=647856, allocated nonzeros=647856
>>   --
>>   rows=97242, cols=97242, bs=6
>>     total: nonzeros=65656836, allocated nonzeros=65656836
>>   --
>>   rows=2433222, cols=2433222, bs=3
>>     total: nonzeros=456526098, allocated nonzeros=456526098
>>
>> and old PETSc with square_graph 100:
>>
>>   rows=90, cols=90, bs=6
>>     total: nonzeros=8100, allocated nonzeros=8100
>>   --
>>   rows=1872, cols=1872, bs=6
>>     total: nonzeros=1234080, allocated nonzeros=1234080
>>   --
>>   rows=47652, cols=47652, bs=6
>>     total: nonzeros=23343264, allocated nonzeros=23343264
>>   --
>>   rows=2433222, cols=2433222, bs=3
>>     total: nonzeros=456526098, allocated nonzeros=456526098
>>
>> Unfortunately, old PETSc with square_graph 0 did not complete a single
>> solve before giving the memory error.
>
> OK, thanks for trying.
>
> I am working on this and I will give you a branch to test, but if you can
> rebuild PETSc, here is a quick test that might fix your problem.
> In src/ksp/pc/impls/gamg/agg.c you will see:
>
>   PetscCall(PetscSortIntWithArray(nloc, degree, permute));
>
> If you can comment this out in the new code and compare with the old, that
> might fix the problem.
> Thanks,
> Mark
>
>> > BTW, I am starting to think I should add the old method back as an
>> > option. I did not think this change would cause large differences.
>>
>> Yes, I think that would be much appreciated. Let us know if we can do
>> any testing.
>>
>> Best wishes
>> Stephan
>>
>> > Thanks,
>> > Mark
>>
>> >> Note that we are providing the rigid body near nullspace,
>> >> hence the bs=3 to bs=6.
>> >> We have tried different values for the gamg_threshold, but it doesn't
>> >> really seem to significantly alter the coarsening amount in that first
>> >> step.
>> >>
>> >> Do you have any suggestions for further things we should try/look at?
>> >> Any feedback would be much appreciated.
>> >>
>> >> Best wishes
>> >> Stephan Kramer
>> >>
>> >> Full logs including log_view timings are available from
>> >> https://github.com/stephankramer/petsc-scaling/
>> >>
>> >> In particular:
>> >>
>> >> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat
>> >> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat
>> >> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat
>> >> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat
>> >> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat
>> >> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat
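Put together, the test matrix suggested at the top of the thread could be scripted roughly as below. This is a sketch only: `./app` stands in for Stephan's actual run command and the log file names are placeholders; the `-pc_gamg_*` options are the ones named in the thread, on the adams/gamg-add-old-coarsening branch.

```shell
# 1) Baseline on the branch: defaults unchanged, should reproduce the bad results
./app -log_view > baseline.log

# 2) Disable the new minimum-degree ordering heuristic
./app -pc_gamg_use_minimum_degree_ordering false -log_view > no_mindeg.log

# 3) Approximate the old method: also square the graph aggressively
./app -pc_gamg_use_minimum_degree_ordering false \
      -pc_gamg_use_aggressive_square_graph true -log_view > old_method.log

# 4) Optional: an extra step of MIS coarsening (default is 2)
./app -pc_gamg_aggressive_mis_k 3 -log_view > mis_k3.log
```

Comparing the first coarse-grid sizes (the `rows=` lines in the `-ksp_view` output) across these logs should show which option drives the difference.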

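As a side note, Stephan's mesh-size arithmetic can be checked with a short script. The only assumption beyond his description is that the layer count also doubles per level (16 layers at Level_5, consistent with "everything doubles in all 3 dimensions"):

```python
def n_hexes(level):
    """Element count for the cubed-sphere mesh described in the thread:
    6 sides of 2^level x 2^level quads, extruded to 2^(level-1) layers
    (16 layers at Level_5), so the count grows by 2^3 per level."""
    return 6 * (2 ** level) ** 2 * 2 ** (level - 1)

print(n_hexes(5))                 # 98304, the Level_5 count quoted above
print(n_hexes(6) // n_hexes(5))   # 8, i.e. the 2^3 growth per level
```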