Fantastic! I fixed a memory free problem. You should be OK now. I am pretty sure you are good, but I would like to wait for feedback from you. We should have a release at the end of the month and it would be nice to get this fix into it.
Thanks,
Mark

On Fri, Sep 1, 2023 at 7:07 AM Stephan Kramer <[email protected]> wrote:

> Hi Mark
>
> Sorry it took a while to report back. We have tried your branch but hit a
> few issues, some of which we're not entirely sure are related.
>
> First, switching off minimum degree ordering, and then switching to the
> old version of aggressive coarsening, as you suggested, got us back to
> the coarsening behaviour that we had previously. But then we also
> observed an even further worsening of the iteration count: it had
> previously already gone up by 50% (with the newer main petsc), but now
> was more than double that of "old" petsc. It took us a while to realize
> this was due to the default smoother changing from Cheby+SOR to
> Cheby+Jacobi. Switching this back to the old default as well, we get back
> to very similar coarsening levels (see below for more details if it is of
> interest) and iteration counts.
>
> So that's all very good news. However, we also started seeing memory
> errors (double free or corruption) when we switched off the minimum
> degree ordering. Because this was at an earlier version of your branch,
> we then rebuilt, hoping this was just an earlier bug that had been fixed,
> but then we ran into MPI-lockup issues. We have now figured out that the
> MPI issues are completely unrelated - some combination of a newer MPI
> build and firedrake on our cluster, which also occurs using main branches
> of everything. So, switching back to an older MPI build, we are hoping to
> now test your most recent version of adams/gamg-add-old-coarsening with
> these options and see whether the memory errors are still there.
> Will let you know
>
> Best wishes
> Stephan Kramer
>
> Coarsening details with various options for Level 6 of the test case:
>
> In our original setup (using "old" petsc), we had:
>
> rows=516, cols=516, bs=6
> rows=12660, cols=12660, bs=6
> rows=346974, cols=346974, bs=6
> rows=19169670, cols=19169670, bs=3
>
> Then with the newer main petsc we had:
>
> rows=666, cols=666, bs=6
> rows=7740, cols=7740, bs=6
> rows=34902, cols=34902, bs=6
> rows=736578, cols=736578, bs=6
> rows=19169670, cols=19169670, bs=3
>
> Then on your branch with minimum_degree_ordering False:
>
> rows=504, cols=504, bs=6
> rows=2274, cols=2274, bs=6
> rows=11010, cols=11010, bs=6
> rows=35790, cols=35790, bs=6
> rows=430686, cols=430686, bs=6
> rows=19169670, cols=19169670, bs=3
>
> And with minimum_degree_ordering False and use_aggressive_square_graph True:
>
> rows=498, cols=498, bs=6
> rows=12672, cols=12672, bs=6
> rows=346974, cols=346974, bs=6
> rows=19169670, cols=19169670, bs=3
>
> So that is indeed pretty much back to what it was before.
>
> On 31/08/2023 23:40, Mark Adams wrote:
> > Hi Stephan,
> >
> > This branch is settling down. adams/gamg-add-old-coarsening
> > <https://gitlab.com/petsc/petsc/-/commits/adams/gamg-add-old-coarsening>
> > I made the old, non-minimum-degree ordering the default but kept the new
> > "aggressive" coarsening as the default, so I am hoping that just adding
> > "-pc_gamg_use_aggressive_square_graph true" to your regression tests will
> > get you back to where you were before.
> > Fingers crossed ... let me know if you have any success or not.
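[Editor's note: as a sanity check on the level listings above, the per-level coarsening ratios (fine rows / coarse rows) can be computed directly from the quoted row counts. A minimal Python sketch, using only numbers copied from the email; the function name is illustrative:]

```python
# Per-level coarsening ratios for the Level 6 test case, computed from
# the row counts quoted above. Lists are ordered coarsest level first,
# as in the -ksp_view output.

def coarsening_ratios(rows):
    """Return fine/coarse row ratios between consecutive levels."""
    return [fine / coarse for coarse, fine in zip(rows, rows[1:])]

# "old" petsc setup
old_petsc = [516, 12660, 346974, 19169670]
# branch with minimum_degree_ordering False + use_aggressive_square_graph True
branch = [498, 12672, 346974, 19169670]

print([round(r, 1) for r in coarsening_ratios(old_petsc)])
print([round(r, 1) for r in coarsening_ratios(branch)])
```

The ratios for the two configurations come out nearly identical, which is the quantitative version of "pretty much back to what it was before".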
> >
> > Thanks,
> > Mark
> >
> > On Tue, Aug 15, 2023 at 1:45 PM Mark Adams <[email protected]> wrote:
> >
> >> Hi Stephan,
> >>
> >> I have a branch that you can try: adams/gamg-add-old-coarsening
> >> <https://gitlab.com/petsc/petsc/-/commits/adams/gamg-add-old-coarsening>
> >>
> >> Things to test:
> >> * First, verify that nothing unintended changed by reproducing your bad
> >> results with this branch (the defaults are the same).
> >> * Try not using the minimum degree ordering that I suggested,
> >> with: -pc_gamg_use_minimum_degree_ordering false
> >> -- I am eager to see if that is the main problem.
> >> * Go back to what I think is the old method:
> >> -pc_gamg_use_minimum_degree_ordering false
> >> -pc_gamg_use_aggressive_square_graph true
> >>
> >> When we get back to where you were, I would like to try to get the
> >> modern stuff working.
> >> I did add a -pc_gamg_aggressive_mis_k <2>
> >> You could do another step of MIS coarsening with
> >> -pc_gamg_aggressive_mis_k 3
> >>
> >> Anyway, lots to look at, but, alas, AMG does have a lot of parameters.
> >>
> >> Thanks,
> >> Mark
> >>
> >> On Mon, Aug 14, 2023 at 4:26 PM Mark Adams <[email protected]> wrote:
> >>
> >>> On Mon, Aug 14, 2023 at 11:03 AM Stephan Kramer
> >>> <[email protected]> wrote:
> >>>
> >>>> Many thanks for looking into this, Mark
> >>>>> My 3D tests were not that different and I see you lowered the
> >>>>> threshold.
> >>>>> Note, you can set the threshold to zero, but your test is running so
> >>>>> much differently than mine there is something else going on.
> >>>>> Note, the new, bad, coarsening rate of 30:1 is what we tend to shoot
> >>>>> for in 3D.
> >>>>>
> >>>>> So it is not clear what the problem is. Some questions:
> >>>>>
> >>>>> * do you have a picture of this mesh to show me?
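[Editor's note: the three experiments Mark lists above can be collected into a small helper script so each run uses a well-defined option set. A hypothetical sketch - only the option flags themselves come from the email; the dict and function names are made up:]

```python
# Hypothetical collection of the three GAMG test configurations from the
# email above. Only the flag names/values are from the thread; the
# structure around them is illustrative.

CONFIGS = {
    # 1. branch defaults: reproduce the previous (bad) results
    "branch-defaults": {},
    # 2. switch off the minimum degree ordering
    "no-min-degree": {"-pc_gamg_use_minimum_degree_ordering": "false"},
    # 3. what is believed to be the old method
    "old-method": {
        "-pc_gamg_use_minimum_degree_ordering": "false",
        "-pc_gamg_use_aggressive_square_graph": "true",
    },
}

def option_string(name):
    """Flatten one named configuration into a command-line fragment."""
    return " ".join(f"{k} {v}" for k, v in CONFIGS[name].items())

for name in CONFIGS:
    print(name, "->", option_string(name) or "(defaults)")
```

The resulting fragment is appended to however the solver is normally invoked (PETSc options database, command line, or an options file).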
> >>>> It's just a standard hexahedral cubed sphere mesh with the refinement
> >>>> level giving the number of times each of the six sides has been
> >>>> subdivided: so Level_5 means 2^5 x 2^5 squares, which is extruded to
> >>>> 16 layers. So the total number of elements at Level_5 is
> >>>> 6 x 32 x 32 x 16 = 98304 hexes. And everything doubles in all 3
> >>>> dimensions (so 2^3) going to the next Level.
> >>>>
> >>> I see, and I assume these are pretty stretched elements.
> >>>
> >>>>> * what do you mean by Q1-Q2 elements?
> >>>> Q2-Q1, basically Taylor-Hood on hexes, so (tri)quadratic for velocity
> >>>> and (tri)linear for pressure.
> >>>>
> >>>> I guess you could argue we could/should just do good old geometric
> >>>> multigrid instead. More generally, we use this solver configuration a
> >>>> lot for tetrahedral Taylor-Hood (P2-P1), in particular also for our
> >>>> adaptive mesh runs - would it be worth seeing if we have the same
> >>>> performance issues with tetrahedral P2-P1?
> >>>>
> >>> No, you have a clear reproducer, if not a minimal one.
> >>> The first coarsening is very different.
> >>>
> >>> I am working on this, and I see that I added a heuristic for thin
> >>> bodies where you order the vertices in greedy algorithms with minimum
> >>> degree first.
> >>> This will tend to pick corners first, then edges, then faces, etc.
> >>> That may be the problem. I would like to understand it better (see
> >>> below).
> >>>
> >>>>> It would be nice to see if the new and old codes are similar without
> >>>>> aggressive coarsening.
> >>>>> This was the intended change of the major change in this time frame,
> >>>>> as you noticed.
> >>>>> If these jobs are easy to run, could you check that the old and new
> >>>>> versions are similar with "-pc_gamg_square_graph 0" (and you only
> >>>>> need one time step)?
> >>>>> All you need to do is check that the first coarse grid has about the
> >>>>> same number of equations (large).
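[Editor's note: the element-count arithmetic above is easy to verify. A quick Python check, assuming - as the "everything doubles in all 3 dimensions" remark states - that the extrusion layer count also doubles with each refinement level; the function name is illustrative:]

```python
# Cubed-sphere hex counts as described above: six panels of 2^L x 2^L
# quads, extruded to a layer count that doubles with the level
# (16 layers at Level 5).

def n_hexes(level, layers_at_level5=16):
    layers = layers_at_level5 * 2 ** (level - 5)
    return 6 * (2 ** level) ** 2 * layers

print(n_hexes(5))                 # 6 * 32 * 32 * 16 hexes
print(n_hexes(6) // n_hexes(5))   # doubling in 3 dimensions -> factor 2^3
```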
> >>>> Unfortunately we're seeing some memory errors when we use this
> >>>> option, and I'm not entirely clear whether we're just running out of
> >>>> memory and need to put it on a special queue.
> >>>>
> >>>> The run with square_graph 0 using new PETSc managed to get through
> >>>> one solve at level 5, and is giving the following mg levels:
> >>>>
> >>>> rows=174, cols=174, bs=6
> >>>> total: nonzeros=30276, allocated nonzeros=30276
> >>>> --
> >>>> rows=2106, cols=2106, bs=6
> >>>> total: nonzeros=4238532, allocated nonzeros=4238532
> >>>> --
> >>>> rows=21828, cols=21828, bs=6
> >>>> total: nonzeros=62588232, allocated nonzeros=62588232
> >>>> --
> >>>> rows=589824, cols=589824, bs=6
> >>>> total: nonzeros=1082528928, allocated nonzeros=1082528928
> >>>> --
> >>>> rows=2433222, cols=2433222, bs=3
> >>>> total: nonzeros=456526098, allocated nonzeros=456526098
> >>>>
> >>>> comparing with square_graph 100 with new PETSc:
> >>>>
> >>>> rows=96, cols=96, bs=6
> >>>> total: nonzeros=9216, allocated nonzeros=9216
> >>>> --
> >>>> rows=1440, cols=1440, bs=6
> >>>> total: nonzeros=647856, allocated nonzeros=647856
> >>>> --
> >>>> rows=97242, cols=97242, bs=6
> >>>> total: nonzeros=65656836, allocated nonzeros=65656836
> >>>> --
> >>>> rows=2433222, cols=2433222, bs=3
> >>>> total: nonzeros=456526098, allocated nonzeros=456526098
> >>>>
> >>>> and old PETSc with square_graph 100:
> >>>>
> >>>> rows=90, cols=90, bs=6
> >>>> total: nonzeros=8100, allocated nonzeros=8100
> >>>> --
> >>>> rows=1872, cols=1872, bs=6
> >>>> total: nonzeros=1234080, allocated nonzeros=1234080
> >>>> --
> >>>> rows=47652, cols=47652, bs=6
> >>>> total: nonzeros=23343264, allocated nonzeros=23343264
> >>>> --
> >>>> rows=2433222, cols=2433222, bs=3
> >>>> total: nonzeros=456526098, allocated nonzeros=456526098
> >>>> --
> >>>>
> >>>> Unfortunately, old PETSc with square_graph 0 did not complete a
> >>>> single solve before giving the memory error.
> >>>>
> >>> OK, thanks for trying.
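[Editor's note: one way to read the listings above is the average number of nonzeros per row on each level, which shows how dense the coarse operators become without squaring the graph. A small Python sketch using the "new PETSc, square_graph 0" numbers quoted above:]

```python
# Average nonzeros per row for the "new PETSc, square_graph 0" levels
# quoted above - a rough measure of coarse-operator density.

levels = [  # (rows, nonzeros), coarsest level first
    (174, 30276),
    (2106, 4238532),
    (21828, 62588232),
    (589824, 1082528928),
    (2433222, 456526098),
]

for rows, nnz in levels:
    print(f"rows={rows:>8}  nnz/row={nnz / rows:8.1f}")
```

The coarsest level is completely dense (174 rows with 174^2 = 30276 nonzeros), and the intermediate levels carry on the order of 2000 nonzeros per row, which is consistent with the memory pressure reported above.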
> >>>
> >>> I am working on this and I will give you a branch to test, but if you
> >>> can rebuild PETSc, here is a quick test that might fix your problem.
> >>> In src/ksp/pc/impls/gamg/agg.c you will see:
> >>>
> >>> PetscCall(PetscSortIntWithArray(nloc, degree, permute));
> >>>
> >>> If you can comment this out in the new code and compare with the old,
> >>> that might fix the problem.
> >>>
> >>> Thanks,
> >>> Mark
> >>>
> >>>>> BTW, I am starting to think I should add the old method back as an
> >>>>> option.
> >>>>> I did not think this change would cause large differences.
> >>>> Yes, I think that would be much appreciated. Let us know if we can do
> >>>> any testing.
> >>>>
> >>>> Best wishes
> >>>> Stephan
> >>>>
> >>>>> Thanks,
> >>>>> Mark
> >>>>>
> >>>>>> Note that we are providing the rigid body near-nullspace,
> >>>>>> hence the bs=3 to bs=6.
> >>>>>> We have tried different values for the gamg_threshold but it
> >>>>>> doesn't really seem to significantly alter the coarsening amount in
> >>>>>> that first step.
> >>>>>>
> >>>>>> Do you have any suggestions for further things we should try/look
> >>>>>> at?
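[Editor's note: to illustrate what that sort changes - visiting vertices in ascending-degree order before a greedy pass seeds the selection at low-degree vertices (corners, edges) first, as Mark describes. A toy Python sketch of a greedy maximal independent set on a 5-vertex chain; this is not the PETSc implementation, just the ordering effect:]

```python
# Toy illustration (not PETSc code) of how visiting vertices in
# minimum-degree order changes which vertices a greedy MIS selects.
# Graph: a 5-vertex chain 0-1-2-3-4; the endpoints have degree 1.

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

def greedy_mis(order):
    """Greedy maximal independent set, visiting vertices in `order`."""
    selected, covered = [], set()
    for v in order:
        if v not in covered:
            selected.append(v)
            covered.add(v)
            covered.update(adj[v])  # neighbors can no longer be selected
    return selected

natural = greedy_mis(range(5))
# Minimum-degree-first order: endpoints (degree 1) come before interior
# vertices (degree 2), analogous to the commented-on sort in agg.c.
min_degree_first = greedy_mis(sorted(adj, key=lambda v: len(adj[v])))
print(natural, min_degree_first)  # [0, 2, 4] vs [0, 4, 2]
```

Both orders give a valid MIS here, but the set and its growth pattern differ; on a thin 3D body that difference in seeding is what changes the first coarsening so much.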
> >>>>>> Any feedback would be much appreciated
> >>>>>>
> >>>>>> Best wishes
> >>>>>> Stephan Kramer
> >>>>>>
> >>>>>> Full logs including log_view timings available from
> >>>>>> https://github.com/stephankramer/petsc-scaling/
> >>>>>>
> >>>>>> In particular:
> >>>>>>
> >>>>>> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat
> >>>>>> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat
> >>>>>> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat
> >>>>>> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat
> >>>>>> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat
> >>>>>> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat
