I did some digging into the Trilinos reinit_matrix, and I've found
that all of my lost time is coming from the operation
graph->FillComplete(input_col_map, input_row_map)
This step is accounting for as much as 20 seconds per call for a
1x200,000 matrix on 4 processors. I also found that this bottleneck
does not exist when running in serial. There is a check earlier in
reinit_matrix which calls graph.reset differently for 1 processor vs.
multiple processors. The comments suggest that there's an explicit
reason for treating the MPI case this way, but that treatment seems
to be causing the problem in my case.
I think Martin Kronbichler might have to tell you why this is.
But if you're adventurous, take a look at what the FillComplete function
in Trilinos looks like and whether you could possibly find the same
kind of savings there that I found in deal.II. I suspect that it is
this function:
https://github.com/trilinos/Trilinos/blob/master/packages/epetra/src/Epetra_CrsGraph.cpp#L974
I'm not sure where exactly the time is spent, but I could imagine that
the most likely spot is where the code sorts an array of 200,000
entries with shell_sort which, if I remember correctly, has no early
exit when the array is already sorted and can be of O(n^2) complexity
in the worst case. I'm not sure how best to approach this. One could
suggest a patch, but it might take a little while until the Trilinos
folks react (Epetra is not as actively developed any more, so there are
few resources for it).
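
To illustrate the kind of short way out I have in mind, here is a
minimal sketch. It is not the Epetra code: shell_sort below is a plain
textbook version with a gap-halving sequence, and sort_if_needed is a
hypothetical wrapper name. The idea is just to spend one linear pass
checking whether the data is already ordered before sorting at all.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Textbook shell sort with the gap sequence n/2, n/4, ..., 1.
// Returns the number of comparisons so that the cost can be inspected.
// Illustrative only; this is not the Epetra implementation.
std::size_t shell_sort(std::vector<int> &values)
{
  std::size_t       comparisons = 0;
  const std::size_t n           = values.size();
  for (std::size_t gap = n / 2; gap > 0; gap /= 2)
    for (std::size_t i = gap; i < n; ++i)
      {
        // Insert values[i] into the already gap-sorted part before it.
        const int   tmp = values[i];
        std::size_t j   = i;
        while (j >= gap)
          {
            ++comparisons;
            if (values[j - gap] <= tmp)
              break;
            values[j] = values[j - gap];
            j -= gap;
          }
        values[j] = tmp;
      }
  return comparisons;
}

// Hypothetical wrapper: skip the sort when the input is already ordered.
// std::is_sorted needs at most n-1 comparisons.
// Returns true if a sort was actually performed.
bool sort_if_needed(std::vector<int> &values)
{
  if (std::is_sorted(values.begin(), values.end()))
    return false;
  shell_sort(values);
  return true;
}
```

Even on sorted input, the shell sort above makes one full pass per gap
value, so the up-front check is cheaper whenever the input is often
already sorted. Whether that is where the time in FillComplete actually
goes would need to be confirmed with a profiler first.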
Best,
Martin