Dear Dustin,
What does your computational setup look like, i.e., how many nonzero
entries do you have in your matrix?
I'm not sure if I understand what you mean. Do you mean the number of
nonzero entries in /SparseMatrix/ or in the /BlockSparsityPattern/ or
in the dynamic one? How can I get this information?
You can call DynamicSparsityPattern::n_nonzero_elements() to get the
number of nonzero entries in the dynamic sparsity pattern. This method
also exists in BlockSparsityPattern (and in all other sparsity patterns
that inherit from BlockSparsityPatternBase):
https://dealii.org/developer/doxygen/deal.II/classBlockSparsityPatternBase.html
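For reference, a minimal sketch of how one could query those counts on
both the dynamic and the compressed pattern. This is not your actual
configuration: it assumes an existing deal.II program with an
already-distributed DoFHandler named dof_handler, and it omits all mesh
and element setup.

```cpp
// Sketch only: assumes an already distributed DoFHandler<dim> named
// dof_handler inside an existing deal.II program.
#include <deal.II/dofs/dof_tools.h>
#include <deal.II/lac/dynamic_sparsity_pattern.h>
#include <deal.II/lac/sparsity_pattern.h>
#include <iostream>

dealii::DynamicSparsityPattern dsp(dof_handler.n_dofs());
dealii::DoFTools::make_sparsity_pattern(dof_handler, dsp);
std::cout << "dynamic pattern: " << dsp.n_nonzero_elements()
          << " nonzeros\n";

dealii::SparsityPattern sparsity_pattern;
sparsity_pattern.copy_from(dsp); // the copy step discussed in this thread
std::cout << "fixed pattern:   " << sparsity_pattern.n_nonzero_elements()
          << " nonzeros\n";
```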
What I'm trying to understand here is what kind of properties your
problem has - whether there are many nonzero entries per row and other
special things that could explain your problems.
I just checked the 3D case of step-22 for the performance of
BlockSparsityPattern::copy_from(BlockDynamicSparsityPattern) and the
performance is where I would expect it to be. It takes 1.19 s to copy
the sparsity pattern for a case with 1.6m DoFs (I have some
modifications for the mesh compared to what you find online) on my
laptop. Given that there are 275m nonzero entries in that matrix and I
need to touch around 4.4 GB (= 4 x 275m x 4 bytes per unsigned int, once
for clearing the data in the pattern, once for reading in the dynamic
pattern, once for writing into the fixed pattern plus once for
write-allocate on that last operation) of memory here, I reach 26% of
the theoretically possible bandwidth on this machine (~14 GB/s memory
transfer per core). While I would know how to reach more than 80% of
peak memory bandwidth here, this function is nowhere near being relevant
to the global run time in any of my performance profiles. And I'm likely
the deal.II person with the most affinity for performance numbers.
Thus my interest in what is particular about your setup.
Have you checked that you do not run out of memory and see a large
swap time?
I'm quite sure that this is not the case/problem, since I used one of
our compute servers with 64 GB of memory. Moreover, at the moment the
program runs with an additional global refinement, i.e., about 16
million dofs, and only 33% of the memory is used. Swap isn't used at all.
That's good to know, so we can exclude the memory issue. Does your
program use multithreading? It probably does, unless you did something
special when configuring deal.II; the copy operation is not parallelized
by threads, but neither are most other initialization functions, so it
should not stand out so disproportionately in the timings.
10h for 2.5m dofs looks insane. I would expect something between 0.5 and
10 seconds, depending on the number of nonzeros in those blocks.
Is there anything else special about your configuration or problem as
compared to the cases presented in the tutorial? What deal.II version
are you using, what is the finite element? Any special constraints on
those systems?
Unfortunately, this cannot be done that easily. I have to reorganize
things and remove a lot of superfluous code. But besides that, I have a
lot of other work to do. Maybe I can provide you with an example file at
the end of next week.
Let us know when you have a test case. I'm really curious what could
cause this huge run time.
Best,
Martin
--
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see
https://groups.google.com/d/forum/dealii?hl=en
---
You received this message because you are subscribed to the Google Groups "deal.II User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to dealii+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.