Dear Simon,
When you use FEPointEvaluation, you should construct it only once and
re-use the same object for different points. Furthermore, you should
also avoid to create "p_dofs" and the "std::vector" near the I was not
clear with my original message. Anyway, the problem is the FEValues
object that gets used. I am confused by your other message that you use
FE_Q together with MappingQ - that combination should be supported and
if it is not, we should take a look at a (reduced) code from you.
Regarding the high timings: There is some parallelization by tasks that
gets done inside the constructor of FEValues. This has good intents for
the case that we are in 3D and have a reasonable amount of work to do.
However, you are in 1D (if I read your code correctly), and then it is
having adverse effects. The reason is that the constructor of FEValues
is very likely completely dominated by memory allocation. When we have 1
thread, everything is fine, but when we have multiple threads working
they will start to interfere with each other when the request memory
through malloc(), which has to be coordinated by the operating system
(and thus gets slower). In fact, the big gap between compute time and
wall time shows that there is a lot of time wasted by "system time" that
does not do actual work on the cores.
I guess the library could have a better measure of when to spawn tasks
in FEValues in similar context, but it is a lot of work to get this
right. (This is why I keep avoiding it in critical functions.)
Best,
Martin
On 20.10.22 16:47, Simon Wiesheier wrote:
Update:
I profiled my program with valgrind --tool=callgrind and could figure
out that
FEPointEvaluation creates an FEValues object along with a quadrature
object under the hood.
Closer inspection revealed that all constructors, destructors,...
associated with FEPointEvaluation
need roughly 5000 instructions more (per call!).
That said, FEValues is indeed the faster approach, at least for FE_Q
elements.
export DEAL_II_NUM_THREADS=1
eliminated the gap between cpu and wall time.
Using FEValues directly, I get cpu time 19.8 seconds
and in the case of FEPointEvaluation cpu time = 21.9 seconds;
Wall times are in the same ballpark.
Out of curiosity, why produces multi-threading such high wall times
(200 seconds) in my case?.
These times are far too big given that the solution of the linear
system takes only about 13 seconds.
But based on what all of you have said, there is probably no other to
way to implement my problem.
Best
Simon
Am Do., 20. Okt. 2022 um 11:55 Uhr schrieb Simon Wiesheier
<simon.wieshe...@gmail.com>:
Dear Martin and Wolfgang,
" You seem to be looking for FEPointEvaluation. That class is
shown in step-19 and provides, for simple FiniteElement types, a
much faster way to evaluate solutions at arbitrary points within a
cell. Do you want to give it a try? "
I implemented the FEPointEvaluation approach like this:
FEPointEvaluation<1,1> fe_eval(mapping,
FE_Q<1>(1),
update_gradients | update_values);
fe_eval.reinit(cell,
make_array_view(std::vector<Point<1>>{ref_point_energy_vol}));
Vector<double> p_dofs(2);
cell->get_dof_values(solution_global, p_dofs);
fe_eval.evaluate(make_array_view(p_dofs),
EvaluationFlags::values | EvaluationFlags::gradients);
double val = fe_eval.get_value(0);
Tensor<1,1> grad = fe_eval.get_gradient(0);
I am using FE_Q elements of degree one and a MappingQ object also
of degree one.
Frankly, I do not really understand the measured computation times.
My program has several loadsteps with nested Newton iterations:
Loadstep 1:
Assembly 1: cpu time 12.8 sec wall time 268.7 sec
Assembly 2: cpu time 17.7 sec wall time 275.2 sec
Assembly 3: cpu time 22.3 sec wall time 272.6 sec
Assembly 4: cpu time 23.8 sec wall time 271.3sec
Loadstep 2:
Assembly 1: cpu time 14.3 sec wall time 260.0 sec
Assembly 2: cpu time 16.9 sec wall time 262.1 sec
Assembly 3: cpu time 18.5 sec wall time 270.6 sec
Assembly 4: cpu time 17.1 sec wall time 262.2 sec
...
Using FEValues instead of FEPointEvaluation, the results are:
Loadstep 1:
Assembly 1: cpu time 23.9 sec wall time 171.0 sec
Assembly 2: cpu time 32.5 sec wall time 168.9 sec
Assembly 3: cpu time 33.2 sec wall time 168.0 sec
Assembly 4: cpu time 32.7 sec wall time 166.9 sec
Loadstep 2:
Assembly 1: cpu time 24.9 sec wall time 168.0 sec
Assembly 2: cpu time 34.7 sec wall time 167.3 sec
Assembly 3: cpu time 33.9 sec wall time 167.8 sec
Assembly 4: cpu time 34.3 sec wall time 167.7 sec
...
Clearly, the fluctuations using FEValues are smaller than in case
of FEPointEvaluation.
Anyway, using FEPointEvaluation the cpu time is smaller but the
wall time substantially bigger.
If I am not mistaken, the values cpu time 34.3 sec and wall time
167.7 sec mean that
the cpu needs 34.3 sec to execute my assembly routine and has to
wait in the
remaining 167.7-34.3 seconds.
This huge gap between cpu and wall time has to be related to what
I do with FEValues or FEPointEvaluation
as cpu and wall time are nearly balanced if I use either neither
of them.
What might be the problem?
Best
Simon
Am Mi., 19. Okt. 2022 um 22:34 Uhr schrieb Wolfgang Bangerth
<bange...@colostate.edu>:
On 10/19/22 08:45, Simon Wiesheier wrote:
>
> What I want to do boils down to the following:
> Given the reference co-ordinates of a point 'p', along with
the cell on
> which 'p' lives,
> give me the value and gradient of a finite element function
evaluated at
> 'p'.
>
> My idea was to create a quadrature object with 'p' being the
only
> quadrature point and pass this
> quadrature object to the FEValues object and finally do the
> .reinit(cell) call (then, of course, get_function_values()...)
> 'p' is different for all (2.5 million) quadrature points,
which is why I
> create the FEValues object so many times.
It's worth pointing out that is exactly what
VectorTools::point_values()
does.
(As others have already mentioned, if you want to do that many
many
times over, this is too expensive and you should be using
FEPointEvaluation instead.)
Best
W.
--
------------------------------------------------------------------------
Wolfgang Bangerth email: bange...@colostate.edu
www:
http://www.math.colostate.edu/~bangerth/
--
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see
https://groups.google.com/d/forum/dealii?hl=en
---
You received this message because you are subscribed to the
Google Groups "deal.II User Group" group.
To unsubscribe from this group and stop receiving emails from
it, send an email to dealii+unsubscr...@googlegroups.com
<mailto:dealii%2bunsubscr...@googlegroups.com>.
To view this discussion on the web visit
https://groups.google.com/d/msgid/dealii/cd1c8fa0-443d-b7bf-b433-f5ab033a247c%40colostate.edu.
--
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see
https://groups.google.com/d/forum/dealii?hl=en
---
You received this message because you are subscribed to the Google
Groups "deal.II User Group" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to dealii+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/dealii/CAM50jEt3LjNS%2BF%3D2pGUSHRNbaHy9EzRfogpyBxb51-7M%3DALxrA%40mail.gmail.com
<https://groups.google.com/d/msgid/dealii/CAM50jEt3LjNS%2BF%3D2pGUSHRNbaHy9EzRfogpyBxb51-7M%3DALxrA%40mail.gmail.com?utm_medium=email&utm_source=footer>.
--
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see
https://groups.google.com/d/forum/dealii?hl=en
---
You received this message because you are subscribed to the Google Groups "deal.II User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to dealii+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/dealii/24ba8ef3-39f4-af6c-7296-02677827d6d3%40gmail.com.