Re: [deal.II] Re: measuring cpu and wall time for assembly routine

2022-10-23 Thread Simon Wiesheier
Sorry, I was wrong. Of course, it is the other way round. The fast one is 3 times faster. -Simon On Sun, 23 Oct 2022 at 10:37, Peter Munch <peterrmue...@gmail.com> wrote: > Now, I am lost. The fast one is 3 times slower!? > > Peter > > On Sunday, 23 October 2022 at 10:33:38 UTC+2 Simon w
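As a quick check of the "3 times" figure: take the CPU times from Simon's 23 October message with the two labels swapped back, as he corrects here. The speed-up of the fast path over the slow path is then

$$\frac{18.7\,\mathrm{s}}{6.3\,\mathrm{s}} \approx 2.97 \approx 3.$$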

2022-10-23 Thread Peter Munch
Now, I am lost. The fast one is 3 times slower!? Peter On Sunday, 23 October 2022 at 10:33:38 UTC+2 Simon wrote: > Certainly. > When using the slow path, i.e. MappingQ in version 9.3.2, the cpu time is > about 6.3 seconds. > In case of the fast path, i.e. MappingQGeneric in version 9.3.2, the

2022-10-23 Thread Simon Wiesheier
Certainly. When using the slow path, i.e. MappingQ in version 9.3.2, the cpu time is about 6.3 seconds. In case of the fast path, i.e. MappingQGeneric in version 9.3.2, the cpu time is about 18.7 seconds. Crudely, the .reinit function associated with the FEPointEvaluation objects is called about 1

2022-10-22 Thread Peter Munch
Happy about that! May I ask you to post the results here. I am curious since I never actually compared timings (and only blindly trusted Martin). Thanks, Peter On Saturday, 22 October 2022 at 16:46:16 UTC+2 Simon wrote: > Yes, the issue is resolved and the computation time decreased > signific

2022-10-22 Thread Simon Wiesheier
Yes, the issue is resolved and the computation time decreased significantly. Thank you all! -Simon On Sat, 22 Oct 2022 at 12:57, Peter Munch <peterrmue...@gmail.com> wrote: > You are right. Release 9.3 uses the slow path for MappingQ. The reason is > that here > https://github.com/deali

2022-10-22 Thread Peter Munch
You are right. Release 9.3 uses the slow path for MappingQ. The reason is that here https://github.com/dealii/dealii/blob/ccfaddc2bab172d9d139dabc044d028f65bb480a/include/deal.II/matrix_free/fe_point_evaluation.h#L708-L711 we check for MappingQGeneric. At that time MappingQ and MappingQGeneric
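The kind of check Peter links to can be modelled in plain C++. This is a minimal sketch, not deal.II code: the class names `MappingBase`, `GenericPolynomialMapping` and `WrappingMapping` are hypothetical stand-ins. It assumes, as Peter's message implies for release 9.3, that a MappingQ object is not itself a MappingQGeneric, so a `dynamic_cast` to the fast-path type fails and the general fallback is chosen.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for a mapping class hierarchy. They only model the
// type check described in the thread; they are NOT the deal.II classes.
struct MappingBase {
  virtual ~MappingBase() = default;
};

// The class the specialized (fast) evaluation path is written for.
struct GenericPolynomialMapping : MappingBase {};

// A sibling class that holds a GenericPolynomialMapping internally but does
// not derive from it, so a dynamic_cast to the fast-path type returns nullptr.
struct WrappingMapping : MappingBase {
  GenericPolynomialMapping inner;
};

// Chooses an evaluation path from the dynamic type of the mapping object.
std::string choose_path(const MappingBase &mapping) {
  if (dynamic_cast<const GenericPolynomialMapping *>(&mapping) != nullptr)
    return "fast"; // specialized path
  return "slow";   // general fallback (FEValues, per the thread)
}
```

With this model, passing a `GenericPolynomialMapping` selects `"fast"` and a `WrappingMapping` selects `"slow"`. This mirrors what Simon observed: in 9.3.2, switching from MappingQ to MappingQGeneric changed which path was taken.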

2022-10-21 Thread Simon Wiesheier
I revised the appendix from my last message a little and now attach a minimal working example (just 140 lines) along with a CMakeLists.txt. After checking the profiling results from valgrind, the combination of MappingQ with FE_Q does *not* take the fast path. For info: I use dealii version 9.3.2

2022-10-20 Thread Simon Wiesheier
" When you use FEPointEvaluation, you should construct it only once and re-use the same object for different points. Furthermore, you should also avoid creating "p_dofs" and the "std::vector" near the I was not clear with my original message. Anyway, the problem is the FEValues object that gets u

2022-10-20 Thread Martin Kronbichler
Dear Simon, When you use FEPointEvaluation, you should construct it only once and re-use the same object for different points. Furthermore, you should also avoid creating "p_dofs" and the "std::vector" near the I was not clear with my original message. Anyway, the problem is the FEValues ob
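Martin's advice is the usual "hoist expensive setup out of the loop" pattern. Below is a minimal sketch of it in plain C++. `Evaluator` is a hypothetical class standing in for any object whose constructor is costly (here, it allocates a scratch vector) but which can cheaply be re-targeted to a new point. It is not the FEPointEvaluation interface, and its `reinit` signature is invented for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical object that is expensive to construct but cheap to re-target.
// NOT the deal.II FEPointEvaluation API; it only models the cost structure.
class Evaluator {
public:
  static inline int constructions = 0; // counts how often setup cost is paid

  // "Expensive" setup: allocates internal storage.
  Evaluator() : scratch_(1000, 0.0) { ++constructions; }

  // Cheap per-point update that reuses the existing storage.
  void reinit(double point) { scratch_[0] = point; }

  double evaluate() const { return 2.0 * scratch_[0]; }

private:
  std::vector<double> scratch_;
};

// Slow pattern: a fresh object (and fresh allocation) for every point.
double sum_construct_per_point(const std::vector<double> &points) {
  double sum = 0.0;
  for (double p : points) {
    Evaluator e;
    e.reinit(p);
    sum += e.evaluate();
  }
  return sum;
}

// Recommended pattern: construct once outside the loop, only reinit inside.
double sum_construct_once(const std::vector<double> &points) {
  Evaluator e;
  double sum = 0.0;
  for (double p : points) {
    e.reinit(p);
    sum += e.evaluate();
  }
  return sum;
}
```

Both functions return the same result, but the first pays the construction cost once per point and the second only once in total. The same reasoning applies to temporary containers such as a `std::vector` that would otherwise be allocated inside the loop.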

2022-10-20 Thread Simon Wiesheier
" What type of Mapping are you using? If you take a look at https://github.com/dealii/dealii/blob/ad13824e599601ee170cb2fd1c7c3099d3d5b0f7/source/matrix_free/fe_point_evaluation.cc#L40-L95 you can see when the fast path of FEPointEvaluation is taken. Indeed, the slow path is (FEValues). One questio

2022-10-20 Thread Peter Munch
> FEPointEvaluation creates an FEValues object along with a quadrature object under the hood. Closer inspection revealed that all constructors, destructors,... associated with FEPointEvaluation need roughly 5000 instructions more (per call!). That said, FEValues is indeed the faster approach, at

2022-10-20 Thread Simon Wiesheier
Update: I profiled my program with valgrind --tool=callgrind and could figure out that FEPointEvaluation creates an FEValues object along with a quadrature object under the hood. Closer inspection revealed that all constructors, destructors,... associated with FEPointEvaluation need roughly 5000 i

2022-10-20 Thread Simon Wiesheier
Dear Martin and Wolfgang, " You seem to be looking for FEPointEvaluation. That class is shown in step-19 and provides, for simple FiniteElement types, a much faster way to evaluate solutions at arbitrary points within a cell. Do you want to give it a try? " I implemented the FEPointEvaluation app

2022-10-19 Thread Wolfgang Bangerth
On 10/19/22 08:45, Simon Wiesheier wrote: What I want to do boils down to the following: Given the reference co-ordinates of a point 'p', along with the cell on which 'p' lives, give me the value and gradient of a finite element function evaluated at 'p'. My idea was to create a quadrature o
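To make the task concrete, here is what "value and gradient of a finite element function at a reference point" amounts to for the simplest case. This is a minimal sketch assuming a scalar bilinear (Q1) element on the unit reference square, with vertices in lexicographic order. The names `PointValue` and `evaluate_q1` are hypothetical. The gradient is taken with respect to reference coordinates only. A gradient in real space would additionally need the inverse Jacobian of the mapping, which is what FEValues or FEPointEvaluation handle in deal.II.

```cpp
#include <array>
#include <cassert>

// Value and reference-cell gradient of a scalar Q1 function on [0,1]^2.
// Assumed vertex order (lexicographic): 0:(0,0), 1:(1,0), 2:(0,1), 3:(1,1).
struct PointValue {
  double value;
  std::array<double, 2> gradient; // d/dxi, d/deta (reference coordinates)
};

// u holds the four nodal values; (xi, eta) is the point in reference coords.
PointValue evaluate_q1(const std::array<double, 4> &u, double xi, double eta) {
  // Bilinear shape functions and their partial derivatives.
  const std::array<double, 4> phi = {(1 - xi) * (1 - eta), xi * (1 - eta),
                                     (1 - xi) * eta, xi * eta};
  const std::array<double, 4> dphi_dxi = {-(1 - eta), (1 - eta), -eta, eta};
  const std::array<double, 4> dphi_deta = {-(1 - xi), -xi, (1 - xi), xi};

  // Sum nodal values times shape function values/derivatives.
  PointValue result{0.0, {0.0, 0.0}};
  for (int i = 0; i < 4; ++i) {
    result.value += u[i] * phi[i];
    result.gradient[0] += u[i] * dphi_dxi[i];
    result.gradient[1] += u[i] * dphi_deta[i];
  }
  return result;
}
```

For nodal values `{0, 1, 0, 1}`, which interpolate the function xi exactly, evaluating at (0.25, 0.75) gives the value 0.25 and the reference gradient (1, 0).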

2022-10-19 Thread Martin Kronbichler
Dear Simon, You seem to be looking for FEPointEvaluation. That class is shown in step-19 and provides, for simple FiniteElement types, a much faster way to evaluate solutions at arbitrary points within a cell. Do you want to give it a try? The issue you are facing is that FEValues that you are usi

2022-10-19 Thread Simon Wiesheier
" It's an environment variable. " I did $DEAL_II_NUM_THREADS and the variable is not set. But if it were set to one, why would this explain the gap between cpu and wall time? " My point is the constructor should not be called millions of times. You are not going to be able to get that function 10

2022-10-19 Thread Bruno Turcksin
Simon, On Wed, 19 Oct 2022 at 09:33, Simon Wiesheier wrote: > Thank you for your answer! > > " Did you set DEAL_II_NUM_THREADS=1?" > > How can I double-check that? > ccmake . > only shows me the variables CMAKE_BUILD_TYPE and deal.II_DIR . > But I do not know if this is the right place to loo
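Because `DEAL_II_NUM_THREADS` is an environment variable (as stated in the thread), it does not appear among the CMake cache variables that `ccmake .` lists. From a shell, `echo "$DEAL_II_NUM_THREADS"` prints an empty line when it is unset. The small C++ helper below is a hypothetical sketch that performs the same check from inside a program, using only the standard `std::getenv`.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <string>

// Returns the value of an environment variable, or std::nullopt if unset.
// Hypothetical helper for illustration; not part of deal.II.
std::optional<std::string> env_value(const char *name) {
  const char *value = std::getenv(name);
  if (value == nullptr)
    return std::nullopt;
  return std::string(value);
}
```

For example, `env_value("DEAL_II_NUM_THREADS")` returns an empty optional when the variable has not been exported in the environment the program was started from.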

2022-10-19 Thread Simon Wiesheier
Thank you for your answer! " Did you set DEAL_II_NUM_THREADS=1?" How can I double-check that? ccmake . only shows me the variables CMAKE_BUILD_TYPE and deal.II_DIR. But I do not know if this is the right place to look. " That could explain why CPU and Wall time are different. Finally, if I

[deal.II] Re: measuring cpu and wall time for assembly routine

2022-10-19 Thread Bruno Turcksin
Simon, The best way to profile code is to use a profiler. It can give a lot more information than simple timers can. You say that your code is not parallelized, but by default deal.II is multithreaded. Did you set DEAL_II_NUM_THREADS=1? That could explain why CPU and Wall time are di
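Why multithreading can separate the two numbers can be shown with the standard library alone. This is a minimal sketch assuming a Linux/glibc system, where `std::clock()` reports the CPU time of the whole process summed over all of its threads, while `std::chrono::steady_clock` measures elapsed wall time. The names `spin`, `Timings` and `time_parallel_work` are hypothetical and not part of deal.II.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <thread>
#include <vector>

// CPU time of the process and elapsed wall-clock time, both in seconds.
struct Timings {
  double cpu_seconds;
  double wall_seconds;
};

// Busy work so that each thread really consumes CPU time;
// volatile keeps the compiler from removing the loop.
static void spin(long iterations) {
  volatile double x = 0.0;
  for (long i = 0; i < iterations; ++i)
    x = x + 1e-9 * i;
}

// Runs spin() on n_threads threads at once and reads both clocks around it.
Timings time_parallel_work(unsigned n_threads, long iterations_per_thread) {
  assert(n_threads > 0);
  const std::clock_t cpu_start = std::clock();
  const auto wall_start = std::chrono::steady_clock::now();

  std::vector<std::thread> workers;
  for (unsigned t = 0; t < n_threads; ++t)
    workers.emplace_back(spin, iterations_per_thread);
  for (auto &w : workers)
    w.join();

  const auto wall_end = std::chrono::steady_clock::now();
  const std::clock_t cpu_end = std::clock();
  return {static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC,
          std::chrono::duration<double>(wall_end - wall_start).count()};
}
```

On a multi-core machine, running with two or more threads typically gives `cpu_seconds` larger than `wall_seconds`, while a single thread keeps them close. deal.II's `TimerOutput` can likewise report both CPU and wall times for the same code section.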