Hi all, I'm returning to this thread.

I recompiled 9.3.3 in Release mode with the debug flag "-g". For a 3D system
with linear finite elements (degree = 1), which has about 9.3*10^4 DoFs, the
batch job with --ntasks-per-node=128 --cpus-per-task=1 is more than 10 times
faster than before.

When I use degree = 2 finite elements (uniform grid), the number of DoFs
increases to 6.5*10^5, and the batch run with the same tasks/CPU setup gains
about a 5-fold speedup (as expected). However, the program crashes after two
Newton iterations with this error message:
"
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=2795730.0. Some
of your processes may have been killed by the cgroup out-of-memory handler.
srun: error: cXXXX: task 40: Out Of Memory
srun: launch/slurm: _step_signal: Terminating StepId=2795730.0
slurmstepd: error: *** STEP 2795730.0 ON cXXXX CANCELLED AT
2023-09-XXTXX:XX:XX ***
slurmstepd: error:  mpi/pmix_v3: _errhandler: cXXXX [0]:
pmixp_client_v2.c:212: Error handler invoked: status = -25, source =
[slurm.pmix.2795730.0:40]
"
where cXXXX is the node index.

My first guess was a memory leak, so I tried to run Valgrind, but sadly
noticed that the Valgrind on the cluster was compiled with gcc 8.5, while
deal.II was built with gcc 11.2 (gcc 8.5 has been removed).

So my questions are: (i) Has this issue ever happened in other deal.II
applications, and how can it be solved other than by increasing the number
of nodes or the memory request? (ii) What kind of profiling/debugging tools
do deal.II experts use nowadays to address memory issues? Should I build
Valgrind myself? Does Valgrind only support MPI 2? My OpenMPI is v.3.

Sincerely,
Tim


On Mon, Sep 18, 2023 at 3:47 AM Bruno Turcksin <bruno.turck...@gmail.com>
wrote:

> Timo,
>
> Yes, you want to profile the optimized library but you also want the debug
> info. Without it, the information given by the profiler usually makes
> little sense. So you compile in release mode but you use the following
> option when configuring your deal.II "-DCMAKE_CXX_FLAGS=-g"
>
> Best,
>
> Bruno
>
> On Sat, Sep 16, 2023 at 03:47, timo Hyvärinen <hyvarinentim...@gmail.com>
> wrote:
>
>> Hi Bruno,
>>
>> Thank you for your explanations.
>>
>> So it seems I should compile an optimized library and then do the profiling.
>>
>> Sincerely,
>> Timo
>>
>> On Fri, Sep 15, 2023 at 11:04 PM Bruno Turcksin <bruno.turck...@gmail.com>
>> wrote:
>>
>>> Timo,
>>>
>>> You will get vastly different results in debug and release modes for two
>>> reasons. First, the compiler generates much faster code in release mode
>>> compared to debug. Second, there are a lot of checks inside deal.II that
>>> are only enabled in debug mode. This is great when you develop your code
>>> because it helps you catch bugs early but it makes your code much slower.
>>> In general, you want to develop your code in debug mode but your production
>>> run should be done in release.
>>>
>>> Best,
>>>
>>> Bruno
>>>
>>> On Friday, September 15, 2023 at 1:53:59 PM UTC-4 Tim hyvärinen wrote:
>>>
>>> Hi Marc,
>>>
>>> Thank you for the reply.
>>>
>>> I compiled the library in debug mode and didn't try the optimized version.
>>> I didn't think this could be a significant issue, but based on your
>>> question I infer that an optimized library could improve performance a lot.
>>>
>>> Sincerely,
>>> Timo
>>>
>>> On Fri, Sep 15, 2023 at 8:21 PM Marc Fehling <mafe...@gmail.com> wrote:
>>>
>>> Hello Tim,
>>>
>>> > Yet, even though it is universally believed to be superior in terms
>>> of convergence properties, it is not widely used because it is often
>>> believed to be difficult to implement. One way to address this belief is to
>>> provide well-tested, easy to use software that provides this kind of
>>> functionality.
>>>
>>>
>>> Just to make sure: did you compile the deal.II library and your code in
>>> Optimized mode/Release mode
>>> <https://www.dealii.org/current/readme.html#configuration>?
>>>
>>> Best,
>>> Marc
>>>
>>> On Friday, September 15, 2023 at 3:17:39 AM UTC-6 Tim hyvärinen wrote:
>>>
>>> Dear dealii community and developers,
>>>
>>> I have been using the deal.II framework (9.3.x) for a while on an HPC
>>> machine. My project involves solving a vector-valued nonlinear PDE with
>>> nine components. Currently, I've implemented a damped Newton iteration
>>> with GMRES and an AMG preconditioner, using MPI on a distributed-memory
>>> architecture.
>>>
>>> A simple timing tells me that assembling the system matrix takes 99% of
>>> the total run time in every Newton iteration. I guess there is a lot of
>>> idle CPU time during assembly because I don't take advantage of thread
>>> parallelism yet.
>>>
>>> So here is my question: which tutorial steps demonstrate how to implement
>>> hybrid MPI+thread parallelism? I've found that step-48 discusses this,
>>> but I wonder whether there are other tutorial programs to look at. I'd
>>> also welcome any suggestions about MPI+thread parallelism within the
>>> deal.II framework.
>>>
>>> Sincerely,
>>> Timo Hyvarinen
>>>
>>> --
>>>
>>> The deal.II project is located at http://www.dealii.org/
>>> For mailing list/forum options, see
>>> https://groups.google.com/d/forum/dealii?hl=en
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "deal.II User Group" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to dealii+un...@googlegroups.com.
>>>
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/dealii/cc50d23d-b6c3-46c3-95dc-4e2250a1b56dn%40googlegroups.com
>>> <https://groups.google.com/d/msgid/dealii/cc50d23d-b6c3-46c3-95dc-4e2250a1b56dn%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>
