Hi, dear community and all developers,

Here is an update from my side on the questions and issues I raised earlier:

About profiling/debugging tools, I found this thread on the mailing 
list: https://groups.google.com/g/dealii/c/7_JJvipz0wY/m/aFU4pTuvAQAJ?hl=en.

About the out-of-memory error, my current workaround is to undersubscribe the 
nodes by giving each task 2 CPUs, so that each task has 4 GB of memory to 
itself. This certainly cuts performance in half, but it saves the program from 
crashing.
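
For reference, the batch settings I currently use look roughly like this (a 
sketch only; the 128-core node with about 2 GB per core is specific to my 
cluster, and ./my_program is a placeholder for my executable):

  #SBATCH --ntasks-per-node=64   # half as many MPI ranks as cores
  #SBATCH --cpus-per-task=2      # each rank reserves two cores ...
  srun ./my_program              # ... and therefore about 4 GB of memory

So each rank gets the memory of two cores while only one core per rank does 
the actual work.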

Any suggestions or comments are welcome.

Sincerely,
Tim
On Saturday, September 30, 2023 at 11:02:05 AM UTC+3 Tim hyvärinen wrote:

> Hi, dear all, I'm back to this thread and discussion.
>
> I recompiled 9.3.3 as Release with the debug flag "-g". For a 3D system with 
> linear finite elements (degree = 1), with about 9.3*10^4 DoFs, the batch 
> job with --ntasks-per-node=128 --cpus-per-task=1 is more than 10 times 
> faster.  
>
> When I use degree = 2 finite elements (uniform grid), the number of DoFs 
> increases to 6.5*10^5, and the batch run with the same task/CPU setup gains 
> about a 5x speedup (as expected). However, the program crashes after two 
> Newton iterations with the error message:
> "
> slurmstepd: error: Detected 1 oom-kill event(s) in StepId=2795730.0. Some 
> of your processes may have been killed by the cgroup out-of-memory handler.
> srun: error: cXXXX: task 40: Out Of Memory
> srun: launch/slurm: _step_signal: Terminating StepId=2795730.0
> slurmstepd: error: *** STEP 2795730.0 ON cXXXX CANCELLED AT 
> 2023-09-XXTXX:XX:XX ***
> slurmstepd: error:  mpi/pmix_v3: _errhandler: cXXXX [0]: 
> pmixp_client_v2.c:212: Error handler invoked: status = -25, source = 
> [slurm.pmix.2795730.0:40]
> "
> where cXXXX is the node index.
>
> My first intuition was a memory leak, so I tried to run Valgrind, and 
> sadly noticed that the Valgrind on the cluster was compiled with gcc 8.5, 
> while deal.II was built with gcc 11.2 (gcc 8.5 has been removed).
>
> So my questions here are: (i) Has this issue ever happened with other 
> deal.II applications, and how can it be solved other than by increasing the 
> number of nodes or the memory request? (ii) What profiling/debugging tools 
> do today's deal.II experts use to address memory issues? Should I build 
> Valgrind myself? Does Valgrind only support MPI 2? My OpenMPI is v3.
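>
> If I do end up building Valgrind myself, I imagine the run would look 
> roughly like this (only a sketch; ./my_program stands for my executable, 
> and Valgrind's %q{} substitution gives one log file per MPI rank):
>
>   srun valgrind --leak-check=full --track-origins=yes \
>        --log-file=valgrind-rank-%q{SLURM_PROCID}.log ./my_program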
>
> Sincerely,
> Tim
>
>
> On Mon, Sep 18, 2023 at 3:47 AM Bruno Turcksin <bruno.t...@gmail.com> 
> wrote:
>
>> Timo,
>>
>> Yes, you want to profile the optimized library but you also want the 
>> debug info. Without it, the information given by the profiler usually makes 
>> little sense. So you compile in release mode but you use the following 
>> option when configuring deal.II: "-DCMAKE_CXX_FLAGS=-g".
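>>
>> For instance, a configure call could look something like this (the paths 
>> are placeholders; the essential parts are the build type and the extra 
>> "-g" flag):
>>
>>   cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS=-g \
>>         -DCMAKE_INSTALL_PREFIX=/path/to/install /path/to/dealii-source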
>>
>> Best,
>>
>> Bruno
>>
>> On Sat, Sep 16, 2023 at 03:47, timo Hyvärinen <hyvarin...@gmail.com> 
>> wrote:
>>
>>> Hi Bruno,
>>>
>>> Thank you for your explanations.
>>>
>>> It seems I should compile an optimized library and then do the profiling. 
>>>
>>> Sincerely,
>>> Timo
>>>
>>> On Fri, Sep 15, 2023 at 11:04 PM Bruno Turcksin <bruno.t...@gmail.com> 
>>> wrote:
>>>
>>>> Timo,
>>>>
>>>> You will get vastly different results in debug and release modes for 
>>>> two reasons. First, the compiler generates much faster code in release 
>>>> mode 
>>>> compared to debug. Second, there are a lot of checks inside deal.II that 
>>>> are only enabled in debug mode. This is great when you develop your code 
>>>> because it helps you catch bugs early but it makes your code much slower. 
>>>> In general, you want to develop your code in debug mode but your 
>>>> production 
>>>> run should be done in release.
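>>>>
>>>> As a small illustration of what I mean by those checks (a sketch, not 
>>>> code taken from deal.II itself): the Assert macro is compiled away when 
>>>> you build against the release flavor of the library, while AssertThrow 
>>>> stays active in both flavors.
>>>>
>>>>   #include <cmath>
>>>>   #include <deal.II/base/exceptions.h>
>>>>
>>>>   double inverse(const double x)
>>>>   {
>>>>     // Active only in debug mode; expands to nothing in release mode:
>>>>     Assert(std::abs(x) > 1e-12,
>>>>            dealii::ExcMessage("x is (numerically) zero"));
>>>>     // Active in debug and release mode alike:
>>>>     AssertThrow(std::isfinite(x),
>>>>                 dealii::ExcMessage("x must be a finite number"));
>>>>     return 1. / x;
>>>>   }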
>>>>
>>>> Best,
>>>>
>>>> Bruno
>>>>
>>>> On Friday, September 15, 2023 at 1:53:59 PM UTC-4 Tim hyvärinen wrote:
>>>>
>>>> Hi Marc,
>>>>
>>>> Thank you for the reply.
>>>>
>>>> I compiled the lib in debug mode and didn't try the optimized version. 
>>>> I didn't think this would be a significant issue, but from your question 
>>>> I infer that the optimized lib could improve performance a lot. 
>>>>
>>>> Sincerely,
>>>> Timo
>>>>
>>>> On Fri, Sep 15, 2023 at 8:21 PM Marc Fehling <mafe...@gmail.com> wrote:
>>>>
>>>> Hello Tim,
>>>>
>>>> > Yet, even though it is universally believed to be superior in terms 
>>>> of convergence properties, it is not widely used because it is often 
>>>> believed to be difficult to implement. One way to address this belief is 
>>>> to 
>>>> provide well-tested, easy to use software that provides this kind of 
>>>> functionality. 
>>>>
>>>>
>>>> Just to make sure: did you compile the deal.II library and your code in 
>>>> Optimized 
>>>> mode/Release mode 
>>>> <https://www.dealii.org/current/readme.html#configuration>?
>>>>
>>>> Best,
>>>> Marc
>>>>
>>>> On Friday, September 15, 2023 at 3:17:39 AM UTC-6 Tim hyvärinen wrote:
>>>>
>>>> Dear dealii community and developers,
>>>>
>>>> I have used the deal.II framework (9.3.x) for a while on an HPC machine. My 
>>>> project involves solving a vector-valued nonlinear PDE with nine components.
>>>> Currently, I've implemented a damped Newton iteration with GMRES and an AMG 
>>>> preconditioner, using MPI on a distributed-memory architecture. 
>>>>
>>>> A simple timing tells me that the assembly of the system matrix takes 
>>>> 99% of the total running time in every Newton iteration. I guess there is
>>>> a lot of idle CPU time during assembly because I don't take advantage 
>>>> of thread parallelism yet.
>>>>
>>>> So here is my question: which tutorial steps demonstrate how to 
>>>> implement MPI+thread hybrid parallelism? I've found that step-48 talks 
>>>> about this, but I wonder whether there are any other tutorial programs to 
>>>> look at. I also wonder whether any of you have suggestions for MPI+thread 
>>>> parallelism within the deal.II framework.
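>>>>
>>>> From the documentation I gather the pattern would look roughly like the 
>>>> sketch below: a WorkStream-based assembly loop over the locally owned 
>>>> cells, in the style of step-32, on top of the existing MPI decomposition. 
>>>> The names AssemblyScratch, AssemblyCopy and assemble_system are 
>>>> placeholders of mine, I assume 3D and Trilinos linear algebra as in my 
>>>> code, and the actual local integration is elided; I have not tried this 
>>>> yet.
>>>>
>>>> #include <deal.II/base/quadrature.h>
>>>> #include <deal.II/base/work_stream.h>
>>>> #include <deal.II/dofs/dof_handler.h>
>>>> #include <deal.II/fe/fe_values.h>
>>>> #include <deal.II/grid/filtered_iterator.h>
>>>> #include <deal.II/lac/affine_constraints.h>
>>>> #include <deal.II/lac/full_matrix.h>
>>>> #include <deal.II/lac/trilinos_sparse_matrix.h>
>>>> #include <deal.II/lac/trilinos_vector.h>
>>>> #include <deal.II/lac/vector.h>
>>>>
>>>> #include <vector>
>>>>
>>>> using namespace dealii;
>>>>
>>>> struct AssemblyScratch   // each thread works on its own copy of this
>>>> {
>>>>   FEValues<3> fe_values;
>>>>
>>>>   AssemblyScratch(const FiniteElement<3> &fe, const Quadrature<3> &quad)
>>>>     : fe_values(fe, quad,
>>>>                 update_values | update_gradients | update_JxW_values)
>>>>   {}
>>>>
>>>>   AssemblyScratch(const AssemblyScratch &other)
>>>>     : fe_values(other.fe_values.get_fe(),
>>>>                 other.fe_values.get_quadrature(),
>>>>                 other.fe_values.get_update_flags())
>>>>   {}
>>>> };
>>>>
>>>> struct AssemblyCopy      // one cell's local contribution
>>>> {
>>>>   FullMatrix<double>                   cell_matrix;
>>>>   Vector<double>                       cell_rhs;
>>>>   std::vector<types::global_dof_index> local_dof_indices;
>>>>
>>>>   AssemblyCopy(const unsigned int dofs_per_cell)
>>>>     : cell_matrix(dofs_per_cell, dofs_per_cell),
>>>>       cell_rhs(dofs_per_cell),
>>>>       local_dof_indices(dofs_per_cell)
>>>>   {}
>>>> };
>>>>
>>>> void assemble_system(const DoFHandler<3>             &dof_handler,
>>>>                      const FiniteElement<3>          &fe,
>>>>                      const Quadrature<3>             &quadrature,
>>>>                      const AffineConstraints<double> &constraints,
>>>>                      TrilinosWrappers::SparseMatrix  &system_matrix,
>>>>                      TrilinosWrappers::MPI::Vector   &system_rhs)
>>>> {
>>>>   using CellIterator = DoFHandler<3>::active_cell_iterator;
>>>>   using CellFilter   = FilteredIterator<CellIterator>;
>>>>
>>>>   // Runs on many threads at once; must not touch global objects.
>>>>   auto worker = [&](const CellIterator &cell,
>>>>                     AssemblyScratch    &scratch,
>>>>                     AssemblyCopy       &copy)
>>>>   {
>>>>     scratch.fe_values.reinit(cell);
>>>>     cell->get_dof_indices(copy.local_dof_indices);
>>>>     copy.cell_matrix = 0.;
>>>>     copy.cell_rhs    = 0.;
>>>>     // ... fill copy.cell_matrix and copy.cell_rhs exactly as in the
>>>>     //     serial cell loop ...
>>>>   };
>>>>
>>>>   // Runs serially, so it is the only place that writes into the global
>>>>   // matrix and right-hand side.
>>>>   auto copier = [&](const AssemblyCopy &copy)
>>>>   {
>>>>     constraints.distribute_local_to_global(copy.cell_matrix,
>>>>                                            copy.cell_rhs,
>>>>                                            copy.local_dof_indices,
>>>>                                            system_matrix,
>>>>                                            system_rhs);
>>>>   };
>>>>
>>>>   // Hand only the cells owned by this MPI rank to the thread pool.
>>>>   WorkStream::run(CellFilter(IteratorFilters::LocallyOwnedCell(),
>>>>                              dof_handler.begin_active()),
>>>>                   CellFilter(IteratorFilters::LocallyOwnedCell(),
>>>>                              dof_handler.end()),
>>>>                   worker,
>>>>                   copier,
>>>>                   AssemblyScratch(fe, quadrature),
>>>>                   AssemblyCopy(fe.dofs_per_cell));
>>>> }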
>>>>
>>>> Sincerely,
>>>> Timo Hyvarinen 
>>>>
