Hello, 

I am using the PRISMS-PF framework (which is based on deal.II) on the 
Skylake (skx) nodes (with 48 processors each) of the Stampede2 cluster. 

I recently ran a series of strong scaling tests and noticed that the 
intra-node performance (i.e., 1 node, 1-48 processors) scales poorly, 
specifically in the solver part. However, once I go beyond one node, the 
scaling is closer to ideal (taking 1 node as the reference).

Here is the behavior I observed (solver time only; in every case I used one 
MPI rank per processor):

Processors, Nodes, Solver time (s)
1, 1, 821
2, 1, 608
4, 1, 525
8, 1, 482
24, 1, 435
48, 1, 427
96, 2, 211
192, 4, 109
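
To put some numbers on it: going from 1 to 48 cores gives a speedup of only 
about 821/427 ≈ 1.9x (roughly 4% parallel efficiency), whereas going from 48 
to 192 cores (1 to 4 nodes) gives about 427/109 ≈ 3.9x on 4x the cores, i.e., 
close to 98% efficiency relative to a single node.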

Does anyone know what the problem might be?

The code uses the matrix-free method and requires only the p4est and MPI 
libraries, which I enabled as dependencies when I configured deal.II with 
CMake.

Here is the line I used:
 cmake -DDEAL_II_WITH_MPI=ON -DDEAL_II_WITH_P4EST=ON \
  -DCMAKE_INSTALL_PREFIX=$WORK/dealii_install $WORK/dealii-9.2.0

Am I perhaps missing a flag?
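
For instance, I did not set a build type or any architecture-specific 
compiler flags when configuring. I am only guessing here, but should I have 
used something along the lines of

 cmake -DDEAL_II_WITH_MPI=ON -DDEAL_II_WITH_P4EST=ON \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_CXX_FLAGS="-march=native" \
  -DCMAKE_INSTALL_PREFIX=$WORK/dealii_install $WORK/dealii-9.2.0

(-DCMAKE_BUILD_TYPE and -DCMAKE_CXX_FLAGS being the generic CMake options for 
the build type and extra compiler flags)? I do not know whether that could 
explain the intra-node behavior, though.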

By the way, the home nodes (which I used to install deal.II and compile my 
code) are also Skylake, so I would expect my code to have good performance.

I do not observe the same issue elsewhere (e.g., on my local machine or on 
the KNL nodes on Cori).

Any pointers that might help me figure out this issue would be appreciated.

Best,

David 



