Hello, I am using the PRISMS-PF framework (which is based on deal.II) on the Skylake (SKX) nodes of the Stampede2 cluster, which have 48 cores each.
I recently ran a series of strong scaling tests and noticed that the intra-node performance (i.e., 1 node, 1-48 cores) scales poorly, specifically the solver part. Once I go beyond one node, however, the scaling is much closer to ideal (taking 1 node as the reference). Here is the behavior I observed for the solver part only; in every case I used one MPI rank per core:

Cores   Nodes   Solver time (s)
  1       1       821
  2       1       608
  4       1       525
  8       1       482
 24       1       435
 48       1       427
 96       2       211
192       4       109

Does anyone know what the problem may be? The code uses the matrix-free method and requires only the p4est and MPI libraries, which I included as dependencies when I configured deal.II with CMake. Here is the line I used:

cmake -DDEAL_II_WITH_MPI=ON -DDEAL_II_WITH_P4EST=ON -DCMAKE_INSTALL_PREFIX=$WORK/dealii_install $WORK/dealii-9.2.0

Am I perhaps missing a flag? By the way, the login nodes (which I used to install deal.II and compile my code) are also Skylake, so I would expect the compiled code to perform well on the compute nodes. I do not observe the same issue elsewhere (e.g., on my local machine or on the KNL nodes of Cori).

Any suggestions that might help me figure out this issue are appreciated.

Best,
David
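P.S. In case it is relevant to the "missing flag" question: I configured with the defaults above, so I did not set a build type or any architecture flags explicitly. Below is a variant I was considering trying next. It is only a sketch based on my reading of the deal.II CMake documentation; the build-type and -march settings are my own guesses for the SKX nodes, not something I have verified on Stampede2.

    # Hypothetical configure line: explicitly request an optimized (Release) build
    # and Skylake-AVX512 code generation (the -march value is assumed, not verified)
    cmake -DDEAL_II_WITH_MPI=ON \
          -DDEAL_II_WITH_P4EST=ON \
          -DCMAKE_BUILD_TYPE=Release \
          -DCMAKE_CXX_FLAGS="-march=skylake-avx512" \
          -DCMAKE_INSTALL_PREFIX=$WORK/dealii_install \
          $WORK/dealii-9.2.0

Would either of these settings be expected to change the intra-node scaling, or is the problem likely elsewhere?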