Nathan, unfortunately '--mca memory_linux_disable 1' does not help with this issue - it does not change the behaviour at all. Note that the pathological behaviour is present in Open MPI 2.0.2 as well as in 1.10.x, and only nodes with an Intel OmniPath (OPA) network are affected.
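(For completeness, an illustrative invocation of the kind we tested - the process count and application name below are placeholders, not our exact command line:

   mpirun -np 24 --mca memory_linux_disable 1 ./my_app

The time spent in mpi_free_mem is the same with and without the flag.)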
The known workaround is to disable the InfiniBand fallback with '--mca btl ^tcp,openib' on nodes with an OPA network. (On IB nodes, the same tweak led to a 5% performance improvement for single-node jobs; but obviously, disabling IB on nodes connected via IB is no solution for multi-node jobs.)
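For illustration, the workaround can be given on the command line or via the usual MCA environment variable (the application name is again a placeholder):

   mpirun --mca btl '^tcp,openib' ./my_app
   # or, equivalently, for all subsequent runs in the shell:
   export OMPI_MCA_btl='^tcp,openib'

The leading caret means "exclude these components", so all BTLs except tcp and openib remain available for selection.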
On 03/07/17 20:22, Nathan Hjelm wrote:
If this is with 1.10.x or older, run with --mca memory_linux_disable 1. There is a bad interaction between ptmalloc2 and psm2 support. This problem is not present in v2.0.x and newer.
-Nathan

On Mar 7, 2017, at 10:30 AM, Paul Kapinos <kapi...@itc.rwth-aachen.de> wrote:

Hi Dave,

On 03/06/17 18:09, Dave Love wrote:
> I've been looking at a new version of an application (cp2k, for what
> it's worth) which is calling mpi_alloc_mem/mpi_free_mem, and I don't
> think it did so in the previous version I looked at.

Welcome to the club! :o) In our measurements we see some 70% of the time spent in 'mpi_free_mem'... and a 15x performance loss when using Open MPI vs. Intel MPI. So it goes.
https://www.mail-archive.com/users@lists.open-mpi.org//msg30593.html

> I found on an IB-based system it's spending about half its time in
> those allocation routines (according to its own profiling) -- a tad
> surprising. It turns out that's due to some pathological interaction
> with openib, and just having openib loaded. It shows up on a
> single-node run iff I don't suppress the openib btl, and doesn't for
> multi-node PSM runs iff I suppress openib (on a mixed
> Mellanox/Infinipath system).

We're lucky - our issue is on the Intel OmniPath (OPA) network (and we will junk the IB hardware in the near future, I think) - so we disabled the InfiniBand transport fallback:
   --mca btl ^tcp,openib
For single-node jobs this will likely also help on plain IB nodes (you can disable IB if you do not use it).

> Can anyone say why, and whether there's a workaround? (I can't easily
> diagnose what it's up to as ptrace is turned off on the system
> concerned, and I can't find anything relevant in the archives.) I had
> the idea to try libfabric instead for multi-node jobs, and that
> doesn't show the pathological behaviour iff openib is suppressed.
> However, it requires ompi 1.10, not 1.8, which I was trying to use.
--
Dipl.-Inform. Paul Kapinos - High Performance Computing,
RWTH Aachen University, IT Center
Seffenter Weg 23, D 52074 Aachen (Germany)
Tel: +49 241/80-24915