Fabian Wein <fabian.w...@fau.de> writes:

> There is an old OpenFOAM installation which includes an old open-mpi,
> this might cause the trouble.
OpenFOAM should definitely be built against the system MPI (and, in
general, you should avoid bundled libraries wherever possible IMHO).

> I also suspect that sourcing the Intel 2016 compilers somehow
> disturbs.

I don't know, but the Intel compiler is a definite source of trouble,
particularly because of the myths around it.  I've fixed a fair number
of problems for the users who will listen with "Use GCC and Open MPI".

> I don't know how to check if hwloc supports numa, sockets, ...  But if
> I configure 1.11.1 I see them on in the configure output.  Therefore
> I build it manually.

I don't know what the bundled version builds, but if it builds the
utilities, running the hwloc-ps program under hwloc-bind is a way to
test it.  That doesn't verify the MPI installation, though.

  # hwloc-bind node:1 hwloc-ps | grep hwloc
  13425   NUMANode:1   hwloc-ps
  # grep -m1 model\ name /proc/cpuinfo
  model name      : AMD Opteron(TM) Processor 6276

Running hwloc-ps under mpirun should show the default binding anyway.

>>> but it does not bring me the performance I expect for the petsc
>>> benchmark.
>>
>> Without a sane installation it's probably irrelevant, but
>> performance relative to what?  Anyhow, why don't you want to bind to
>> cores, or at least L2 cache, if that's shared?
>
> I compare the performance of the petsc stream benchmark with a
> similar but older 4-package, 24-core Opteron system, and there
> -bind-to numa results in a significant increase in performance.

I don't know what that benchmark is, but if it's like the canonical
Stream benchmark, that's surprising.  I still don't understand why you
wouldn't want to bind to the lowest level possible.  (lstopo shows that
the system above has 2MB L2 for pairs of cores and 6MB L3 for four
pairs on the NUMANode.)

> Anyhow, I finally managed to compile mpich (there were issues with
> the intel compilers) and mpich allows bindings on my system.

[I think it also uses hwloc.]

> I still have to find out the optimal binding/mapping; simply binding
> to numa as in the other system doesn't work, but the topology is
> different.  I'm a user and new to MPI; I still have to learn a lot.

There is tutorial material on locality and hwloc under
<https://www.open-mpi.org/projects/hwloc/> that looks as good as I'd
expect.
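
For what it's worth, below is the kind of thing I'd run to see what the
launcher actually does.  It's only a sketch: ./your_app stands in for
the real executable, the options shown are Open MPI's, and spellings
can differ between MPI implementations and versions.

  # sketch only: "./your_app" is a placeholder; Open MPI option names assumed
  mpirun -np 4 --report-bindings ./your_app    # Open MPI reports each rank's binding
  mpirun -np 4 hwloc-bind --get                # each rank prints the cpuset it inherited
  mpirun -np 4 --map-by l3cache --bind-to l3cache ./your_app   # e.g. one rank per L3 cache

Binding at the l3cache or core level is the finer-grained binding I
mean above; whether it beats -bind-to numa depends on what lstopo shows
for your topology.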