Do you have both mpich and openmpi installed?
yes
Can you give the results of "dpkg -l | grep mpich" and ditto for openmpi?
root@capitanata:~# dpkg -l | grep mpich
ii  libmpich-dev:amd64  3.3~b2-7+b1  amd64  Development files for MPICH
ii  libmpich12:amd64    3.3~b2-7+b1  amd64  Shared libraries for MPICH
ii  mpich               3.3~b2-7+b1  amd64  Implementation of the MPI Message Passing Interface standard
root@capitanata:~# dpkg -l | grep openmpi
ii  gromacs-openmpi                   2018.2-2                 amd64  Molecular dynamics sim, binaries for OpenMPI parallelization
ii  libhdf5-openmpi-100:amd64         1.10.0-patch1+docs-4+b2  amd64  Hierarchical Data Format 5 (HDF5) - runtime files - OpenMPI version
ii  libhdf5-openmpi-dev               1.10.0-patch1+docs-4+b2  amd64  Hierarchical Data Format 5 (HDF5) - development files - OpenMPI version
ii  libmkl-blacs-openmpi-ilp64:amd64  2018.3.222-1             amd64  Intel® MKL: ILP64 version of BLACS routines for Open MPI
ii  libmkl-blacs-openmpi-lp64:amd64   2018.3.222-1             amd64  Intel® MKL: LP64 version of BLACS routines for Open MPI
ii  libopenmpi-dev:amd64              3.1.1.real-4+b1          amd64  high performance message passing library -- header files
ii  libopenmpi3:amd64                 3.1.1.real-4+b1          amd64  high performance message passing library -- shared library
ii  libscalapack-openmpi-dev          2.0.2-7+b1               amd64  Scalable Linear Algebra Package - Dev files for OpenMPI
ii  libscalapack-openmpi2.0           2.0.2-7+b1               amd64  Scalable Linear Algebra Package - Shared libs for OpenMPI
ii  mpqc-openmpi                      2.3.1-18                 all    Massively Parallel Quantum Chemistry Program (OpenMPI transitional package)
ii  openmpi-bin                       3.1.1.real-4+b1          amd64  high performance message passing library -- binaries
ii  openmpi-common                    3.1.1.real-4+b1          amd64  high performance message passing library -- common files
ii  openmpi-doc                       3.1.1.real-4             all    high performance message passing library -- man pages
ii  yorick-mpy-openmpi                2.2.04+dfsg1-9+b1        amd64  Message Passing Yorick (OpenMPI build)
The "alternatives" system may be confused. Check where the symlinks for
/usr/bin/mpiexec and /usr/bin/mpirun lead.
This seems ok, apparently:
root@capitanata:~# ls -l /etc/alternatives/mpirun
lrwxrwxrwx 1 root root 23 apr 21 17:09 /etc/alternatives/mpirun -> /usr/bin/mpirun.openmpi
root@capitanata:~# ls -l /etc/alternatives/mpiexec
lrwxrwxrwx 1 root root 24 apr 21 17:09 /etc/alternatives/mpiexec -> /usr/bin/mpiexec.openmpi
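(For anyone reproducing this check: the /etc/alternatives links above are only the middle of the chain. A quick sketch that resolves the whole chain from /usr/bin in one go, assuming GNU readlink is available; the paths shown in the comments are the Debian defaults, not verified output:)

```shell
# Resolve the full symlink chain for mpirun and mpiexec.
# On Debian this is typically /usr/bin/mpirun -> /etc/alternatives/mpirun
# -> /usr/bin/mpirun.openmpi; "readlink -f" follows all nested links.
mpirun_target=$(readlink -f /usr/bin/mpirun 2>/dev/null || echo "not found")
mpiexec_target=$(readlink -f /usr/bin/mpiexec 2>/dev/null || echo "not found")
echo "mpirun  -> $mpirun_target"
echo "mpiexec -> $mpiexec_target"
```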
Try testing with mpiexec.openmpi explicitly rather than mpiexec.
I had done it already, and anyway mpiexec points to mpiexec.openmpi. No
change.
For the transport, try:
$ mpirun.openmpi -n 2 --mca btl self,tcp ./printf
Yay! This worked. My bare-bones test code runs flawlessly with that:
gmulas@capitanata:~/PAHmodels/anharmonica-scalapack$ mpiexec.openmpi --mca btl self,tcp sample_printf
MPI_Init call ok
My rank is = 0
number of procs is = 2
MPI_Init call ok
My rank is = 1
number of procs is = 2
MPI_Finalize call ok, returned 0
MPI_Finalize call ok, returned 0
The same code, run without the --mca option, yields:
gmulas@capitanata:~/PAHmodels/anharmonica-scalapack$ mpiexec.openmpi -n 2 sample_printf
--------------------------------------------------------------------------
[[23445,1],1]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:
Module: OpenFabrics (openib)
Host: capitanata
Another transport will be used instead, although this may result in
lower performance.
NOTE: You can disable this warning by setting the MCA parameter
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
and then hangs there forever.
Now, I think it has _always_ been the default of openmpi to try to use
infiniband first, if available, and then fall back on slower tcp. The
question is: why does it apparently hang while trying to use some faster
interconnect, instead of gracefully failing and moving on to the next
available, slower one, as it did before?
Second question, a practical one: how should I configure mpiexec.openmpi so
that it uses self,tcp by default, when called without arguments? This would
at least make openmpi usable (with some configuration) and demote the bug
from grave to important or even normal, perhaps putting some info about this
problem and how to deal with it in a README.Debian file.
Of course it's a workaround, not a real solution, but way better than
nothing :)
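(For the record, if I understand the mechanism correctly, Open MPI reads default MCA parameters from parameter files, so the self,tcp default can be set once rather than on every command line. A sketch of the per-user variant:)

```shell
# Per-user MCA parameter file: makes "self,tcp" the default BTL list,
# so a plain "mpiexec.openmpi -n 2 ..." needs no --mca flag.
# This is the standard Open MPI mechanism; the site-wide equivalent
# file is /etc/openmpi/openmpi-mca-params.conf.
mkdir -p "$HOME/.openmpi"
echo "btl = self,tcp" >> "$HOME/.openmpi/mca-params.conf"
```

Alternatively, instead of whitelisting transports, the offending component alone can be excluded with Open MPI's negation syntax, e.g. `btl = ^openib` in the same file (or `--mca btl ^openib` on the command line).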
thanks!
Giacomo
--
_________________________________________________________________
Giacomo Mulas <[email protected]>
_________________________________________________________________
INAF - Osservatorio Astronomico di Cagliari
via della scienza 5 - 09047 Selargius (CA)
tel. +39 070 71180255
mob. : +39 329 6603810
_________________________________________________________________
"When the storms are raging around you, stay right where you are"
(Freddy Mercury)
_________________________________________________________________