On Oct 14, 2020, at 3:07 AM, Diego Zuccato <diego.zucc...@unibo.it> wrote:

> On 13/10/20 16:33, Jeff Squyres (jsquyres) wrote:
>> That's odd. What version of Open MPI are you using?
> The version is 3.1.3, as packaged in Debian Buster.

The 3.1.x series is pretty old. If you want to stay in the 3.1.x series, you might try upgrading to the latest -- 3.1.6. That has a bunch of bug fixes compared to v3.1.3. Alternatively, the most recent release series is the v4.0.x series: v4.0.5 is the latest in that series.

> I don't know Open MPI (or even MPI in general) very well. Some time ago, I had to add a
> mtl = psm2
> line to /etc/openmpi/openmpi-mca-params.conf.

This implies that you have Infinipath networking on your cluster.

> Another oddity is that I had the same problem on other nodes, where it got "solved" (or, more likely, just "masked") by simply installing gdb: while trying to debug the issue, I noticed that once gdb was installed I could no longer reproduce the problem. Too bad that on this server gdb is already installed and apparently useless for debugging the issue.

I can't imagine what installing gdb would do to mask the problem. Strange.

-- 
Jeff Squyres
jsquy...@cisco.com