Hi folks, recently we’ve been getting a Valgrind error in PMPI_Init for our suite of regression tests:

==5922== Invalid read of size 4
==5922==    at 0x61CC5C0: opal_os_dirpath_create (in /aia/opt/openmpi-1.8.7/lib64/libopen-pal.so.6.2.2)
==5922==    by 0x5F207E5: orte_session_dir (in /aia/opt/openmpi-1.8.7/lib64/libopen-rte.so.7.0.6)
==5922==    by 0x5F34F04: orte_ess_base_app_setup (in /aia/opt/openmpi-1.8.7/lib64/libopen-rte.so.7.0.6)
==5922==    by 0x7E96679: rte_init (in /aia/opt/openmpi-1.8.7/lib64/openmpi/mca_ess_env.so)
==5922==    by 0x5F12A77: orte_init (in /aia/opt/openmpi-1.8.7/lib64/libopen-rte.so.7.0.6)
==5922==    by 0x509883C: ompi_mpi_init (in /aia/opt/openmpi-1.8.7/lib64/libmpi.so.1.6.2)
==5922==    by 0x50B843A: PMPI_Init (in /aia/opt/openmpi-1.8.7/lib64/libmpi.so.1.6.2)
==5922==    by 0xEBA79C: ZFS::run() (in /aia/r018/scratch/mic/.zfstester/.zacc_cron/zacc_cron_r9063/zfs_gnu_production)
==5922==    by 0x4DC243: main (in /aia/r018/scratch/mic/.zfstester/.zacc_cron/zacc_cron_r9063/zfs_gnu_production)
==5922==  Address 0x710f670 is 48 bytes inside a block of size 51 alloc'd
==5922==    at 0x4C29110: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==5922==    by 0x61CC572: opal_os_dirpath_create (in /aia/opt/openmpi-1.8.7/lib64/libopen-pal.so.6.2.2)
==5922==    by 0x5F207E5: orte_session_dir (in /aia/opt/openmpi-1.8.7/lib64/libopen-rte.so.7.0.6)
==5922==    by 0x5F34F04: orte_ess_base_app_setup (in /aia/opt/openmpi-1.8.7/lib64/libopen-rte.so.7.0.6)
==5922==    by 0x7E96679: rte_init (in /aia/opt/openmpi-1.8.7/lib64/openmpi/mca_ess_env.so)
==5922==    by 0x5F12A77: orte_init (in /aia/opt/openmpi-1.8.7/lib64/libopen-rte.so.7.0.6)
==5922==    by 0x509883C: ompi_mpi_init (in /aia/opt/openmpi-1.8.7/lib64/libmpi.so.1.6.2)
==5922==    by 0x50B843A: PMPI_Init (in /aia/opt/openmpi-1.8.7/lib64/libmpi.so.1.6.2)
==5922==    by 0xEBA79C: ZFS::run() (in /aia/r018/scratch/mic/.zfstester/.zacc_cron/zacc_cron_r9063/zfs_gnu_production)
==5922==    by 0x4DC243: main (in /aia/r018/scratch/mic/.zfstester/.zacc_cron/zacc_cron_r9063/zfs_gnu_production)
==5922==

What is weird is that it seems to depend on the PBS/Torque session we’re in: sometimes the error does not occur at all and all tests run fine (this is in fact the only Valgrind error we’re having at the moment). Other times every single test we’re running has this error.

Has anyone seen this or might be able to offer an explanation? If it is a false positive, I’d be happy to suppress it :)

Thanks a lot in advance

Michael

P.S.: This error is not covered/suppressed by the default ompi suppression file in $PREFIX/share/openmpi.

--
Michael Schlottke-Lakemper

SimLab Highly Scalable Fluids & Solids Engineering
Jülich Aachen Research Alliance (JARA-HPC)
RWTH Aachen University
Wüllnerstraße 5a
52062 Aachen
Germany

Phone: +49 (241) 80 95188
Fax: +49 (241) 80 92257
Mail: m.schlottke-lakem...@aia.rwth-aachen.de
Web: http://www.jara.org/jara-hpc
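
For reference, if the report does turn out to be a false positive, a suppression matching the trace above could look roughly like the sketch below. It is only a sketch: the entry name on the first line is made up, the frame list is copied from the innermost frames of the trace, and whether suppressing is actually safe has not been established here.

```
{
   ompi_session_dir_dirpath_create_addr4
   Memcheck:Addr4
   fun:opal_os_dirpath_create
   fun:orte_session_dir
   fun:orte_ess_base_app_setup
}
```

Saved to a file (any name, e.g. a hypothetical local.supp), it can be passed to Valgrind with --suppressions=local.supp alongside the default ompi suppression file. Listing only the innermost frames keeps the entry independent of how PMPI_Init is reached from the application.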