Hello, I'm trying to run the "connectivity_c" test on a variety of systems using OpenMPI 1.8.4. The test returns segmentation faults when running across nodes on one particular type of system, and only when using the openib BTL. (The test runs without error if I specify "--mca btl tcp,self", as shown below.)
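For clarity, these are the two invocations I'm comparing (same binary, same 16 ranks; only the BTL selection differs):

    mpirun -np 16 connectivity_c                        # segfaults when the openib BTL is used
    mpirun -np 16 --mca btl tcp,self connectivity_c     # completes without error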
Here's the output from the failing run:

1033 fischega@bl1415[~/tmp/openmpi/1.8.4_test_examples_SLES11_SP2/error]> mpirun -np 16 connectivity_c
[bl1415:29526] *** Process received signal ***
[bl1415:29526] Signal: Segmentation fault (11)
[bl1415:29526] Signal code: (128)
[bl1415:29526] Failing at address: (nil)
[bl1415:29526] [ 0] /lib64/libpthread.so.0(+0xf5d0)[0x2ab1e72915d0]
[bl1415:29526] [ 1] /data/pgrlf/openmpi-1.8.4/SLES10_SP2_lib/lib/libopen-pal.so.6(opal_memory_ptmalloc2_int_malloc+0x29e)[0x2ab1e7c550be]
[bl1415:29526] [ 2] /data/pgrlf/openmpi-1.8.4/SLES10_SP2_lib/lib/libopen-pal.so.6(opal_memory_ptmalloc2_int_memalign+0x69)[0x2ab1e7c58829]
[bl1415:29526] [ 3] /data/pgrlf/openmpi-1.8.4/SLES10_SP2_lib/lib/libopen-pal.so.6(opal_memory_ptmalloc2_memalign+0x6f)[0x2ab1e7c583ff]
[bl1415:29526] [ 4] /data/pgrlf/openmpi-1.8.4/SLES10_SP2_lib/lib/openmpi/mca_btl_openib.so(+0x2867b)[0x2ab1eac8a67b]
[bl1415:29526] [ 5] /data/pgrlf/openmpi-1.8.4/SLES10_SP2_lib/lib/openmpi/mca_btl_openib.so(+0x1f712)[0x2ab1eac81712]
[bl1415:29526] [ 6] /lib64/libpthread.so.0(+0x75f0)[0x2ab1e72895f0]
[bl1415:29526] [ 7] /lib64/libc.so.6(clone+0x6d)[0x2ab1e757484d]
[bl1415:29526] *** End of error message ***

When I run the same test using a previous build of OpenMPI 1.6.5 on this system, it prints a memory registration warning but otherwise executes normally:

--------------------------------------------------------------------------
WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory. This can cause MPI jobs to
run with erratic performance, hang, and/or crash.

OpenMPI 1.8.4 does not seem to report this memory registration warning in situations where previous versions did. Is that because OpenMPI 1.8.4 is no longer vulnerable to this type of condition?

Thanks,
Greg