Hi Greg,

We changed the default behavior to essentially assume folks are running
current MOFED/OFED drivers, which allow one to register twice the amount of
physical memory. If you are running an OFED release older than 2.0, or
otherwise using older drivers, then you should set the following MCA
parameter:

    -mca btl_openib_allow_max_memory_registration 0

This tells the openib BTL to actually calculate the maximum allowable
registered memory based on driver-specific values.
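For example, applied to the mpirun invocation from your report, the full
command would look something like this (a sketch, not verified here):

    mpirun -np 16 -mca btl_openib_allow_max_memory_registration 0 connectivity_c

The same parameter can also be set through Open MPI's standard MCA
environment-variable convention, which avoids editing every launch command:

    export OMPI_MCA_btl_openib_allow_max_memory_registration=0

If your OFED install provides the ofed_info utility, "ofed_info -s" should
print the installed release so you can check whether you are below 2.0.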
Hope this helps,

Josh

On Tue, Mar 10, 2015 at 10:44 AM, Fischer, Greg A. <fisch...@westinghouse.com> wrote:

> Hello,
>
> I’m trying to run the “connectivity_c” test on a variety of systems using
> OpenMPI 1.8.4. The test returns segmentation faults when running across
> nodes on one particular type of system, and only when using the openib
> BTL. (The test runs without error if I stipulate “--mca btl tcp,self”.)
> Here’s the output:
>
> 1033 fischega@bl1415[~/tmp/openmpi/1.8.4_test_examples_SLES11_SP2/error]>
> mpirun -np 16 connectivity_c
>
> [bl1415:29526] *** Process received signal ***
> [bl1415:29526] Signal: Segmentation fault (11)
> [bl1415:29526] Signal code: (128)
> [bl1415:29526] Failing at address: (nil)
> [bl1415:29526] [ 0] /lib64/libpthread.so.0(+0xf5d0)[0x2ab1e72915d0]
> [bl1415:29526] [ 1] /data/pgrlf/openmpi-1.8.4/SLES10_SP2_lib/lib/libopen-pal.so.6(opal_memory_ptmalloc2_int_malloc+0x29e)[0x2ab1e7c550be]
> [bl1415:29526] [ 2] /data/pgrlf/openmpi-1.8.4/SLES10_SP2_lib/lib/libopen-pal.so.6(opal_memory_ptmalloc2_int_memalign+0x69)[0x2ab1e7c58829]
> [bl1415:29526] [ 3] /data/pgrlf/openmpi-1.8.4/SLES10_SP2_lib/lib/libopen-pal.so.6(opal_memory_ptmalloc2_memalign+0x6f)[0x2ab1e7c583ff]
> [bl1415:29526] [ 4] /data/pgrlf/openmpi-1.8.4/SLES10_SP2_lib/lib/openmpi/mca_btl_openib.so(+0x2867b)[0x2ab1eac8a67b]
> [bl1415:29526] [ 5] /data/pgrlf/openmpi-1.8.4/SLES10_SP2_lib/lib/openmpi/mca_btl_openib.so(+0x1f712)[0x2ab1eac81712]
> [bl1415:29526] [ 6] /lib64/libpthread.so.0(+0x75f0)[0x2ab1e72895f0]
> [bl1415:29526] [ 7] /lib64/libc.so.6(clone+0x6d)[0x2ab1e757484d]
> [bl1415:29526] *** End of error message ***
>
> When I run the same test using a previous build of OpenMPI 1.6.5 on this
> system, it returns a memory registration warning but otherwise executes
> normally:
>
> --------------------------------------------------------------------------
> WARNING: It appears that your OpenFabrics subsystem is configured to only
> allow registering part of your physical memory. This can cause MPI jobs to
> run with erratic performance, hang, and/or crash.
>
> OpenMPI 1.8.4 does not seem to be reporting a memory registration warning
> in situations where previous versions would report such a warning. Is this
> because OpenMPI 1.8.4 is no longer vulnerable to this type of condition?
>
> Thanks,
> Greg