Looking at the code, we do in fact set memalign_threshold = eager_limit by default, but only if you configured with --enable-btl-openib-malloc-alignment AND/OR we found that the malloc hook functions were available.
You might check config.log to see if the openib malloc hooks were enabled. My guess is that they weren’t, for some reason.

> On May 24, 2015, at 9:07 AM, Xavier Besseron <xavier.besse...@uni.lu> wrote:
>
> Dear OpenMPI developers / users,
>
> This is much more a comment than a question since I believe I have already
> solved my issue. But I would like to report it.
>
> I have noticed that my code performs very badly with OpenMPI when InfiniBand
> is enabled, sometimes with +50% or even +100% overhead.
> I also see this slowdown when running with one thread and one process. In
> that case, there are no MPI calls other than MPI_Init() and MPI_Finalize().
> This overhead disappears if I disable the openib btl at runtime, i.e. with
> '--mca btl ^openib'.
> After further investigation, I figured out that it comes from the memory
> allocator, which aligns every memory allocation when InfiniBand is used.
> This makes sense because my code is a large irregular C++ code creating and
> deleting many objects.
>
> Just below is the documentation of the relevant MCA parameters from
> ompi_info:
>
> MCA btl: parameter "btl_openib_memalign" (current value: "32", data source:
> default, level: 9 dev/all, type: int)
>     [64 | 32 | 0] - Enable (64bit or 32bit)/Disable(0) memory alignment
>     for all malloc calls if btl openib is used.
>
> MCA btl: parameter "btl_openib_memalign_threshold" (current value: "0", data
> source: default, level: 9 dev/all, type: size_t)
>     Allocating memory more than btl_openib_memalign_threshhold bytes will
>     automatically be algined to the value of btl_openib_memalign
>     bytes. memalign_threshhold defaults to the same value as
>     mca_btl_openib_eager_limit.
>
> MCA btl: parameter "btl_openib_eager_limit" (current value: "12288", data
> source: default, level: 4 tuner/basic, type: size_t)
>     Maximum size (in bytes, including header) of "short" messages (must
>     be >= 1).
>
>
> In the end, the problem is that the default value for
> btl_openib_memalign_threshold is 0, which means that all memory allocations
> are aligned to 32 bits.
> The documentation says that the default value of
> btl_openib_memalign_threshold should be the same as
> btl_openib_eager_limit, i.e. 12288 instead of 0.
>
> On my side, changing btl_openib_memalign_threshold to 12288 fixes my
> performance issue.
> However, I believe that the default value of btl_openib_memalign_threshold
> should be fixed in the OpenMPI code (or at least the documentation should be
> fixed).
>
> I tried OpenMPI 1.8.5, 1.7.3 and 1.6.4 and the behavior is the same in all
> of them.
>
>
> Bonus question:
> As this issue might impact other users, I'm considering applying a global fix
> on our clusters by setting this default value in etc/openmpi-mca-params.conf.
> Do you see any good reason not to do it?
>
> Thank you for your comments.
>
> Best regards,
>
> Xavier
>
>
> --
> Dr Xavier BESSERON
> Research associate
> FSTC, University of Luxembourg
> Campus Kirchberg, Office E-007
> Phone: +352 46 66 44 5418
> http://luxdem.uni.lu/
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/05/26913.php