Good morning,

I'm struggling with the setup of openmpi-1.6.3 on top of Debian wheezy/testing, specifically with Mellanox/OFED/mlx4 memory pinning. The cluster is equipped with Mellanox HCAs (MT26428); the nodes run Debian 3.2.35-2 x86_64 and have 4x8-core AMD Opteron 6212 CPUs and 128G of memory.

I'm aware of the FAQ entries about the mlx4_core module parameters (log_num_mtt etc.), but the module in Debian kernels (and in kernels from kernel.org up to the recent 3.8) does not know anything about log_num_mtt. This parameter is only available in the OFED RPMs for SLES/RHEL/OEL.

Jobs started with the default environment fail. log_mtts_per_seg is a valid parameter of mlx4_core in the Debian kernel and is set to 3; log_num_mtt is not a valid parameter of mlx4_core and is set to 20 in btl_openib.c. Open MPI prints the warning "...Your MPI job will continue, but may be behave poorly and/or hang...", and a simple benchmark then runs for hours instead of returning a result after a few minutes. On the same hardware with Debian Squeeze and openmpi-1.4.5, this job runs flawlessly.

Is there a way to tell openmpi-1.6.3 to use the OFED module from the vanilla kernel and not to rely on log_num_mtt for the "do-we-have-enough-registered-mem" computation for Mellanox HCAs?

Any other idea/hint?

MfG/Sincerely,
Stefan Friedel
--
IWR * 523 * INF 368 * 69120 Heidelberg
T +49 6221 548240 * F +49 6221 545224
stefan.frie...@iwr.uni-heidelberg.de
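
For reference, the "do-we-have-enough-registered-mem" computation mentioned above can be approximated with a few lines of Python. This is only a rough sketch, not the code from btl_openib.c: it assumes the formula given in the Open MPI FAQ (registerable memory = 2^log_num_mtt * 2^log_mtts_per_seg * page size) and a 4096-byte page size on x86_64. The function name max_registerable_bytes is made up for illustration.

```python
GIB = 1 << 30


def max_registerable_bytes(log_num_mtt, log_mtts_per_seg, page_size=4096):
    """Estimate registerable memory per the Open MPI FAQ formula.

    Assumption: page_size is 4096 bytes (typical for x86_64).
    """
    return (1 << log_num_mtt) * (1 << log_mtts_per_seg) * page_size


# Values from this post: log_num_mtt = 20 (value used in btl_openib.c),
# log_mtts_per_seg = 3 (mlx4_core in the Debian kernel).
limit = max_registerable_bytes(log_num_mtt=20, log_mtts_per_seg=3)
physical = 128 * GIB  # 128G of memory per node

print("registerable: %d GiB" % (limit // GIB))
print("physical:     %d GiB" % (physical // GIB))
print("registerable < physical:", limit < physical)
```

Under these assumptions the estimate comes out at 32 GiB, well below the 128G of physical memory, which would be consistent with the warning being printed.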