On 2/27/14 16:47, Dave Love wrote:
> Bernd Dammann <b...@cc.dtu.dk> writes:
>> Hi,
>> I found this thread from before Christmas, and I wondered what the
>> status of this problem is. We have been seeing the same problems since
>> our upgrade to Scientific Linux 6.4, kernel 2.6.32-431.1.2.el6.x86_64,
>> and OpenMPI 1.6.5. Users have reported severe slowdowns in all kinds
>> of applications, such as VASP and OpenFOAM.
> I'm surprised a kernel change should be related to core binding, if
> that's the issue, or that it caused your slowdown. We were running that
> kernel OK until recently with that sort of application and that OMPI
> version.
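
To see quickly whether binding makes the difference here, one thing that
might help is to run a short job both ways and let mpirun print the
resulting placement (1.6-era options; ./app is just a stand-in for the
real application):

    # current behaviour, no explicit binding requested
    mpirun -np 8 ./app

    # same run, bound to cores, with the placement printed per rank
    mpirun -np 8 --bind-to-core --report-bindings ./app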
Maybe I should say that we moved from SL 6.1 and OMPI 1.4.x to SL 6.4
with the above kernel and OMPI 1.6.5 - which amounts to a major upgrade
of our cluster. After the upgrade, users reported those slowdowns, and a
search on this list showed that other sites had the same (or similar)
issues with this kernel and OMPI version combination.
> (The change to the default alltoallv collective algorithm in the OMPI
> 1.6 series, discussed in the archives, might affect you if you upgraded
> through it.)
OK, thanks - I'll take a look at it.
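
If it does turn out to be the alltoallv change, my understanding is that
the older algorithm can be selected again through the tuned collective
component for a test run - parameter names from memory, so please
double-check them with ompi_info first:

    # list the tuned-collective knobs related to alltoallv
    ompi_info --param coll tuned | grep alltoallv

    # try forcing the old (basic linear, if I remember the numbering
    # right) algorithm for one run; ./app is a placeholder
    mpirun -np 8 --mca coll_tuned_use_dynamic_rules 1 \
                 --mca coll_tuned_alltoallv_algorithm 1 ./app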
>> Using the workaround '--bind-to-core' only makes sense for those jobs
>> that allocate full nodes, but the majority of our jobs don't do that.
> I don't consider it a workaround. Just use a resource manager that
> sorts it out for you. For what it's worth, a recipe for SGE/OMPI is at
> <http://arc.liv.ac.uk/SGE/howto/sge-configs.html#_core_binding>. We're
> happy with that (and seem to be at least on a par with Intel using
> OMPI+GCC+OpenBLAS) now that users automatically get binding.
We use Moab/Torque, so we could use cpusets (but that had some other
side effects earlier, so we did not implement it in our setup).
Regardless of that, it looks strange to me that this combination of
kernel and OMPI has such a negative effect on application performance.
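
To pin down whether placement is actually behind the slowdowns on our
shared nodes, I'll probably start by checking where the ranks of a
running job sit ('vasp' below is just an example process name, not
necessarily what the binary is called here):

    # on a compute node, for one user's running ranks
    for pid in $(pgrep -u $USER vasp); do
        taskset -cp $pid                           # current CPU affinity
        grep Cpus_allowed_list /proc/$pid/status   # same info via /proc
    done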
Rgds,
Bernd