Sure - for example, if you intend to run 4 threads, then --map-by core:pe=4
(assuming you are running OMPI 1.10 or higher) will bind each process to 4
cores in a disjoint pattern (i.e., no sharing).
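
As a rough sketch of how that option might appear on a command line: the rank count (2) and the binary name ./app are placeholders, and --report-bindings is an extra Open MPI flag added here only so the resulting binding can be inspected. The command is echoed rather than executed, so the snippet is a dry run.

```shell
# Hypothetical values: 2 ranks, 4 threads per rank, ./app is a placeholder.
NTHREADS=4
NRANKS=2

# Ask MKL for the same number of threads as cores bound per process.
export MKL_NUM_THREADS="$NTHREADS"

# Build the command line; pe=N binds each process to N cores.
CMD="mpirun -np $NRANKS --map-by core:pe=$NTHREADS --report-bindings ./app"

# Dry run: print the command instead of launching it.
echo "$CMD"
```

Keeping the thread count and the pe= value in one variable avoids requesting more threads than the cores each process is bound to.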
> On Jun 22, 2016, at 3:37 AM, Gilles Gouaillardet wrote:
my point is the way I (almost) always use it is
export KMP_AFFINITY=compact,granularity=fine
the trick is I rely on OpenMPI and/or the batch manager to pin MPI tasks on
disjoint core sets.
that is obviously not the case with
mpirun --bind-to none ...
but that can be achieved with the appropriate
KMP_AFFINITY is essential for performance. One just needs to set it to
something that distributes the threads properly.
Not setting KMP_AFFINITY means no thread affinity, so the threads inherit
the process affinity mask.
Jeff
On Wednesday, June 22, 2016, Gilles Gouaillardet wrote:
my bad, I was assuming KMP_AFFINITY was used
so let me put it this way:
do *not* use KMP_AFFINITY with mpirun --bind-to none, otherwise you will
very likely end up doing time sharing ...
Cheers,
Gilles
On 6/22/2016 5:07 PM, Jeff Hammond wrote:
Linux should not put more than one thread on a core if there are free
cores. Depending on cache/bandwidth needs, it may or may not be better to
colocate on the same socket.
KMP_AFFINITY will pin the OpenMP threads. This is often important for MKL
performance. See https://software.intel.com/en-u
Remi,
Keep in mind this is still suboptimal.
If you run 2 tasks per node, there is a risk that threads from different
ranks end up bound to the same core, which means time sharing and a drop
in performance.
Cheers,
Gilles
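
To illustrate what "disjoint core sets" means for 2 tasks per node, here is a minimal sketch of the arithmetic only. It assumes cores are numbered contiguously and that each rank gets 4 cores; core_range is a hypothetical helper, not something from the thread. In practice the binding itself would come from mpirun or the batch manager.

```shell
# Print the core range "first-last" for a given local rank, assuming
# contiguous core numbering and threads_per_rank cores per rank
# (an illustrative layout, not taken from the thread).
core_range() {
    local_rank=$1
    threads_per_rank=$2
    first=$((local_rank * threads_per_rank))
    last=$((first + threads_per_rank - 1))
    echo "${first}-${last}"
}

# Two ranks per node with 4 threads each get non-overlapping ranges.
for rank in 0 1; do
    echo "rank $rank -> cores $(core_range "$rank" 4)"
done
```

With no binding at all, nothing enforces this separation, which is how threads from different ranks can land on the same core.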
On 6/22/2016 4:45 PM, remi marchal wrote:
Dear Gilles,
Thanks a lot.
Running mpirun with --bind-to none solved the problem.
Thanks a lot,
Regards,
Rémi
> On 22 June 2016 at 09:34, Gilles Gouaillardet wrote:
Remi,
in the same environment, can you
mpirun -np 1 grep Cpus_allowed_list /proc/self/status
it is likely Open MPI allows only one core, and in this case, I suspect
MKL refuses to do time sharing and hence transparently reduces the
number of threads to 1.
/* unless it *does* time sharin
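
The check above can be taken one step further by counting how many CPUs the allowed list actually contains. This is a small sketch: count_cpus is a hypothetical helper, and it assumes the usual Linux format of comma-separated entries and ranges (for example 0-3,8).

```shell
# Count the CPUs named in a Cpus_allowed_list value such as "0-3,8".
count_cpus() {
    list=$1
    total=0
    old_ifs=$IFS
    IFS=,
    for range in $list; do
        case $range in
            # A range "lo-hi" contributes hi - lo + 1 CPUs.
            *-*) lo=${range%-*}; hi=${range#*-}
                 total=$((total + hi - lo + 1)) ;;
            # A single CPU number contributes 1.
            *)   total=$((total + 1)) ;;
        esac
    done
    IFS=$old_ifs
    echo "$total"
}

# On Linux, report the list the current process is allowed to run on.
if [ -r /proc/self/status ]; then
    allowed=$(grep Cpus_allowed_list /proc/self/status | awk '{print $2}')
    echo "allowed CPUs: $allowed ($(count_cpus "$allowed") total)"
fi
```

Saved as a script and launched under mpirun -np 1, a total of 1 would be consistent with the single-core binding suspected here.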
Do you know for sure that MKL is only using one thread or do you merely see
that the performance is consistent with it using one thread?
If MPI does process pinning, it is possible for all OpenMP threads to run
on one core, which means one will observe no speedup from threads (and
potentially a sl
Dear openmpi users,
Today, I faced a strange problem.
I am compiling a quantum chemistry package (CASTEP-16) using intel16, mkl
threaded libraries and openmpi-1.8.1.
The compilation works fine.
When I set MKL_NUM_THREADS=4 and call the program in serial mode (without
mpirun), it works perf