OMP_NUM_THREADS=1 mpiexec -n 1 gnu_openmpi_a/one_c_prof.exe : 113 iterations
OMP_NUM_THREADS=6 mpiexec -n 1 --map-by node:PE=6 : 639 iterations
OMP_NUM_THREADS=6 mpiexec -n 2 --map-by node:PE=6 : 639 iterations
OMP_NUM_THREADS=12 mpiexec -n 1 --map-by node:PE=12 : 1000 iterations
OMP_NUM_THREADS=12 mpiexec -n 2 --use-hwthread-cpus --map-by node:PE=12 : 646 iterations

That's looking better, with limited gain for 1 process across 2 chips. Thanks.
I am testing Allinea's profiler, and our goal is to point out bad practice,
so I need to run all sorts of pathological cases. Now to see what our
software thinks.

Thanks for your help

John

On 8 December 2014 at 15:57, Ralph Castain <r...@open-mpi.org> wrote:

> Thanks for sending that lstopo output - helped clarify things for me. I
> think I now understand the issue. Mostly a problem of my being rather dense
> when reading your earlier note.
>
> Try adding --map-by node:PE=N to your cmd line. I think the problem is that
> we default to --map-by numa if you just give cpus-per-proc and no mapping
> directive, as we know that having threads span multiple NUMA regions is
> bad for performance.
>
>
> > On Dec 5, 2014, at 9:07 AM, John Bray <jb...@allinea.com> wrote:
> >
> > Hi Ralph
> >
> > I have a motherboard with 2 X6580 chips, each with 6 cores and 2-way
> hyperthreading, so /proc/cpuinfo reports 24 cores
> >
> > Doing a pure-compute OpenMP loop, where I'd expect the number of
> iterations completed in 10 s to rise with the number of threads,
> with gnu and mpich:
> > OMP_NUM_THREADS=1 -n 1 : 112 iterations
> > OMP_NUM_THREADS=2 -n 1 : 224 iterations
> > OMP_NUM_THREADS=6 -n 1 : 644 iterations
> > OMP_NUM_THREADS=12 -n 1 : 1287 iterations
> > OMP_NUM_THREADS=22 -n 1 : 1182 iterations
> > OMP_NUM_THREADS=24 -n 1 : 454 iterations
> >
> > which shows that mpich is spreading across the cores, but hyperthreading
> is not useful, and using the whole node is counterproductive
> >
> > with gnu and openmpi 1.8.3
> > OMP_NUM_THREADS=1 mpiexec -n 1 : 112
> > OMP_NUM_THREADS=2 mpiexec -n 1 : 113
> > which suggests you aren't allowing the threads to spread across cores
> >
> > adding --cpus-per-proc I gain access to the resources on one chip
> >
> > OMP_NUM_THREADS=1 mpiexec --cpus-per-proc 1 -n 1 : 112
> > OMP_NUM_THREADS=2 mpiexec --cpus-per-proc 2 -n 1 : 224
> > OMP_NUM_THREADS=6 mpiexec --cpus-per-proc 6 -n 1 : 644
> > then
> > OMP_NUM_THREADS=12 mpiexec --cpus-per-proc 12 -n 1
> >
> > A request for multiple cpus-per-proc was given, but a directive
> > was also give to map to an object level that has less cpus than
> > requested ones:
> >
> >   #cpus-per-proc:  12
> >   number of cpus:  6
> >   map-by:          BYNUMA
> >
> > So you aren't happy using both chips for one process
> >
> > OMP_NUM_THREADS=1 mpiexec -n 1 --cpus-per-proc 1 --use-hwthread-cpus : 112
> > OMP_NUM_THREADS=2 mpiexec -n 1 --cpus-per-proc 2 --use-hwthread-cpus : 112
> > OMP_NUM_THREADS=4 mpiexec -n 1 --cpus-per-proc 4 --use-hwthread-cpus : 224
> > OMP_NUM_THREADS=6 mpiexec -n 1 --cpus-per-proc 6 --use-hwthread-cpus : 324
> > OMP_NUM_THREADS=6 mpiexec -n 1 --cpus-per-proc 12 --use-hwthread-cpus : 631
> > OMP_NUM_THREADS=12 mpiexec -n 1 --cpus-per-proc 12 --use-hwthread-cpus : 647
> >
> > OMP_NUM_THREADS=24 mpiexec -n 1 --cpus-per-proc 12 --use-hwthread-cpus
> >
> > A request for multiple cpus-per-proc was given, but a directive
> > was also give to map to an object level that has less cpus than
> > requested ones:
> >
> >   #cpus-per-proc:  24
> >   number of cpus:  12
> >   map-by:          BYNUMA
> >
> > OMP_NUM_THREADS=1 mpiexec -n 1 --cpus-per-proc 2 --use-hwthread-cpus : 112
> > OMP_NUM_THREADS=2 mpiexec -n 1 --cpus-per-proc 4 --use-hwthread-cpus : 224
> > OMP_NUM_THREADS=6 mpiexec -n 1 --cpus-per-proc 12 --use-hwthread-cpus : 644
> >
> > OMP_NUM_THREADS=12 mpiexec -n 1 --cpus-per-proc 24 --use-hwthread-cpus : 644
> >
> > A request for multiple cpus-per-proc was given, but a directive
> > was also give to map to an object level that has less cpus than
> > requested ones:
> >
> >   #cpus-per-proc:  24
> >   number of cpus:  12
> >   map-by:          BYNUMA
> >
> > So it seems that --use-hwthread-cpus makes --cpus-per-proc count
> hyperthreaded cores rather than physical cores, but I can't get both chips
> working on the problem in the way mpich can
> >
> > John
> >
> >
> >
> >
> >
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post:
> http://www.open-mpi.org/community/lists/users/2014/12/25919.php
>
