OMP_NUM_THREADS=1  mpiexec -n 1 gnu_openmpi_a/one_c_prof.exe            : 113 iterations
OMP_NUM_THREADS=6  mpiexec -n 1 --map-by node:PE=6                      : 639 iterations
OMP_NUM_THREADS=6  mpiexec -n 2 --map-by node:PE=6                      : 639 iterations
OMP_NUM_THREADS=12 mpiexec -n 1 --map-by node:PE=12                     : 1000 iterations
OMP_NUM_THREADS=12 mpiexec -n 2 --use-hwthread-cpus --map-by node:PE=12 : 646 iterations
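For what it's worth, adding Open MPI's --report-bindings flag to these mpiexec lines makes each rank print the cores it is bound to at startup, which is a quick way to confirm where the PEs actually land, e.g.

    OMP_NUM_THREADS=6 mpiexec --report-bindings -n 2 --map-by node:PE=6 gnu_openmpi_a/one_c_prof.exe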
That's looking better, with limited gain for 1 process on 2 chips. Thanks.
I am testing Allinea's profiler, and our goal is to point out bad practice,
so I need to run all sorts of pathological cases. Now to see what our
software thinks.

Thanks for your help
John

On 8 December 2014 at 15:57, Ralph Castain <r...@open-mpi.org> wrote:
> Thanks for sending that lstopo output - helped clarify things for me. I
> think I now understand the issue. Mostly a problem of my being rather
> dense when reading your earlier note.
>
> Try adding --map-by node:PE=N to your cmd line. I think the problem is
> that we default to --map-by numa if you just give cpus-per-proc and no
> mapping directive, as we know that having threads that span multiple
> NUMA regions is bad for performance.
>
> On Dec 5, 2014, at 9:07 AM, John Bray <jb...@allinea.com> wrote:
>
> > Hi Ralph
> >
> > I have a motherboard with 2 X6580 chips, each with 6 cores and 2-way
> > hyperthreading, so /proc/cpuinfo reports 24 cores.
> >
> > Doing a pure compute OpenMP loop, where I'd expect the number of
> > iterations completed in 10 seconds to rise with the number of threads:
> >
> > with gnu and mpich
> > OMP_NUM_THREADS=1  -n 1 : 112 iterations
> > OMP_NUM_THREADS=2  -n 1 : 224 iterations
> > OMP_NUM_THREADS=6  -n 1 : 644 iterations
> > OMP_NUM_THREADS=12 -n 1 : 1287 iterations
> > OMP_NUM_THREADS=22 -n 1 : 1182 iterations
> > OMP_NUM_THREADS=24 -n 1 : 454 iterations
> >
> > which shows that mpich is spreading across the cores, but
> > hyperthreading is not useful, and using the whole node is
> > counterproductive.
> >
> > with gnu and openmpi 1.8.3
> > OMP_NUM_THREADS=1 mpiexec -n 1 : 112
> > OMP_NUM_THREADS=2 mpiexec -n 1 : 113
> > which suggests you aren't allowing the threads to spread across cores.
> >
> > Adding --cpus-per-proc, I gain access to the resources of one chip:
> >
> > OMP_NUM_THREADS=1 mpiexec --cpus-per-proc 1 -n 1 : 112
> > OMP_NUM_THREADS=2 mpiexec --cpus-per-proc 2 -n 1 : 224
> > OMP_NUM_THREADS=6 mpiexec --cpus-per-proc 6 -n 1 : 644
> >
> > then
> > OMP_NUM_THREADS=12 mpiexec --cpus-per-proc 12 -n 1
> >
> > A request for multiple cpus-per-proc was given, but a directive
> > was also give to map to an object level that has less cpus than
> > requested ones:
> >
> >   #cpus-per-proc:  12
> >   number of cpus:  6
> >   map-by:          BYNUMA
> >
> > So you aren't happy using both chips for one process.
> >
> > OMP_NUM_THREADS=1  mpiexec -n 1 --cpus-per-proc 1  --use-hwthread-cpus : 112
> > OMP_NUM_THREADS=2  mpiexec -n 1 --cpus-per-proc 2  --use-hwthread-cpus : 112
> > OMP_NUM_THREADS=4  mpiexec -n 1 --cpus-per-proc 4  --use-hwthread-cpus : 224
> > OMP_NUM_THREADS=6  mpiexec -n 1 --cpus-per-proc 6  --use-hwthread-cpus : 324
> > OMP_NUM_THREADS=6  mpiexec -n 1 --cpus-per-proc 12 --use-hwthread-cpus : 631
> > OMP_NUM_THREADS=12 mpiexec -n 1 --cpus-per-proc 12 --use-hwthread-cpus : 647
> >
> > OMP_NUM_THREADS=24 mpiexec -n 1 --cpus-per-proc 24 --use-hwthread-cpus
> >
> > A request for multiple cpus-per-proc was given, but a directive
> > was also give to map to an object level that has less cpus than
> > requested ones:
> >
> >   #cpus-per-proc:  24
> >   number of cpus:  12
> >   map-by:          BYNUMA
> >
> > OMP_NUM_THREADS=1  mpiexec -n 1 --cpus-per-proc 2  --use-hwthread-cpus : 112
> > OMP_NUM_THREADS=2  mpiexec -n 1 --cpus-per-proc 4  --use-hwthread-cpus : 224
> > OMP_NUM_THREADS=6  mpiexec -n 1 --cpus-per-proc 12 --use-hwthread-cpus : 644
> >
> > OMP_NUM_THREADS=12 mpiexec -n 1 --cpus-per-proc 24 --use-hwthread-cpus
> >
> > A request for multiple cpus-per-proc was given, but a directive
> > was also give to map to an object level that has less cpus than
> > requested ones:
> >
> >   #cpus-per-proc:  24
> >   number of cpus:  12
> >   map-by:          BYNUMA
> >
> > So it seems that --use-hwthread-cpus means that --cpus-per-proc changes
> > from physical cores to hyperthreaded cores, but I can't get both chips
> > working on the problem the way mpich can.
> >
> > John
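For anyone wanting to reproduce these numbers, a minimal sketch of the kind of pure-compute OpenMP loop described above - counting how many fixed-size chunks of arithmetic complete in a 10-second window - might look like the following. The sin/cos workload and the 10-million-element chunk size are illustrative guesses, not John's actual benchmark. Build with something like "gcc -fopenmp -O2 bench.c -o bench -lm" and launch it under mpiexec as in the runs above.

    /* bench.c: count fixed-size OpenMP work chunks completed in 10 s.
       More threads finish each chunk faster, so the count should rise
       with OMP_NUM_THREADS when the threads land on separate cores. */
    #include <math.h>
    #include <omp.h>
    #include <stdio.h>

    int main(void)
    {
        const double budget = 10.0;   /* wall-clock budget in seconds */
        const long n = 10000000L;     /* work per "iteration" (assumed size) */
        long iterations = 0;
        double sink = 0.0;            /* keeps the arithmetic live */
        double t0 = omp_get_wtime();

        while (omp_get_wtime() - t0 < budget) {
            double sum = 0.0;
            /* one fixed chunk of arithmetic, split across the team */
            #pragma omp parallel for reduction(+:sum)
            for (long i = 0; i < n; i++)
                sum += sin((double)i) * cos((double)i);
            sink += sum;
            iterations++;
        }
        printf("%ld iterations in %.0f s with %d threads (checksum %g)\n",
               iterations, budget, omp_get_max_threads(), sink);
        return 0;
    }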
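On the closing point: under 1.8.x, --use-hwthread-cpus switches the unit that PE / cpus-per-proc counts from physical cores to hardware threads, so getting one process onto both chips of this 2 x 6-core, 2-way-HT node would presumably need the node-level mapping from Ralph's suggestion combined with the hwthread flag, along the lines of

    OMP_NUM_THREADS=24 mpiexec -n 1 --use-hwthread-cpus --map-by node:PE=24 gnu_openmpi_a/one_c_prof.exe

This is untested here, extrapolating from the PE=12 runs at the top of the thread.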