Hyperthreading is pretty great for non-HPC applications, which is why Intel makes it. But hyperthreading *generally* does not help HPC application performance. Each pair of hardware threads shares -- and effectively halves -- several on-chip resources / queues / pipelines, and that can hurt performance-hungry HPC applications.
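As a quick sanity check, the commands below show how to confirm whether hyperthreading is enabled on a node and where mpirun is actually placing your ranks. This is just a sketch: ./my_app is a placeholder for your executable, and the option spellings are the 1.6-series ones (the 1.8 series writes it as "--bind-to core"):

    # "Thread(s) per core: 2" means hyperthreading is enabled
    lscpu | egrep 'Socket|Core|Thread'

    # print each rank's binding at launch time
    mpirun -np 16 --bind-to-core --report-bindings ./my_app

With an HT-aware Open MPI, --report-bindings should show each rank bound to both hardware threads of one core; with an older version you may instead see two ranks stacked on a single physical core.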
Whether it helps or hurts is a per-application issue, of course, so YMMV. But the general wisdom -- even with Intel Ivy Bridge-class chips -- is to disable hyperthreading for HPC apps.

That being said, Open MPI started supporting hyperthreading properly somewhere in the 1.5/1.6 series (I don't remember the exact version). These are among the reasons that we're urging you to upgrade to at least 1.6.5.

"Supporting hyperthreading properly" means: when you say "bind to core", OMPI will recognize that each core is composed of N hyperthreads and will bind to all of them (vs. binding each MPI process to a Linux virtual processor, which may be a core or a hyperthread). So if you're running in a bind-to-core situation with a version from before OMPI supported HT properly, you'll bind 2 MPI processes to a single core, and that will likely be pretty terrible for overall performance.

Does that help?


On Jul 22, 2014, at 5:18 PM, Lane, William <william.l...@cshs.org> wrote:

> Ralph,
>
> The 32-slot systems/nodes I'm running my openMPI test code on only have
> 16 physical cores; the rest of the slots are hyperthreads. I've done some more
> testing and noticed that if I limit the number of slots per node to 8
> (via -npernode 8), everything works and 8 slots are used from each system/node:
>
> mpirun -np 32 -npernode 8 --prefix /usr/lib64/openmpi --hostfile hostfile
>     --mca btl_tcp_if_include eth0 --mca pls_gridengine_verbose 1
>     /hpc/home/lanew/mpi/openmpi/ProcessColors2
>
> However, when I run this same test code on an older cluster (with a much older
> version of openMPI [1.3.3]), I have no problems using all the cores (including
> the hyperthreading cores). The Intel CPUs used are different in each case: the
> older cluster uses two 6-core Xeons with 12 hyperthreads, while the new cluster
> uses two 8-core Sandy Bridge chips with 16 hyperthreads apiece.
>
> Is hyperthreading an issue with openMPI? Should hyperthreading always be
> turned off for openMPI apps?
>
> Thanks for your time,
>
> -Bill Lane
>
>
> From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain [r...@open-mpi.org]
> Sent: Tuesday, July 22, 2014 7:57 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] Mpirun 1.5.4 problems when request > 28 slots
>
> Hmmm... that's not a "bug", but just a packaging issue with the way CentOS
> distributed some variants of OMPI that requires you to install/update things
> in a specific order.
>
> On Jul 20, 2014, at 11:34 PM, Lane, William <william.l...@cshs.org> wrote:
>
>> Please see:
>>
>> http://bugs.centos.org/view.php?id=5812
>>
>> From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain [r...@open-mpi.org]
>> Sent: Sunday, July 20, 2014 9:30 AM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] Mpirun 1.5.4 problems when request > 28 slots
>>
>> I'm unaware of any CentOS-OMPI bug, and I've been using CentOS throughout
>> the 6.x series running OMPI 1.6.x and above.
>>
>> I can't speak to the older versions of CentOS and/or the older versions of
>> OMPI.
>>
>> On Jul 19, 2014, at 8:14 PM, Lane, William <william.l...@cshs.org> wrote:
>>
>>> Yes, there is a second HPC Sun Grid Engine cluster on which I've run
>>> this openMPI test code dozens of times on upwards of 400 slots
>>> through SGE using qsub and qrsh, but that was using a much
>>> older version of openMPI (1.3.3 I believe). On that particular cluster the
>>> open-files hard and soft limits were an issue.
>>>
>>> I have noticed that there is a new (as of July 2014) CentOS openMPI bug
>>> that occurs when CentOS is upgraded from 6.2 to 6.3.
>>> I'm not sure if that bug applies to this situation, though.
>>>
>>> This particular problem occurs whether I submit jobs through SGE (via qrsh
>>> or qsub) or outside of SGE, which leads me to believe it is an openMPI
>>> and/or CentOS issue.
>>>
>>> -Bill Lane
>>>
>>> From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain [r...@open-mpi.org]
>>> Sent: Saturday, July 19, 2014 3:21 PM
>>> To: Open MPI Users
>>> Subject: Re: [OMPI users] Mpirun 1.5.4 problems when request > 28 slots
>>>
>>> Not for this test case size. You should be just fine with the default
>>> values.
>>>
>>> If I understand you correctly, you've run this app at scale before on
>>> another cluster without problem?
>>>
>>> On Jul 19, 2014, at 1:34 PM, Lane, William <william.l...@cshs.org> wrote:
>>>
>>>> Ralph,
>>>>
>>>> It's hard to imagine it's the openMPI code, because I've tested this code
>>>> extensively on another cluster with 400 nodes and never had any problems.
>>>> But I'll try using the hello_c example in any case. Is it still recommended
>>>> to raise the open-files soft and hard limits to 4096? Or would even larger
>>>> values be necessary?
>>>>
>>>> Thank you for your help.
>>>>
>>>> -Bill Lane
>>>>
>>>> From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain [r...@open-mpi.org]
>>>> Sent: Saturday, July 19, 2014 8:07 AM
>>>> To: Open MPI Users
>>>> Subject: Re: [OMPI users] Mpirun 1.5.4 problems when request > 28 slots
>>>>
>>>> That's a pretty old OMPI version, and we don't really support it any
>>>> longer. However, I can provide some advice:
>>>>
>>>> * have you tried running the simple "hello_c" example we provide? This
>>>> would at least tell you if the problem is in your app, which is what I'd
>>>> expect given your description
>>>>
>>>> * try using gdb (or pick your debugger) to look at the corefile and see
>>>> where it is failing
>>>>
>>>> I'd also suggest updating OMPI to the 1.6.5 or 1.8.1 versions, but I doubt
>>>> that's the issue behind this problem.
>>>>
>>>>
>>>> On Jul 19, 2014, at 1:05 AM, Lane, William <william.l...@cshs.org> wrote:
>>>>
>>>>> I'm getting consistent errors of the form:
>>>>>
>>>>> "mpirun noticed that process rank 3 with PID 802 on node csclprd3-0-8
>>>>> exited on signal 11 (Segmentation fault)."
>>>>>
>>>>> whenever I request more than 28 slots. These errors even occur when I run
>>>>> mpirun locally on a compute node that has 32 slots (8 cores, 16 with
>>>>> hyperthreading).
>>>>>
>>>>> When I run fewer than 28 slots I have no problems whatsoever.
>>>>>
>>>>> OS:
>>>>> CentOS release 6.3 (Final)
>>>>>
>>>>> openMPI information:
>>>>> Package: Open MPI mockbu...@c6b8.bsys.dev.centos.org Distribution
>>>>> Open MPI: 1.5.4
>>>>> Open MPI SVN revision: r25060
>>>>> Open MPI release date: Aug 18, 2011
>>>>> Open RTE: 1.5.4
>>>>> Open RTE SVN revision: r25060
>>>>> Open RTE release date: Aug 18, 2011
>>>>> OPAL: 1.5.4
>>>>> OPAL SVN revision: r25060
>>>>> OPAL release date: Aug 18, 2011
>>>>> Ident string: 1.5.4
>>>>> Prefix: /usr/lib64/openmpi
>>>>> Configured architecture: x86_64-unknown-linux-gnu
>>>>> Configure host: c6b8.bsys.dev.centos.org
>>>>> Configured by: mockbuild
>>>>> Configured on: Fri Jun 22 06:42:03 UTC 2012
>>>>> Configure host: c6b8.bsys.dev.centos.org
>>>>> Built by: mockbuild
>>>>> Built on: Fri Jun 22 06:46:48 UTC 2012
>>>>> Built host: c6b8.bsys.dev.centos.org
>>>>> C bindings: yes
>>>>> C++ bindings: yes
>>>>> Fortran77 bindings: yes (all)
>>>>> Fortran90 bindings: yes
>>>>> Fortran90 bindings size: small
>>>>> C compiler: gcc
>>>>> C compiler absolute: /usr/bin/gcc
>>>>> C compiler family name: GNU
>>>>> C compiler version: 4.4.6
>>>>> C++ compiler: g++
>>>>> C++ compiler absolute: /usr/bin/g++
>>>>> Fortran77 compiler: gfortran
>>>>> Fortran77 compiler abs: /usr/bin/gfortran
>>>>> Fortran90 compiler: gfortran
>>>>> Fortran90 compiler abs: /usr/bin/gfortran
>>>>> C profiling: yes
>>>>> C++ profiling: yes
>>>>> Fortran77 profiling: yes
>>>>> Fortran90 profiling: yes
>>>>> C++ exceptions: no
>>>>> Thread support: posix (MPI_THREAD_MULTIPLE: no, progress: no)
>>>>> Sparse Groups: no
>>>>> Internal debug support: no
>>>>> MPI interface warnings: no
>>>>> MPI parameter check: runtime
>>>>> Memory profiling support: no
>>>>> Memory debugging support: no
>>>>> libltdl support: yes
>>>>> Heterogeneous support: no
>>>>> mpirun default --prefix: no
>>>>> MPI I/O support: yes
>>>>> MPI_WTIME support: gettimeofday
>>>>> Symbol vis. support: yes
>>>>> MPI extensions: affinity example
>>>>> FT Checkpoint support: no (checkpoint thread: no)
>>>>> MPI_MAX_PROCESSOR_NAME: 256
>>>>> MPI_MAX_ERROR_STRING: 256
>>>>> MPI_MAX_OBJECT_NAME: 64
>>>>> MPI_MAX_INFO_KEY: 36
>>>>> MPI_MAX_INFO_VAL: 256
>>>>> MPI_MAX_PORT_NAME: 1024
>>>>> MPI_MAX_DATAREP_STRING: 128
>>>>> MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA memchecker: valgrind (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA memory: linux (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA paffinity: hwloc (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA carto: file (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA maffinity: libnuma (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA timer: linux (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA installdirs: env (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA installdirs: config (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA dpm: orte (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA allocator: basic (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA coll: basic (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA coll: inter (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA coll: self (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA coll: sm (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA coll: sync (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA coll: tuned (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA mpool: fake (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA mpool: sm (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA pml: bfo (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA pml: csum (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA pml: v (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA bml: r2 (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA rcache: vma (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA btl: ofud (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA btl: openib (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA btl: self (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA btl: sm (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA btl: tcp (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA topo: unity (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA osc: rdma (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA iof: hnp (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA iof: orted (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA iof: tool (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA oob: tcp (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA odls: default (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA ras: cm (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA ras: loadleveler (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA ras: slurm (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA rmaps: load_balance (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA rmaps: resilient (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA rmaps: topo (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA rml: oob (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA routed: binomial (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA routed: cm (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA routed: direct (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA routed: linear (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA routed: radix (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA routed: slave (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA plm: rsh (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA plm: rshd (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA plm: slurm (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA filem: rsh (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA errmgr: default (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA ess: env (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA ess: hnp (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA ess: singleton (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA ess: slave (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA ess: slurm (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA ess: slurmd (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA ess: tool (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA grpcomm: hier (MCA v2.0, API v2.0, Component v1.5.4)
>>>>> MCA notifier: command (MCA v2.0, API v1.0, Component v1.5.4)
>>>>> MCA notifier: smtp (MCA v2.0, API v1.0, Component v1.5.4)
>>>>> MCA notifier: syslog (MCA v2.0, API v1.0, Component v1.5.4)
-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/