You can improve performance by using --bind-to socket or --bind-to numa, as this will keep each process inside the same memory region. You can also help separate the jobs by using --cpu-set to tell each job which cpus it should use - we'll stay within that envelope.
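For illustration, a rough sketch of how two 4-process jobs could share an 8-core node (the core numbers and executable name are just examples - adjust to your machine):

    mpirun -np 4 --bind-to core --cpu-set 0,1,2,3 ./my_app    # job 1
    mpirun -np 4 --bind-to core --cpu-set 4,5,6,7 ./my_app    # job 2

or, if you don't want to pin individual cores, simply keep each job inside one memory region:

    mpirun -np 4 --bind-to numa ./my_app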
On Tue, Aug 12, 2014 at 8:33 AM, Reuti <re...@staff.uni-marburg.de> wrote:

> On 12.08.2014, at 16:57, Antonio Rago wrote:
>
> > Brilliant, this works!
> > However, I have to say that the code now seems to perform slightly worse.
> > Is there a way to instruct mpirun on which cores to use, and maybe create this map automatically with grid engine?
>
> In the open source version of SGE the requested core binding is only a soft request. The Univa version can handle this as a hard request though, as the scheduler will do the assignment and knows which cores are used. I have no information whether this will be forwarded to Open MPI automatically. I assume not, and it must be read out of the machine file (there ought to be an extra column for it in their version) and fed to Open MPI by some means.
>
> -- Reuti
>
> > Thanks in advance
> > Antonio
> >
> > On 12 Aug 2014, at 14:10, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> >
> >> The quick and dirty answer is that in the v1.8 series, Open MPI started binding MPI processes to cores by default.
> >>
> >> When you run 2 independent jobs on the same machine in the way you described, the two jobs won't have knowledge of each other, and therefore they will both start binding MPI processes beginning with logical core 0.
> >>
> >> The easy workaround is to disable the bind-to-core behavior, for example "mpirun --bind-to none ...". In this way, the OS will (more or less) load balance your MPI jobs across the available cores (assuming you don't run more MPI processes than cores).
> >>
> >> On Aug 12, 2014, at 7:05 AM, Antonio Rago <antonio.r...@plymouth.ac.uk> wrote:
> >>
> >>> Dear mailing list,
> >>> I'm running into trouble in the configuration of the small cluster I'm managing.
> >>> I've installed openmpi-1.8.1 with gcc 4.7 on CentOS 6.5 with infiniband support.
> >>> Compilation and installation were fine, and I can compile and run parallel jobs, both directly and by submitting them through the queue manager (gridengine).
> >>> My problem is that when subsets of two different jobs end up on the same node, they do not spread out and use all the cores of the node; instead they run on a common subset of cores, leaving the others completely idle.
> >>> For example, two 4-core jobs on an 8-core node result in only 4 cores being used on the node (all of them oversubscribed) and the other 4 cores sitting empty.
> >>> Clearly there must be an error in the way I've configured things, but I cannot find any hint on how to solve the problem.
> >>> I've tried different mappings (map by core or by slot) but have never succeeded.
> >>> Could you give me a suggestion on this issue?
> >>> Regards
> >>> Antonio
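P.S. To spell out the commands being discussed above (the executable name and PE name are placeholders - adjust to your setup): Jeff's workaround is simply

    mpirun --bind-to none -np 4 ./my_app

and the (soft) core-binding request Reuti mentions is made at submit time with Grid Engine's -binding option, e.g.

    qsub -pe orte 4 -binding linear:4 run_job.sh

though, as he points out, whether that binding actually makes it through to Open MPI depends on which SGE flavor you are running.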