Re: [OMPI users] new overcommitment warning?
On Fri, 5 Sep 2014, Ralph Castain wrote:

> On Sep 5, 2014, at 3:34 PM, Allin Cottrell wrote:
>
>> I suspect there is a new (to openmpi 1.8.N?) warning with respect to requesting a number of MPI processes greater than the number of "real" cores on a given machine. I can provide a good deal more information if that's required, but can I just pose it as a question for now? Does anyone know of a relevant change in the code?
>>
>> The reason I'm asking is that I've been experimenting, on a couple of machines and with more than one computational problem, to see if I'm better off restricting the number of MPI processes to the number of "real" or "physical" cores available, or if it's better to allow a larger number of processes up to the number of hyperthreads available (which is twice the number of cores on the machines I'm working on).
>
> If you are going to treat hyperthreads as independent processors, then you should probably set the --use-hwthreads-as-cpus flag so OMPI knows to treat it that way

Hmm, where would I set that? (For reference) mpiexec --version gives

  mpiexec (OpenRTE) 1.8.2

and if I append --use-hwthreads-as-cpus to my mpiexec command I get

  mpiexec: Error: unknown option "--use-hwthreads-as-cpus"

However, via trial and error I've found that these options work: either

  --map-by hwthread  OR
  --oversubscribe  (not mentioned in the mpiexec man page)

What's puzzling me, though, is that the use of these flags was not necessary when, earlier this year, I was running ompi 1.6.5. Neither is it necessary when running ompi 1.7.3 on a different machine. The warning that's printed without these flags seems to be new.

>> It seems to me that openmpi >= 1.8 is giving me a (somewhat obscure and non-user-friendly) warning whenever I specify to mpiexec a number of processes > the number of "real" cores [...]
>
> Could you pass along the warning? It should only give you a warning if the #procs > #slots, as you are then oversubscribed. You can turn that warning off by just adding the oversubscribe flag to your mapping directive

Here's what I'm seeing:

  A request was made to bind to that would result in binding more
  processes than cpus on a resource:

     Bind to:     CORE
     Node:        waverley
     #processes:  2
     #cpus:       1

  You can override this protection by adding the "overload-allowed"
  option to your binding directive.

The machine in question has two cores and four threads. The thing that's confusing here is that I'm not aware of supplying any "binding directive": my command line (for running on a single host) is just this:

  mpiexec -np

In fact it seems that current ompi "does the right thing" in respect of the division of labor even without the extra flags: depending on the nature of the computation, I can get faster times with -np 4 than with -np 2 (and no degradation). It just insists on printing this warning, which I'd like to be able to turn off "globally" if possible.

Allin Cottrell
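For concreteness, the two workarounds found above would be spelled out like this on a full command line, assuming a two-core/four-thread host; the process count and the program name (./my_mpi_prog) are placeholders, not taken from the original post:

  # map one process per hardware thread
  mpiexec --map-by hwthread -np 4 ./my_mpi_prog

  # or simply allow more processes than detected slots
  mpiexec --oversubscribe -np 4 ./my_mpi_prog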
Re: [OMPI users] compilation problem with ifort
Hello Gus,

Thanks once again for your help and guidance. I just want to let you know that I was able to resolve the problem; I actually changed the locations of the libraries before configuring. For example I set:

  BLAS_LIBS = -lmkl_intel_lp64 -lmkl_sequential -lmkl_core
  LAPACK_LIBS = /opt/intel/Compiler/11.1/069/mkl/lib/em64t/libmkl_lapack95_lp64.a
  FFT_LIBS = -L/opt/scilibs/FFTW/fftw-3.2.1_Bull.9005/lib -lfftw3

and also, to link the espresso libs with the plugins, I added the following line:

  TOPDIR=/home_cluster/fis718/./espresso-4.0.3/

The compilation went with no errors and epw.x was produced. Once again, thanks for your support and help.

Regards
ELIO MOUJAES

> Date: Thu, 4 Sep 2014 12:48:44 -0400
> From: g...@ldeo.columbia.edu
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] compilation problem with ifort
>
> Hi Elie
>
> The executable generated in my computer will be useless to you, because these days most if not all libraries linked to an executable are dynamic/shared libraries.
> You won't have the same in your computer, or the equivalent will be located in different places, may be from different versions, etc.
> (E.g. your Intel compiler libraries will be from a different version, in a different location, and likewise for OpenMPI libraries etc.)
> Take any executable that you may have in your computer and do "ldd executable_name" to see the list of shared libraries.
>
> The error you reported suggests a misconfiguration of Makefiles, or rather, a mispositioning of directories.
>
> **
>
> First thing I would try is to start fresh.
> Delete or move the old directory trees, download everything again into blank directories, and do the recipe all over again.
> Leftovers of previous compilations are often a hurdle, so you do yourself a favor by starting over from scratch.
>
> **
>
> Second *really important* item to check:
>
> The top directories of QE and EPW *must* follow this hierarchy:
>
> espresso-4.0.3
> |-- EPW-3.0.0
>
> Is this what you have?
> The EPW web site just hints at this in their recipe step 3.
> The Makefiles will NOT work if this directory hierarchy is incorrect.
>
> The error you reported in your first email *suggests* that the Makefiles in the EPW tarball are not finding the Makefiles in the QE tarball, which indicates that the directories may not have a correct relative location.
>
> I.e. the EPW top directory must be right under the QE top directory.
>
> **
>
> Third thing is that you have to follow the recipe strictly (and on the EPW web site there seem to be typos and omissions):
>
> 1) Untar the QE tarball:
>
> tar -zxf espresso-4.0.3.tar.gz
>
> 2) Move the EPW tarball to the QE top directory produced by step 1 above, something like this:
>
> mv EPW-3.0.0.tar.gz espresso-4.0.3
>
> 3) Untar the EPW tarball you copied/moved in step 2 above, something like this:
>
> cd espresso-4.0.3
> tar -zxf EPW-3.0.0.tar.gz
>
> 4) Set up your OpenMPI environment (assuming you are using OpenMPI and that it is not installed in a standard location such as /usr/local):
>
> [bash/sh]
> export PATH=/your/openmpi/bin:$PATH
> export LD_LIBRARY_PATH=/your/openmpi/lib:$LD_LIBRARY_PATH
>
> [tcsh/csh]
> setenv PATH /your/openmpi/bin:$PATH
> setenv LD_LIBRARY_PATH /your/openmpi/lib:$LD_LIBRARY_PATH
>
> 5) Configure espresso-4.0.3, i.e., assuming you are already in the espresso-4.0.3 directory, do:
>
> ./configure CC=icc F77=ifort
>
> (assuming you are using Intel compilers and that you compiled OpenMPI with them; if you did not, say you used gcc and gfortran, use CC=gcc FC=gfortran instead)
>
> 6) Run "make" in the top EPW directory:
>
> cd EPW-3.0.0
> make
>
> When you configure QE it doesn't compile anything.
> It just generates/sets up a bunch of Makefiles in the QE directory tree.
>
> When you do "make" in the EPW-3.0.0 directory the top Makefile just says (cd src; make).
> If you look into the "src" subdirectory you will see that the Makefile therein points to library and include directories two levels above, which means that they are in the *QE* directory tree:
>
> *
> IFLAGS = -I../../include
> MODFLAGS = -I./ -I../../Modules -I../../iotk/src \
>            -I../../PW -I../../PH -I../../PP
> LIBOBJS = ../../flib/ptools.a ../../flib/flib.a \
>           ../../clib/clib.a ../../iotk/src/libiotk.a
> W90LIB = ../../W90/libwannier.a
> **
>
> Hence, if your QE directory is not immediately above your EPW directory everything will fail, because the EPW Makefile won't be able to find the bits and parts of QE that it needs.
> And this is *exactly what the error message in your first email showed*: a bunch of object files that were not found.
>
> ***
>
> Sorry, but I cannot do any better than this.
> I hope this helps,
> Gus Correa
>
> On 09/03/2014 08:59 PM, Elio Physics wrote:
>> Ray and Gus,
>>
>> Thanks a lot for your help. I followed Gus' steps. I still have the same problem [...]
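Putting the recipe above together, a condensed sketch of the whole sequence might look as follows. The compiler settings and library paths are the ones quoted earlier in this thread and will differ on other systems, and it is assumed here (not confirmed in the thread) that QE's configure picks up BLAS_LIBS/LAPACK_LIBS/FFT_LIBS from the environment; otherwise the same values can be edited into make.sys after configuring:

  # unpack QE, then unpack EPW inside the QE top directory
  tar -zxf espresso-4.0.3.tar.gz
  mv EPW-3.0.0.tar.gz espresso-4.0.3
  cd espresso-4.0.3
  tar -zxf EPW-3.0.0.tar.gz

  # point at the OpenMPI installation (bash syntax)
  export PATH=/your/openmpi/bin:$PATH
  export LD_LIBRARY_PATH=/your/openmpi/lib:$LD_LIBRARY_PATH

  # math libraries (example paths from this thread; assumption: configure reads these)
  export BLAS_LIBS="-lmkl_intel_lp64 -lmkl_sequential -lmkl_core"
  export LAPACK_LIBS="/opt/intel/Compiler/11.1/069/mkl/lib/em64t/libmkl_lapack95_lp64.a"
  export FFT_LIBS="-L/opt/scilibs/FFTW/fftw-3.2.1_Bull.9005/lib -lfftw3"

  # configure QE (generates the Makefiles), then build EPW
  ./configure CC=icc F77=ifort
  cd EPW-3.0.0
  make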
Re: [OMPI users] new overcommitment warning?
On Sep 6, 2014, at 7:52 AM, Allin Cottrell wrote:

> On Fri, 5 Sep 2014, Ralph Castain wrote:
>
>> On Sep 5, 2014, at 3:34 PM, Allin Cottrell wrote:
>>
>>> I suspect there is a new (to openmpi 1.8.N?) warning with respect to requesting a number of MPI processes greater than the number of "real" cores on a given machine. I can provide a good deal more information if that's required, but can I just pose it as a question for now? Does anyone know of a relevant change in the code?
>>>
>>> The reason I'm asking is that I've been experimenting, on a couple of machines and with more than one computational problem, to see if I'm better off restricting the number of MPI processes to the number of "real" or "physical" cores available, or if it's better to allow a larger number of processes up to the number of hyperthreads available (which is twice the number of cores on the machines I'm working on).
>>
>> If you are going to treat hyperthreads as independent processors, then you should probably set the --use-hwthreads-as-cpus flag so OMPI knows to treat it that way
>
> Hmm, where would I set that? (For reference) mpiexec --version gives
>
>   mpiexec (OpenRTE) 1.8.2
>
> and if I append --use-hwthreads-as-cpus to my mpiexec command I get
>
>   mpiexec: Error: unknown option "--use-hwthreads-as-cpus"
>
> However, via trial and error I've found that these options work: either
>
>   --map-by hwthread  OR
>   --oversubscribe  (not mentioned in the mpiexec man page)

My apologies - the correct spelling is --use-hwthread-cpus

> What's puzzling me, though, is that the use of these flags was not necessary when, earlier this year, I was running ompi 1.6.5. Neither is it necessary when running ompi 1.7.3 on a different machine. The warning that's printed without these flags seems to be new.

The binding code changed during the course of the 1.7 series to provide more fine-grained options

>>> It seems to me that openmpi >= 1.8 is giving me a (somewhat obscure and non-user-friendly) warning whenever I specify to mpiexec a number of processes > the number of "real" cores [...]
>>
>> Could you pass along the warning? It should only give you a warning if the #procs > #slots, as you are then oversubscribed. You can turn that warning off by just adding the oversubscribe flag to your mapping directive
>
> Here's what I'm seeing:
>
>   A request was made to bind to that would result in binding more
>   processes than cpus on a resource:
>
>      Bind to:     CORE
>      Node:        waverley
>      #processes:  2
>      #cpus:       1
>
>   You can override this protection by adding the "overload-allowed"
>   option to your binding directive.
>
> The machine in question has two cores and four threads. The thing that's confusing here is that I'm not aware of supplying any "binding directive": my command line (for running on a single host) is just this:
>
>   mpiexec -np
>
> In fact it seems that current ompi "does the right thing" in respect of the division of labor even without the extra flags: depending on the nature of the computation, I can get faster times with -np 4 than with -np 2 (and no degradation). It just insists on printing this warning, which I'd like to be able to turn off "globally" if possible.

You shouldn't be getting that warning if you aren't specifying a binding option, so it looks like a bug to me. I'll check and see what's going on. You might want to check, however, that you don't have a binding directive hidden in your environment or default MCA param file.
Meantime, just use the oversubscribe or overload-allowed options to turn it off. You can put those in the default MCA param file if you don't want to add them to the environment or cmd line. The MCA params would be:

  OMPI_MCA_rmaps_base_oversubscribe=1

If you want to bind the procs to cores, but allow two procs to share a core (each will be bound to both hyperthreads):

  OMPI_MCA_hwloc_base_binding_policy=core:overload

If you want to bind the procs to the hyperthreads (since one proc will be bound to a hyperthread, no overloading will occur):

  OMPI_MCA_hwloc_base_use_hwthreads_as_cpus=1
  OMPI_MCA_hwloc_base_binding_policy=hwthread

HTH
Ralph

> Allin Cottrell
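To spell out Ralph's suggestion: the settings above go into the shell environment exactly as written (the OMPI_MCA_ prefix marks them as MCA parameters), while in an MCA parameter file the prefix is dropped. A minimal sketch, assuming bash and the usual per-user file location (the system-wide openmpi-mca-params.conf works the same way):

  # in the environment, e.g. to suppress the oversubscription warning:
  export OMPI_MCA_rmaps_base_oversubscribe=1

  # or, equivalently, in $HOME/.openmpi/mca-params.conf (no OMPI_MCA_ prefix):
  rmaps_base_oversubscribe = 1
  # hwloc_base_binding_policy = core:overload    <- optional: bind to cores, allow overload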
Re: [OMPI users] new overcommitment warning?
On Sat, 6 Sep 2014, Ralph Castain wrote:

> On Sep 6, 2014, at 7:52 AM, Allin Cottrell wrote:
>
>> On Fri, 5 Sep 2014, Ralph Castain wrote:
>>
>>> On Sep 5, 2014, at 3:34 PM, Allin Cottrell wrote:
>>>
>>>> I suspect there is a new (to openmpi 1.8.N?) warning with respect to requesting a number of MPI processes greater than the number of "real" cores on a given machine. [...]
>>>
>>> If you are going to treat hyperthreads as independent processors, then you should probably set the --use-hwthreads-as-cpus flag so OMPI knows to treat it that way
>>
>> Hmm, where would I set that? (For reference) mpiexec --version gives
>>
>>   mpiexec (OpenRTE) 1.8.2
>>
>> and if I append --use-hwthreads-as-cpus to my mpiexec command I get
>>
>>   mpiexec: Error: unknown option "--use-hwthreads-as-cpus"
>>
>> However, via trial and error I've found that these options work: either
>>
>>   --map-by hwthread  OR
>>   --oversubscribe  (not mentioned in the mpiexec man page)
>
> My apologies - the correct spelling is --use-hwthread-cpus

OK, thanks.

>> What's puzzling me, though, is that the use of these flags was not necessary when, earlier this year, I was running ompi 1.6.5. Neither is it necessary when running ompi 1.7.3 on a different machine. The warning that's printed without these flags seems to be new.
>
> The binding code changed during the course of the 1.7 series to provide more fine-grained options

Again, thanks for the info.

>>>> It seems to me that openmpi >= 1.8 is giving me a (somewhat obscure and non-user-friendly) warning whenever I specify to mpiexec a number of processes > the number of "real" cores [...]
>>>
>>> Could you pass along the warning? It should only give you a warning if the #procs > #slots, as you are then oversubscribed. You can turn that warning off by just adding the oversubscribe flag to your mapping directive
>>
>> Here's what I'm seeing:
>>
>>   A request was made to bind to that would result in binding more
>>   processes than cpus on a resource:
>>
>>      Bind to:     CORE
>>      Node:        waverley
>>      #processes:  2
>>      #cpus:       1
>>
>>   You can override this protection by adding the "overload-allowed"
>>   option to your binding directive.
>>
>> The machine in question has two cores and four threads. The thing that's confusing here is that I'm not aware of supplying any "binding directive": my command line (for running on a single host) is just this:
>>
>>   mpiexec -np
>>
>> [...]
>
> You shouldn't be getting that warning if you aren't specifying a binding option, so it looks like a bug to me. I'll check and see what's going on. You might want to check, however, that you don't have a binding directive hidden in your environment or default MCA param file.

I don't think that's the case: the only mca-params.conf file on my system is the default /etc/openmpi/openmpi-mca-params.conf installed by Arch, which is empty apart from comments, and "set | grep MCA" doesn't produce anything.

> Meantime, just use the oversubscribe or overload-allowed options to turn it off. You can put those in the default MCA param file if you don't want to add them to the environment or cmd line. The MCA params would be:
>
>   OMPI_MCA_rmaps_base_oversubscribe=1
>
> If you want to bind the procs to cores, but allow two procs to share a core (each will be bound to both hyperthreads):
>
>   OMPI_MCA_hwloc_base_binding_policy=core:overload
>
> If you want to bind the procs to the hyperthreads (since one proc will be bound to a hyperthread, no overloading will occur):
>
>   OMPI_MCA_hwloc_base_use_hwthreads_as_cpus=1
>   OMPI_MCA_hwloc_base_binding_policy=hwthread

Thanks, that's all very useful. One more question: how far back in ompi versions do the relevant mpiexec flags go?
I ask because the (econometrics) program I work on has a facility for semi-automating the use of MPI, which includes formulating a suitable mpiexec call on behalf of the user, and I'm wondering whether --oversubscribe and/or --use-hwthread-cpus will "just work", or might choke earlier versions of mpiexec.

Allin Cottrell
Re: [OMPI users] new overcommitment warning?
On Sep 6, 2014, at 11:00 AM, Allin Cottrell wrote:

> On Sat, 6 Sep 2014, Ralph Castain wrote:
>
>> On Sep 6, 2014, at 7:52 AM, Allin Cottrell wrote:
>>
>>> On Fri, 5 Sep 2014, Ralph Castain wrote:
>>>
>>>> On Sep 5, 2014, at 3:34 PM, Allin Cottrell wrote:
>>>>
>>>>> I suspect there is a new (to openmpi 1.8.N?) warning with respect to requesting a number of MPI processes greater than the number of "real" cores on a given machine. [...]
>>>>
>>>> If you are going to treat hyperthreads as independent processors, then you should probably set the --use-hwthreads-as-cpus flag so OMPI knows to treat it that way
>>>
>>> Hmm, where would I set that? (For reference) mpiexec --version gives
>>>
>>>   mpiexec (OpenRTE) 1.8.2
>>>
>>> and if I append --use-hwthreads-as-cpus to my mpiexec command I get
>>>
>>>   mpiexec: Error: unknown option "--use-hwthreads-as-cpus"
>>>
>>> However, via trial and error I've found that these options work: either
>>>
>>>   --map-by hwthread  OR
>>>   --oversubscribe  (not mentioned in the mpiexec man page)
>>
>> My apologies - the correct spelling is --use-hwthread-cpus
>
> OK, thanks.
>
>>> What's puzzling me, though, is that the use of these flags was not necessary when, earlier this year, I was running ompi 1.6.5. Neither is it necessary when running ompi 1.7.3 on a different machine. The warning that's printed without these flags seems to be new.
>>
>> The binding code changed during the course of the 1.7 series to provide more fine-grained options
>
> Again, thanks for the info.
>
>>>>> It seems to me that openmpi >= 1.8 is giving me a (somewhat obscure and non-user-friendly) warning whenever I specify to mpiexec a number of processes > the number of "real" cores [...]
>>>>
>>>> Could you pass along the warning? It should only give you a warning if the #procs > #slots, as you are then oversubscribed. You can turn that warning off by just adding the oversubscribe flag to your mapping directive
>>>
>>> Here's what I'm seeing:
>>>
>>>   A request was made to bind to that would result in binding more
>>>   processes than cpus on a resource:
>>>
>>>      Bind to:     CORE
>>>      Node:        waverley
>>>      #processes:  2
>>>      #cpus:       1
>>>
>>>   You can override this protection by adding the "overload-allowed"
>>>   option to your binding directive.
>>>
>>> The machine in question has two cores and four threads. The thing that's confusing here is that I'm not aware of supplying any "binding directive": my command line (for running on a single host) is just this:
>>>
>>>   mpiexec -np
>>>
>>> [...]
>>
>> You shouldn't be getting that warning if you aren't specifying a binding option, so it looks like a bug to me. I'll check and see what's going on. You might want to check, however, that you don't have a binding directive hidden in your environment or default MCA param file.
>
> I don't think that's the case: the only mca-params.conf file on my system is the default /etc/openmpi/openmpi-mca-params.conf installed by Arch, which is empty apart from comments, and "set | grep MCA" doesn't produce anything.

Okay - I've replicated the bug here, so I'll address it for 1.8.3. Thanks for letting me know about it!

>> Meantime, just use the oversubscribe or overload-allowed options to turn it off. You can put those in the default MCA param file if you don't want to add them to the environment or cmd line.
>> The MCA params would be:
>>
>>   OMPI_MCA_rmaps_base_oversubscribe=1
>>
>> If you want to bind the procs to cores, but allow two procs to share a core (each will be bound to both hyperthreads):
>>
>>   OMPI_MCA_hwloc_base_binding_policy=core:overload
>>
>> If you want to bind the procs to the hyperthreads (since one proc will be bound to a hyperthread, no overloading will occur):
>>
>>   OMPI_MCA_hwloc_base_use_hwthreads_as_cpus=1
>>   OMPI_MCA_hwloc_base_binding_policy=hwthread
>
> Thanks, that's all very useful. One more question: how far back in ompi versions do the relevant mpiexec flags go?
>
> I ask because the (econometrics) program I work on has a facility for semi-automating the use of MPI, which includes formulating a suitable mpiexec call on behalf of the user, and I'm wondering whether --oversubscribe and/or --use-hwthread-cpus will "just work", or might choke earlier versions of mpiexec.

At least 1.7.4 for the hwthread-cpus flag - maybe a little further back than that, but definitely not into the 1.6 series. The -oversubscribe flag goes all the way back to the very first release.

> Allin Cottrell
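Based on Ralph's answer, a wrapper that generates the mpiexec call could guard the newer flag on the reported version. A rough shell sketch; the version parsing is simplistic, and -np 4 with ./my_mpi_prog are placeholders:

  # Pick mpiexec flags according to the Open MPI version, per the reply above:
  # --oversubscribe is accepted back to the earliest releases,
  # --use-hwthread-cpus only from about 1.7.4 onward.
  ver=$(mpiexec --version 2>/dev/null | grep -o '[0-9][0-9.]*' | head -n 1)
  flags="--oversubscribe"
  case "$ver" in
      1.[0-6].*|1.7.[0-3]) ;;                   # too old for --use-hwthread-cpus
      *) flags="$flags --use-hwthread-cpus" ;;
  esac
  mpiexec $flags -np 4 ./my_mpi_prog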