Re: [OMPI users] orte_ess_base_select failed
--Original Message--
From: Gustavo Correa
Sender: users-boun...@open-mpi.org
To: Open MPI Users
ReplyTo: Open MPI Users
Sent: Dec 7, 2011 1:10 PM
Subject: Re: [OMPI users] orte_ess_base_select failed

Hi John Doe

I would keep it very simple, particularly if you are just starting with MPI or OpenMPI. Why not this?

./configure --prefix=/opt/ompi/gnu/1.4.4

You may also point to the compilers CC, CXX, F77, FC, for C, C++, Fortran-77 and Fortran-90, respectively, in case they are not in a standard location. Do a 'make distclean' before you start, to clean up any old mess. Once you get it working, you can add other flags. However, I would add only those that you may really need.

I hope this helps,
Gary Cooper

On Dec 7, 2011, at 12:19 PM, John Doe wrote:

> Hi Ralph,
>
> I may have been a little promiscuous in my use of build flags. My initial configure line was much simpler; then I kept throwing in flags when it wouldn't run. I'll try to build it again with your config line and see if that resolves the issue.
>
> Sam
>
> On Wed, Dec 7, 2011 at 11:11 AM, Ralph Castain wrote:
> I don't understand your configure line - why did you give an argument to enable-shared?? That option doesn't take an argument, and may be causing the confusion. Also, enable-debug by default turns off optimization, as otherwise the optimizer removes all debug symbols.
>
> If you want a debug version, try just this:
>
> ./configure --prefix=/opt/ompi/gnu/1.4.4 --enable-debug --with-valgrind=/opt/valgrind --enable-orterun-prefix-by-default --enable-memchecker --enable-mem-profile
>
> You don't need --with-devel-headers unless you intend to write code that directly drives the OMPI internals.
>
> On Dec 7, 2011, at 10:00 AM, John Doe wrote:
>
>> Hi Gustavo,
>>
>> I do have /opt/ompi/gnu/1.4.4/lib in my LD_LIBRARY_PATH and the bin directory in my PATH as well, but that didn't seem to help.
>>
>> Sam
>>
>> On Tue, Dec 6, 2011 at 5:18 PM, Gustavo Correa wrote:
>> Hi John Doe
>>
>> What you need to add to LD_LIBRARY_PATH is /opt/ompi/gnu/1.4.4/lib [note 'lib' at the end].
>> Your email seems to say that you added /opt/ompi/gnu/1.4.4/lib/openmpi instead, if I understood it right.
>> And to your PATH you need to add the corresponding 'bin' directory: /opt/ompi/gnu/1.4.4/bin.
>> The rule here is your installation prefix /opt/ompi/gnu/1.4.4/ with 'lib' or 'bin' at the end.
>>
>> I hope this helps,
>> Frank Capra
>>
>> On Dec 6, 2011, at 5:54 PM, John Doe wrote:
>>
>> > I recently built and installed openmpi on my 64-bit Linux machine running CentOS 6.
>> > However, whenever I try mpirun I get the error message:
>> >
>> > [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 125
>> > orte_ess_base_select failed
>> >
>> > Actually, here's the full error transcript:
>> >
>> > >> mpiexec -n 4 object/a.out
>> > [ellipse:01480] mca: base: component_find: unable to open /opt/ompi/gnu/1.4.4/lib/openmpi/mca_paffinity_linux: file not found (ignored)
>> > [ellipse:01480] mca: base: component_find: unable to open /opt/ompi/gnu/1.4.4/lib/openmpi/mca_carto_auto_detect: file not found (ignored)
>> > [ellipse:01480] mca: base: component_find: unable to open /opt/ompi/gnu/1.4.4/lib/openmpi/mca_carto_file: file not found (ignored)
>> > [ellipse:01480] mca: base: component_find: unable to open /opt/ompi/gnu/1.4.4/lib/openmpi/mca_ess_env: file not found (ignored)
>> > [ellipse:01480] mca: base: component_find: unable to open /opt/ompi/gnu/1.4.4/lib/openmpi/mca_ess_hnp: file not found (ignored)
>> > [ellipse:01480] mca: base: component_find: unable to open /opt/ompi/gnu/1.4.4/lib/openmpi/mca_ess_singleton: file not found (ignored)
>> > [ellipse:01480] mca: base: component_find: unable to open /opt/ompi/gnu/1.4.4/lib/openmpi/mca_ess_slurm: file not found (ignored)
>> > [ellipse:01480] mca: base: component_find: unable to open /opt/ompi/gnu/1.4.4/lib/openmpi/mca_ess_tool: file not found (ignored)
>> > [ellipse:01480] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 125
[OMPI users] speed up this problem by MPI
Hi,

(1). I am wondering how I can speed up the time-consuming computation in the loop of my code below using MPI?

int main(int argc, char ** argv)
{
    // some operations
    f(size);
    // some operations
    return 0;
}

void f(int size)
{
    // some operations
    int i;
    double * array = new double [size];
    for (i = 0; i < size; i++) // how can I use MPI to speed up this loop to compute all elements in the array?
    {
        array[i] = complicated_computation(); // time consuming computation
    }
    // some operations using all elements in array
    delete [] array;
}

As shown in the code, I want to do some operations before and after the part to be parallelized with MPI, but I don't know how to specify where the parallel part begins and ends.

(2) My current code uses OpenMP to speed up the computation.

void f(int size)
{
    // some operations
    int i;
    double * array = new double [size];
    omp_set_num_threads(_nb_threads);
    #pragma omp parallel shared(array) private(i)
    {
        #pragma omp for schedule(dynamic) nowait
        for (i = 0; i < size; i++) // how can I use MPI to speed up this loop to compute all elements in the array?
        {
            array[i] = complicated_computation(); // time consuming computation
        }
    }
    // some operations using all elements in array
}

I wonder, if I change to MPI, is it possible to have the code written for both OpenMP and MPI? If it is possible, how do I write the code, and how do I compile and run it?

Thanks and regards!
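A minimal sketch of the usual answer to (1), assuming complicated_computation() really needs no data from other iterations (it is only a placeholder below, as in the post): each rank computes one contiguous block of the array, and rank 0 gathers the blocks with MPI_Gatherv, so the serial work before and after the loop can stay on rank 0.

#include <mpi.h>
#include <vector>

// Placeholder for the expensive per-element work described in the post.
double complicated_computation() { return 1.0; }

void f(int size)
{
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    // Split the index range 0..size-1 into nprocs contiguous blocks.
    std::vector<int> counts(nprocs), displs(nprocs);
    int base = size / nprocs, rem = size % nprocs;
    for (int r = 0; r < nprocs; r++) {
        counts[r] = base + (r < rem ? 1 : 0);
        displs[r] = r * base + (r < rem ? r : rem);
    }

    // Each rank fills only its own block; no communication is needed here
    // because the elements are independent of each other.
    std::vector<double> mypart(counts[rank]);
    for (int i = 0; i < counts[rank]; i++)
        mypart[i] = complicated_computation();

    // Rank 0 collects all blocks into the full array (recv arguments are
    // ignored on the non-root ranks).
    std::vector<double> array;
    if (rank == 0) array.resize(size);
    MPI_Gatherv(mypart.data(), counts[rank], MPI_DOUBLE,
                array.data(), counts.data(), displs.data(), MPI_DOUBLE,
                0, MPI_COMM_WORLD);

    if (rank == 0) {
        // "some operations using all elements in array" stay serial on rank 0
    }
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    f(1000);            // size chosen arbitrarily for the sketch
    MPI_Finalize();
    return 0;
}

This also answers (2) in outline: the OpenMP pragma can stay on the inner loop inside each rank (a hybrid code), compiled with something like mpicxx -fopenmp and launched with mpirun as usual.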
Re: [OMPI users] speed up this problem by MPI
Thanks, Eugene.

I admit I am not that smart to understand well how to use MPI, but I did read some basic materials about it and understand how some simple problems are solved by MPI. But dealing with an array in my case, I am not certain about how to apply MPI to it. Are you saying to use send and receive to transfer the value computed for each element from the child process to the parent process? Do you allocate a copy of the array for each process?

Also, I only need the loop that computes every element of the array to be parallelized. Someone said that the parallel part begins with MPI_Init and ends with MPI_Finalize, and one can do any serial computations before and/or after these calls. But I have written some MPI programs and found that the parallel part is not restricted to the region between MPI_Init and MPI_Finalize, but is instead the whole program. If the rest of the code has to be wrapped for the process with ID 0, I have little idea how to apply that to my case, since the rest would be the parts before and after the loop in the function and the whole of main().

If someone could give a sample of how to apply MPI in my case, it would clarify a lot of my questions. Usually I can learn a lot from good examples. Thanks!

--- On Thu, 1/28/10, Eugene Loh wrote:

> From: Eugene Loh > Subject: Re: [OMPI users] speed up this problem by MPI > To: "Open MPI Users" > Date: Thursday, January 28, 2010, 7:30 PM > Take a look at some introductory MPI > materials to learn how to use MPI and what it's about. > There should be resources on-line... take a look around. > > The main idea is that you would have many processes, each > process would have part of the array. Thereafter, if a > process needs data or results from any other process, such > data would have to be exchanged between the processes > explicitly. > > Many codes have both OpenMP and MPI parallelization, but > you should first familiarize yourself with the basics of MPI > before dealing with "hybrid" codes. > > Tim wrote: > > > Hi, > > > > (1). I am wondering how I can speed up the > time-consuming computation in the loop of my code below > using MPI? > > int main(int argc, char > ** argv) { > // some operations > f(size); > // some > operations > return 0; > } > void f(int size) > { // some > operations > int i; > double * array = new double > [size]; > for (i = 0; i < size; i++) // how can I > use MPI to speed up this loop to compute all elements in the > array? { > array[i] = complicated_computation(); // > time comsuming computation > } > // some operations using all elements in > array > delete [] array; } > > > > As shown in the code, I want to do some operations > before and after the part to be paralleled with MPI, but I > don't know how to specify where the parallel part begins and > ends. > > > > (2) My current code is using OpenMP to speed up the > comutation. > > void f(int size) > { // some > operations > int i; > double * array = new double > [size]; > omp_set_num_threads(_nb_threads); > #pragma omp parallel shared(array) > private(i) { > > #pragma omp for > schedule(dynamic) nowait > for (i = 0; i < size; i++) // how can I use > MPI to speed up this loop to compute all elements in the > array? { > array[i] = complicated_computation(); // > time comsuming computation > } > } // some operations using > all elements in array > } > > > > I wonder if I change to use MPI, is it possible to > have the code written both for OpenMP and MPI? If it is > possible, how to write the code and how to compile and run > the code?
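On the specific point about MPI_Init/MPI_Finalize: every rank executes the whole of main(), so "serial" sections are normally just guarded by the rank number rather than placed outside the init/finalize pair. A bare sketch of that shape (the printf bodies are stand-ins for real work):

#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        // work before the loop that should happen once, on the master only
        printf("rank 0: serial setup\n");
    }

    // ... the distributed loop goes here; every rank executes this region,
    //     each working on its own share of the iterations ...

    if (rank == 0) {
        // work after the loop, again on the master only
        printf("rank 0: serial post-processing\n");
    }

    MPI_Finalize();
    return 0;
}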
Re: [OMPI users] speed up this problem by MPI
Thanks Eugene! My case, after simplified, is to speed up the time-consuming computation in the loop below by assigning iterations to several nodes in a cluster by MPI. Each iteration of the loop computes each element of an array. The computation of each element is independent of others in the array. int main(int argc, char ** argv) { // some operations f(size); // some operations return 0; } void f(int size) { // some operations int i; double * array = new double [size]; for (i = 0; i < size; i++) // need to speed up by MPI. { array[i] = complicated_computation(); // time consuming } // some operations using all elements in array delete [] array; } --- On Thu, 1/28/10, Eugene Loh wrote: > From: Eugene Loh > Subject: Re: [OMPI users] speed up this problem by MPI > To: "Open MPI Users" > Date: Thursday, January 28, 2010, 8:31 PM > Tim wrote: > > > Thanks, Eugene. > > > > I admit I am not that smart to understand well how to > use MPI, but I did read some basic materials about it and > understand how some simple problems are solved by MPI. > > But dealing with an array in my case, I am not certain > about how to apply MPI to it. Are you saying to use send and > recieve to transfer the value computed for each element from > child process to parent process? > > > You can, but typically that would entail too much > communication overhead for each element. > > > Do you allocate a copy of the array for each process? > > > You can, but typically that would entail excessive memory > consumption. > > Typically, one allocates only a portion of the array on > each process. E.g., if the array has 10,000 elements > and you have four processes, the first gets the first 2,500 > elements, the second the next 2,500, and so on. > > > Also I only need the loop that computes every element > of the array to be parallelized. > > > If you only need the initial computation of array elements > to be parallelized, perhaps any of the above strategies > could work. It depends on how expensive the > computation of each element is. > > > Someone said that the parallel part begins with > MPI_Init and ends with MPI_Finilize, > > > Well, usually all processes are launched in parallel. > So, the parallel begins "immediately." Inter-process > communications using MPI, however, must take place between > the MPI_Init and MPI_Finalize calls. > > > and one can do any serial computations before and/or > after these calls. But I have wrote some MPI programs, and > found that the parallel part is not restricted between > MPI_Init and MPI_Finilize, but instead the whole program. If > the rest part of the code has to be wrapped for process with > ID 0, I have little idea about how to apply that to my case > since the rest part would be the parts before and after the > loop in the function and the whole in main(). > > > I don't understand your case very clearly. I will > take a guess. You could have all processes start and > call MPI_Init. Then, slave processes can go to sleep, > waking occasionally to check if the master has sent a signal > to begin computation. The master does what it has to > do and then sends wake signals. Each slave computes > its portion and sends that portion back to the master. > Each slave exits. The master gathers all the pieces > and resumes its computation. Does that sound right? > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
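A sketch of the explicit send/receive version Eugene describes at the end (each worker computes a block and mails it back to the master), assuming for brevity that the number of processes divides the array size evenly; complicated_computation() is again only a placeholder.

#include <mpi.h>
#include <vector>
#include <algorithm>

double complicated_computation() { return 1.0; }   // placeholder (assumption)

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int size = 10000;            // e.g. 10,000 elements, 4 processes -> 2,500 each
    int chunk = size / nprocs;         // assumes nprocs divides size evenly
    std::vector<double> part(chunk);
    for (int i = 0; i < chunk; i++)
        part[i] = complicated_computation();

    if (rank != 0) {
        // worker: send my block to the master and I am done
        MPI_Send(part.data(), chunk, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    } else {
        // master: keep my own block, then receive one block from every other rank
        std::vector<double> array(size);
        std::copy(part.begin(), part.end(), array.begin());
        for (int r = 1; r < nprocs; r++)
            MPI_Recv(array.data() + r * chunk, chunk, MPI_DOUBLE, r, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        // ... serial operations using all elements of array ...
    }

    MPI_Finalize();
    return 0;
}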
Re: [OMPI users] speed up this problem by MPI
Sorry, complicated_computation() and f() are simplified too much. They do take more inputs. Among the inputs to complicated_computation(), some is passed from the main() to f() by address since it is a big array, some is passed by value, some are created inside f() before the call to complicated_computation(). so actually (although not exactly) the code is like: int main(int argc, char ** argv) { int size; double *feature = new double[1000]; // compute values of elements of "feature" // some operations f(size, feature); // some operations delete [] feature; return 0; } void f(int size, double *feature) { vector coeff; // read from a file into elements of coeff MyClass myobj; double * array = new double [coeff.size()]; for (int i = 0; i < coeff.size(); i++) // need to speed up by MPI. { array[i] = myobj.complicated_computation(size, coeff[i], feature); // time consuming } // some operations using all elements in array delete [] array; } --- On Thu, 1/28/10, Eugene Loh wrote: > From: Eugene Loh > Subject: Re: [OMPI users] speed up this problem by MPI > To: "Open MPI Users" > Date: Thursday, January 28, 2010, 11:40 PM > Tim wrote: > > > Thanks Eugene! > > > > My case, after simplified, is to speed up the > time-consuming computation in the loop below by assigning > iterations to several nodes in a cluster by MPI. Each > iteration of the loop computes each element of an array. The > computation of each element is independent of others in the > array. > > int main(int argc, char > ** argv) { > // some operations > f(size); > // some > operations > return 0; > } > void f(int size) > { // some > operations > int i; > double * array = new double > [size]; > for (i = 0; i < size; i++) // need to > speed up by MPI. > > { > array[i] = complicated_computation(); // > time consuming > What are the inputs to complicated_computation()? > Does each process know what the inputs are? Or, do > they need to come from the master process? Are there > many inputs? > > > } > // some operations using all > elements in array > delete [] array; > } > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
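For this concrete shape, the extra step is only getting the inputs to every rank before the loop. A sketch of that part, using the names from the post and assuming coeff holds doubles (the actual reads and computations are left as comments): broadcast feature, then broadcast the length of coeff followed by its contents, after which each rank has what it needs to compute its share of the loop.

#include <mpi.h>
#include <vector>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    std::vector<double> feature(1000);
    if (rank == 0) {
        // master computes the elements of "feature"
    }
    MPI_Bcast(feature.data(), (int)feature.size(), MPI_DOUBLE, 0, MPI_COMM_WORLD);

    std::vector<double> coeff;                 // assumed to hold doubles
    int ncoeff = 0;
    if (rank == 0) {
        // master reads coeff from the file
        ncoeff = (int)coeff.size();
    }
    MPI_Bcast(&ncoeff, 1, MPI_INT, 0, MPI_COMM_WORLD);   // everyone learns the length first
    if (rank != 0) coeff.resize(ncoeff);
    MPI_Bcast(coeff.data(), ncoeff, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    // Every rank now has feature and coeff (other scalars can be MPI_Bcast the
    // same way) and can compute its own share of
    //   array[i] = myobj.complicated_computation(size, coeff[i], feature)
    // exactly as in the earlier block-distribution sketch, then gather.

    MPI_Finalize();
    return 0;
}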
Re: [OMPI users] speed up this problem by MPI
Thanks! How to send/recieve and broadcast objects of self-defined class and of std::vector? How to deal with serialization problems? BTW: I would like to find some official documentation of OpenMP, but there seems none? --- On Fri, 1/29/10, Eugene Loh wrote: > From: Eugene Loh > Subject: Re: [OMPI users] speed up this problem by MPI > To: "Open MPI Users" > Date: Friday, January 29, 2010, 12:50 AM > Tim wrote: > > > Sorry, complicated_computation() and f() are > simplified too much. They do take more inputs. > > Among the inputs to complicated_computation(), some is > passed from the main() to f() by address since it is a big > array, some is passed by value, some are created inside f() > before the call to complicated_computation(). > > so actually (although not exactly) the code is like: > > > I think I'm agreeing with Terry. But, to add more > detail: > > > int main(int argc, char ** > argv) { > int size; > > double *feature = new > double[1000]; > > // compute values of elements > of "feature" > > // some operations > > > The array "feature" can be computed by the master and then > broadcast, or it could be computed redundantly by each > process. > > > f(size, feature); > // some > operations delete [] feature; > return 0; > } > void f(int size, double *feature) > { > vector coeff; > // read from a file into > elements of coeff > > > Similarly, coeff can be read in by the master and then > broadcast, or it could be read redundantly by each process, > or each process could read only the portion that it will > need. > > > > > MyClass myobj; > > double * array = new > double [coeff.size()]; > for (int i = 0; i < > coeff.size(); i++) // need to speed up by MPI. > > { > array[i] = myobj.complicated_computation(size, > coeff[i], feature); // time consuming > } > > > Each process loops only over the iterations that correspond > to its rank. Then, the master gathers all results. > > > // some operations using all > elements in array > delete [] array; > > } > > > Once the slaves have finished their computations and sent > their results to the master, they may exit. The slaves > will be launched at the same time as the master, but > presumably have less to do than the master does before the > "parallel loop" starts. If you don't want slaves > consuming excessive CPU time while they wait for the master, > fix that problem later once you have the basic code > working. > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
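For the std::vector part of the question, no derived datatype is strictly needed, because a vector's elements are stored contiguously: send the length, then the data, and rebuild the vector on the receiving side. A minimal sketch (run with two ranks):

#include <mpi.h>
#include <vector>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        std::vector<double> v(3, 1.5);          // some data to ship
        int n = (int)v.size();
        MPI_Send(&n, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Send(v.data(), n, MPI_DOUBLE, 1, 1, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int n;
        MPI_Recv(&n, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        std::vector<double> v(n);               // rebuild the vector locally
        MPI_Recv(v.data(), n, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}

A self-defined class is harder: either describe its fixed-size members with a derived datatype (the MPI_Type_struct route raised in the next message) or pack the members into a plain buffer yourself.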
Re: [OMPI users] speed up this problem by MPI
Sorry, my typo. I meant to say OpenMPI documentation.

How to send/receive and broadcast objects of a self-defined class and of std::vector? If using MPI_Type_struct, the setup becomes complicated if the class has various types of data members, and a data member of another class. How to deal with serialization problems? Are there some good references for these problems?

--- On Fri, 1/29/10, Eugene Loh wrote: > From: Eugene Loh > Subject: Re: [OMPI users] speed up this problem by MPI > To: "Open MPI Users" > Date: Friday, January 29, 2010, 10:39 AM > Tim wrote: > > > BTW: I would like to find some official documentation > of OpenMP, but there seems none? > > > OpenMP (a multithreading specification) has "nothing" to do > with Open MPI (an implementation of MPI, a message-passing > specification). Assuming you meant OpenMP, try their > web site: http://openmp.org >
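For completeness, this is roughly what the derived-datatype route asked about above looks like for a plain, fixed-size struct; it is only a sketch, and it deliberately avoids the hard cases the question mentions (members of class type, pointers, STL containers), which generally have to be packed or serialized by hand instead.

#include <mpi.h>
#include <cstddef>

struct Particle {
    double x, y, z;
    int    id;
};

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    // Describe the struct layout to MPI: a block of 3 doubles and a block of 1 int.
    int          blocklens[2] = {3, 1};
    MPI_Aint     displs[2]    = {(MPI_Aint)offsetof(Particle, x),
                                 (MPI_Aint)offsetof(Particle, id)};
    MPI_Datatype types[2]     = {MPI_DOUBLE, MPI_INT};
    MPI_Datatype particle_type;
    MPI_Type_create_struct(2, blocklens, displs, types, &particle_type);
    // For arrays of Particle you would also resize the extent to sizeof(Particle)
    // with MPI_Type_create_resized to account for trailing padding.
    MPI_Type_commit(&particle_type);

    Particle p = {1.0, 2.0, 3.0, 7};
    MPI_Bcast(&p, 1, particle_type, 0, MPI_COMM_WORLD);   // usable in send/recv/bcast alike

    MPI_Type_free(&particle_type);
    MPI_Finalize();
    return 0;
}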
Re: [OMPI users] speed up this problem by MPI
By serialization, I mean in the context of data storage and transmission. See http://en.wikipedia.org/wiki/Serialization e.g. in a structure or class, if there is a pointer pointing to some memory outside the structure or class, one has to send the content of the memory besides the structure or class, right? --- On Fri, 1/29/10, Eugene Loh wrote: > From: Eugene Loh > Subject: Re: [OMPI users] speed up this problem by MPI > To: "Open MPI Users" > Date: Friday, January 29, 2010, 11:06 AM > Tim wrote: > > > Sorry, my typo. I meant to say OpenMPI documentation. > > > Okay. "Open (space) MPI" is simply an implementation > of the MPI standard -- e.g., http://www.mpi-forum.org/docs/mpi21-report.pdf . > I imagine an on-line search will turn up a variety of > tutorials and explanations of that standard. But the > standard, itself, is somewhat readable. > > > How to send/recieve and broadcast objects of > self-defined class and of std::vector? If using > MPI_Type_struct, the setup becomes complicated if the class > has various types of data members, and a data member of > another class. > > > I don't really know any C++, but I guess you're looking at > it the right way. That is, use derived MPI data types > and "it's complicated". > > > How to deal with serialization problems? > > > Which serialization problems? You seem to have a > split/join problem. The master starts, at some point > there is parallel computation, then the masters does more > work at the end. > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users >
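Right; that is the situation sketched below with a hypothetical Signal struct (purely for illustration): the object's real payload lives behind a pointer, the pointer value is meaningless on another node, so the metadata and the pointed-to buffer are sent as separate messages and the pointer is rebuilt on the receiving side. Run with two ranks.

#include <mpi.h>

// Hypothetical type: the payload lives outside the struct, behind a pointer.
struct Signal {
    int     n;
    double *samples;   // heap or stack buffer of length n
};

void send_signal(Signal s, int dest, MPI_Comm comm)
{
    MPI_Send(&s.n, 1, MPI_INT, dest, 0, comm);            // the metadata
    MPI_Send(s.samples, s.n, MPI_DOUBLE, dest, 1, comm);  // the memory it points to
}

Signal recv_signal(int src, MPI_Comm comm)
{
    Signal s;
    MPI_Recv(&s.n, 1, MPI_INT, src, 0, comm, MPI_STATUS_IGNORE);
    s.samples = new double[s.n];                          // rebuild the pointer locally
    MPI_Recv(s.samples, s.n, MPI_DOUBLE, src, 1, comm, MPI_STATUS_IGNORE);
    return s;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        double buf[4] = {1.0, 2.0, 3.0, 4.0};
        Signal s = {4, buf};
        send_signal(s, 1, MPI_COMM_WORLD);
    } else if (rank == 1) {
        Signal s = recv_signal(0, MPI_COMM_WORLD);
        delete [] s.samples;
    }

    MPI_Finalize();
    return 0;
}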
[OMPI users] Test OpenMPI on a cluster
Hi,

I am learning MPI on a cluster. Here is one simple example. I expect the output would show responses from different nodes, but they all respond from the same node, node062. I just wonder why, and how I can actually get a report from different nodes to show that MPI actually distributes processes to different nodes? Thanks and regards!

ex1.c

/* test of MPI */
#include "mpi.h"
#include
#include

int main(int argc, char **argv)
{
    char idstr[2232];
    char buff[22128];
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int numprocs;
    int myid;
    int i;
    int namelen;
    MPI_Status stat;

    MPI_Init(&argc,&argv);
    MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD,&myid);
    MPI_Get_processor_name(processor_name, &namelen);

    if(myid == 0)
    {
        printf("WE have %d processors\n", numprocs);
        for(i=1;i
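The listing above arrived garbled (the include names and the master's loop were cut off), so here is a minimal, self-contained test of the same kind with no master/worker exchange at all: every rank simply reports which node it is running on. If all ranks print the same hostname, that typically means mpirun was only told about one node; with a hostfile (e.g. mpirun -np 4 --hostfile myhosts ./a.out, where myhosts is a hypothetical file listing the node names) the names should differ.

#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv)
{
    int rank, nprocs, namelen;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Get_processor_name(name, &namelen);   // the node this rank landed on

    printf("rank %d of %d is running on %s\n", rank, nprocs, name);

    MPI_Finalize();
    return 0;
}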
Re: [O-MPI users] Questions on status
Well said Jeff! I look forward to seeing Open MPI's code when it's released. Until then, I am happy to continue to use LAM/MPI for my clusters. I wish you had called it OpenMPI though... better for googling ;-) -- Tim Mattox - tmat...@gmail.com http://homepage.mac.com/tmattox/ I'm a bright... http://www.the-brights.net/
Re: [O-MPI users] Further thoughts
Hello, This has been an interesting discussion to follow. Here are my thoughts on the RPM packaging... On 6/16/05, Jeff Squyres wrote: [snip] > We've also got the "announce" mailing list -- a low volume list just > for announcing new releases (and *exciting* messages about products you > might be interested in... just kidding.). ;-) [snip] > We actually got a lot of help in this area from Greg Kurtzer from LBL > (of cAos/Warewulf/Centos fame). He helped us a bunch with our > [previously extremely lame] LAM/MPI .spec file, and then offered to > write one for Open MPI (which he did about a month or two ago). > > I have some random user questions about RPMs, though: > > 1. Would you prefer an all-in-one Open MPI RPM, or would you prefer > multiple RPMs (e.g., openmpi-doc, openmpi-devel, openmpi-runtime, > ...etc.)? I prefer split RPMs. The fingrained split you mention works well for thin/diskless-nodes, but a simple split of runtime vs everything-else would be "good enough". The primary problem with an all-in-one RPM would be the footprint of the non-MPI packages that satisfy MPI's dependence tree, especially the compilers. > 2. We're definitely going to provide an SRPM suitable for "rpmbuild > --rebuild". However, we're not 100% sure that it's worthwhile to > provide binary RPMs because everyone's cluster/development systems seem > to be "one off" from standard Linux distros. Do you want a binary > RPM(s)? If so, for which distros? (this is one area where vendors > tend to have dramatically different views than academics/researchers) If you supply fairly clean SRPMs, I think the distros themselves can do the binary RPM building themselves. At least that is easy enough for cAos to do. I guess the problem lies in the disparity in the distribution release cycle and Open MPI's expected release cycle. Certain RedHat distribution versions shipped with amazingly old versions of LAM/MPI, which I recall caused no end of trouble on the LAM/MPI mailing lists with questions from long-ago fixed bugs. How much is it worth to the Open MPI team to be able to answer those questions with: rpm -Uvh http://open-mpi.org//open-mpi-1.0-fixed.x86_64.rpm rather than having to explain how to do "rpmbuild --rebuild". I'll suggest that eventually you will want binary RPMs for SUSE 9.3 and CentOS 4 and/or Scientific Linux 4 in both i386 & x86_64 flavors. I'm sure you will get demand for a lot of Fedora Core flavors, but I think that road leads to madness... I think it might work out better to try and get Open MPI into Dag Wieers RPM/APT/YUM repositories... see: http://dag.wieers.com/home-made/apt/ or the still-under-construction RPMforge site: http://rpmforge.net/ That's more than my two cents... -- Tim Mattox - tmat...@gmail.com http://homepage.mac.com/tmattox/ I'm a bright... http://www.the-brights.net/
[O-MPI users] Fwd: Fwd: [Beowulf] MorphMPI based on fortran itf
Toon, > We are planning to develop a MorphMPI library. As explained a bit > higher > up in this thread, the MorphMPI library will be used while *compiling* > the app. The library that implements the MorphMPI calls will be linked > with dynamically. The MorphMPI on its turn links with some specific MPI > library. To take into account the (binary incompatible) difference in > the MPI libraries, the MorphMPI can be recompiled to be compatible with > any other MPI implementation (without having to recompile the app). I am in the process of developing MorphMPI and have designed my implementation a bit different than what you propose (my apologies if I misunderstood what you have said). I am creating one main library, which users will compile and run against, and which should not need to be recompiled. This library will then open a plugin depending on what MPI the user would like to use. Then, it will dynamically open the actual MPI implementation. In other words, to add support for another MPI one would just need to drop the appropriate plugin into the right directory. As far as overhead is concerned, it appears that it will be minimal. In the 32 bit world most conversions can be optimized away, and in the 64bit world it looks like only minimal conversions will need to be made. The main exception to this is defined structures (aka MPI_Status) and any cases where a user can pass an array to an MPI function. These will require a bit more work, but it still looks like the overhead will be small. Tim
Re: [O-MPI users] Fwd: Fwd: [Beowulf] MorphMPI based on fortran itf
Quoting Toon Knapen : > Tim Prins wrote: > > > I am in the process of developing MorphMPI and have designed my > > implementation a bit different than what you propose (my apologies > if I > > misunderstood what you have said). I am creating one main library, > which > > users will compile and run against, and which should not need to > be > > recompiled. This library will then open a plugin depending on what > MPI > > the user would like to use. Then, it will dynamically open the > actual > > MPI implementation. In other words, to add support for another MPI > one > > would just need to drop the appropriate plugin into the right > directory. > > > Thus IIUC, the app calls your lib and your lib on its turn calls a > plugin? Not quite. The plugin will merely consist of a data table, which will tell me all I need to know about the MPI and how to call its functions. Thus the app will call a function in MorphMPI which will in turn call a function in the actual MPI. > This involves two dereferences. My idea was to (be able to) > recompile the MorphMPI for each of the MPI lib's and plug this one > between the app and the MPI. AFACIT this approach has the same set > of > features but is more lightweight. However, if you have to recompile MorphMPI for each mpi, you loose a lot of the benefits of having an ABI, i.e. being able to easily run with multiple implementations without recompiling. In this project I am really going for easy extensibility and ease of use for the user. > > Is your project open-source? If so, can I check it out? It will be open-source, but right now this project is still in its early stages so there is nothing to release yet. Tim
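Since none of the MorphMPI code is available at this point, the following is only an illustration of the dispatch idea described above, not the real implementation: the ABI layer keeps a table of function pointers, the "plugin" tells it which shared library to open, and each MorphMPI_* entry point simply forwards through the table. Only two calls are shown, the library name libmpi.so is an assumption, and the genuinely hard part (translating handles, constants and MPI_Status layouts between ABIs) is omitted entirely. Link with -ldl.

#include <dlfcn.h>
#include <cstdio>

// Illustrative dispatch table covering just two MPI entry points.
struct mpi_dispatch {
    int (*init)(int *, char ***);
    int (*finalize)(void);
};

static mpi_dispatch table;

static int load_backend(const char *libname)
{
    void *h = dlopen(libname, RTLD_NOW | RTLD_GLOBAL);
    if (!h) { std::fprintf(stderr, "%s\n", dlerror()); return -1; }
    table.init     = (int (*)(int *, char ***)) dlsym(h, "MPI_Init");
    table.finalize = (int (*)(void))            dlsym(h, "MPI_Finalize");
    return (table.init && table.finalize) ? 0 : -1;
}

// The ABI layer's own entry points forward through the table.
int MorphMPI_Init(int *argc, char ***argv) { return table.init(argc, argv); }
int MorphMPI_Finalize()                    { return table.finalize(); }

int main(int argc, char **argv)
{
    if (load_backend("libmpi.so") != 0) return 1;   // library name chosen by the "plugin"
    MorphMPI_Init(&argc, &argv);
    MorphMPI_Finalize();
    return 0;
}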
Re: [OMPI users] busy waiting and oversubscriptions
On 3/26/2014 6:45 AM, Andreas Schäfer wrote:

On 10:27 Wed 26 Mar, Jeff Squyres (jsquyres) wrote:

Be aware of a few facts, though: 1. There is a fundamental difference between disabling hyperthreading in the BIOS at power-on time and simply running one MPI process per core. Disabling HT at power-on allocates more hardware resources to the remaining HT that is left in each core (e.g., deeper queues).

Oh, I didn't know that. That's interesting! Do you have any links with in-depth info on that?

On certain Intel CPUs, the full-size instruction TLB was available to a process when HyperThreading was disabled on the BIOS setup menu, and that was the only way to make all the Write Combine buffers available to a single process. Those CPUs are no longer in widespread use.

At one time, at Intel, we did a study to evaluate the net effect (on a later CPU where this did not recover ITLB size). The result was buried afterwards; possibly it didn't meet an unspecified marketing goal. Typical applications ran 1% faster with HyperThreading disabled by the BIOS menu, even with affinities carefully set to use just one process per core. Not all applications showed a loss on all data sets when leaving HT enabled. There are a few MPI applications with specialized threading which could gain 10% or more by use of HT.

In my personal opinion, SMT becomes less interesting as the number of independent cores increases. Intel(r) Xeon Phi(tm) is an exception, as the vector processing unit issues instructions from a single thread only on alternate cycles. This capability is used more effectively by running OpenMP threads under MPI, e.g. 6 ranks per coprocessor of 30 threads each, spread across 10 cores per rank (exact optimum depending on the application; MKL libraries use all available hardware threads for sufficiently large data sets).

-- Tim Prince
[OMPI users] OPENIB unknown transport errors
Hi All,

We're using OpenMPI 1.7.3 with Mellanox ConnectX InfiniBand adapters, and periodically our jobs abort at start-up with the following error:

===
Open MPI detected two different OpenFabrics transport types in the same Infiniband network.
Such mixed network trasport configuration is not supported by Open MPI.

Local host: w4
Local adapter: mlx4_0 (vendor 0x2c9, part ID 26428)
Local transport type: MCA_BTL_OPENIB_TRANSPORT_IB

Remote host: w34
Remote Adapter: (vendor 0x2c9, part ID 26428)
Remote transport type: MCA_BTL_OPENIB_TRANSPORT_UNKNOWN
===

I've done a bit of googling and not found very much. We do not see this issue when we run with MVAPICH2 on the same sets of nodes.

Any advice or thoughts would be very welcome, as I am stumped by what causes this. The nodes are all running Scientific Linux 6 with Mellanox drivers installed via the SL-provided RPMs.

Tim
Re: [OMPI users] OPENIB unknown transport errors
I've checked the links repeatedly with "ibstatus" and they look OK. Both nodes show a link layer of "InfiniBand". As I stated, everything works well with MVAPICH2, so I don't suspect a physical or link layer problem (but I could always be wrong on that).

Tim

On Fri, May 9, 2014 at 6:26 PM, Joshua Ladd wrote:

> Hi, Tim > > Run "ibstat" on each host: > > 1. Make sure the adapters are alive and active. > > 2. Look at the Link Layer settings for host w34. Does it match host w4's? > > > Josh > > > On Fri, May 9, 2014 at 1:18 PM, Tim Miller wrote: > >> Hi All, >> >> We're using OpenMPI 1.7.3 with Mellanox ConnectX InfiniBand adapters, and >> periodically our jobs abort at start-up with the following error: >> >> === >> Open MPI detected two different OpenFabrics transport types in the same >> Infiniband network. >> Such mixed network trasport configuration is not supported by Open MPI. >> >> Local host:w4 >> Local adapter: mlx4_0 (vendor 0x2c9, part ID 26428) >> Local transport type: MCA_BTL_OPENIB_TRANSPORT_IB >> >> Remote host: w34 >> Remote Adapter:(vendor 0x2c9, part ID 26428) >> Remote transport type: MCA_BTL_OPENIB_TRANSPORT_UNKNOWN >> === >> >> I've done a bit of googling and not found very much. We do not see this >> issue when we run with MVAPICH2 on the same sets of nodes. >> >> Any advice or thoughts would be very welcome, as I am stumped by what >> causes this. The nodes are all running Scientific Linux 6 with Mellanox >> drivers installed via the SL-provided RPMs. >> >> Tim
Re: [OMPI users] openMPI in 64 bit
On 5/15/2014 3:13 PM, Ajay Nair wrote:

I have been using openMPI for my application with Intel Visual Fortran. The version that I am currently using is openMPI-1.6.2. It works fine with Fortran code compiled in 32-bit and run with the openMPI 32-bit files. However, recently I moved to a 64-bit machine, and even though I could compile the code successfully with Intel Fortran 64-bit and also pointed openMPI to the corresponding 64-bit files, the exe would not start and threw the error:

the application was unable to start correctly (0x7b)

This is because the msvcr100d.dll file (this is required by openMPI even when I run in 32-bit mode) is a 32-bit dll file and it probably requires a 64-bit equivalent. I could not find any 64-bit equivalent for this dll. My question is why is openMPI looking for this dll file (even in the case of 32-bit compilation)? Can I do away with this dependency, or is there any way I can run it in 64-bit?

64-bit Windows of course includes full 32-bit support, so you might still run your 32-bit MPI application. You would need a full 64-bit build of the MPI libraries for compatibility with your 64-bit application. I haven't seen any indication that anyone is supporting openmpi for ifort Windows 64-bit. The closest openmpi thing seems to be the cygwin (gcc/gfortran) build. Windows seems to be too crowded for so many MPI versions to succeed.

-- Tim Prince
Re: [OMPI users] intel compiler and openmpi 1.8.1
On 05/29/2014 07:11 AM, Lorenzo Donà wrote: I compiled openmpi 1.8.1 with intel compiler with this conf. ./configure FC=ifort CC=icc CXX=icpc --prefix=/Users/lorenzodona/Documents/openmpi-1.8.1/ but when i write mpif90 -v i found: Using built-in specs. COLLECT_GCC=/opt/local/bin/gfortran-mp-4.8 COLLECT_LTO_WRAPPER=/opt/local/libexec/gcc/x86_64-apple-darwin13/4.8.2/lto-wrapper Target: x86_64-apple-darwin13 Configured with: /opt/local/var/macports/build/_opt_mports_dports_lang_gcc48/gcc48/work/gcc-4.8.2/configure --prefix=/opt/local --build=x86_64-apple-darwin13 --enable-languages=c,c++,objc,obj-c++,lto,fortran,java --libdir=/opt/local/lib/gcc48 --includedir=/opt/local/include/gcc48 --infodir=/opt/local/share/info --mandir=/opt/local/share/man --datarootdir=/opt/local/share/gcc-4.8 --with-local-prefix=/opt/local --with-system-zlib --disable-nls --program-suffix=-mp-4.8 --with-gxx-include-dir=/opt/local/include/gcc48/c++/ --with-gmp=/opt/local --with-mpfr=/opt/local --with-mpc=/opt/local --with-cloog=/opt/local --enable-cloog-backend=isl --disable-cloog-version-check --enable-stage1-checking --disable-multilib --enable-lto --enable-libstdcxx-time --with-as=/opt/local/bin/as --with-ld=/opt/local/bin/ld --with-ar=/opt/local/bin/ar --with-bugurl=https://trac.macports.org/newticket --with-pkgversion='MacPorts gcc48 4.8.2_0' Thread model: posix gcc version 4.8.2 (MacPorts gcc48 4.8.2_0) and version i found: GNU Fortran (MacPorts gcc48 4.8.2_0) 4.8.2 Copyright (C) 2013 Free Software Foundation, Inc. GNU Fortran comes with NO WARRANTY, to the extent permitted by law. You may redistribute copies of GNU Fortran under the terms of the GNU General Public License. For more information about these matters, see the file named COPYING So I think that is not compiled with intel compiler please can you help me. thanks thanks a lot for your patience and to help me Perhaps you forgot to make the Intel compilers active in your configure session. Normally this would be done by command such as source /opt/intel/composer_xe_2013/bin/compilervars.sh intel64 In such a case, if you would examine the configure log, you would expect to see a failed attempt to reach ifort, falling back to your gfortran. On the C and C++ side, the MPI libraries should be compatible between gnu and Intel compilers, but the MPI Fortran library would not be compatible between gfortran and ifort.
Re: [OMPI users] OPENIB unknown transport errors
Hi, I'd like to revive this thread, since I am still periodically getting errors of this type. I have built 1.8.1 with --enable-debug and run with -mca btl_openib_verbose 10. Unfortunately, this doesn't seem to provide any additional information that I can find useful. I've gone ahead and attached a dump of the output under 1.8.1. The key lines are: -- Open MPI detected two different OpenFabrics transport types in the same Infiniband network. Such mixed network trasport configuration is not supported by Open MPI. Local host:w1 Local adapter: mlx4_0 (vendor 0x2c9, part ID 26428) Local transport type: MCA_BTL_OPENIB_TRANSPORT_IB Remote host: w16 Remote Adapter:(vendor 0x2c9, part ID 26428) Remote transport type: MCA_BTL_OPENIB_TRANSPORT_UNKNOWN - Note that the vendor and part IDs are the same. If I immediately run on the same two nodes using MVAPICH2, everything is fine. I'm really very befuddled by this. OpenMPI sees that the two cards are the same and made by the same vendor, yet it thinks the transport types are different (and one is unknown). I'm hoping someone with some experience with how the OpenIB BTL works can shed some light on this problem... Tim On Fri, May 9, 2014 at 7:39 PM, Joshua Ladd wrote: > > Just wondering if you've tried with the latest stable OMPI, 1.8.1? I'm > wondering if this is an issue with the OOB. If you have a debug build, you > can run -mca btl_openib_verbose 10 > > Josh > > > On Fri, May 9, 2014 at 6:26 PM, Joshua Ladd wrote: > >> Hi, Tim >> >> Run "ibstat" on each host: >> >> 1. Make sure the adapters are alive and active. >> >> 2. Look at the Link Layer settings for host w34. Does it match host w4's? >> >> >> Josh >> >> >> On Fri, May 9, 2014 at 1:18 PM, Tim Miller wrote: >> >>> Hi All, >>> >>> We're using OpenMPI 1.7.3 with Mellanox ConnectX InfiniBand adapters, >>> and periodically our jobs abort at start-up with the following error: >>> >>> === >>> Open MPI detected two different OpenFabrics transport types in the same >>> Infiniband network. >>> Such mixed network trasport configuration is not supported by Open MPI. >>> >>> Local host:w4 >>> Local adapter: mlx4_0 (vendor 0x2c9, part ID 26428) >>> Local transport type: MCA_BTL_OPENIB_TRANSPORT_IB >>> >>> Remote host: w34 >>> Remote Adapter:(vendor 0x2c9, part ID 26428) >>> Remote transport type: MCA_BTL_OPENIB_TRANSPORT_UNKNOWN >>> === >>> >>> I've done a bit of googling and not found very much. We do not see this >>> issue when we run with MVAPICH2 on the same sets of nodes. >>> >>> Any advice or thoughts would be very welcome, as I am stumped by what >>> causes this. The nodes are all running Scientific Linux 6 with Mellanox >>> drivers installed via the SL-provided RPMs. 
>>> >>> Tim >>> >>> >>> ___ >>> users mailing list >>> us...@open-mpi.org >>> http://www.open-mpi.org/mailman/listinfo.cgi/users >>> >> >> > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users > [w1][[56731,1],1][btl_openib_ini.c:170:ompi_btl_openib_ini_query] Querying INI files for vendor 0x02c9, part ID 26428 [w1][[56731,1],1][btl_openib_ini.c:189:ompi_btl_openib_ini_query] Found corresponding INI values: Mellanox Hermon [w1][[56731,1],1][btl_openib_ini.c:170:ompi_btl_openib_ini_query] Querying INI files for vendor 0x, part ID 0 [w1][[56731,1],1][btl_openib_ini.c:189:ompi_btl_openib_ini_query] Found corresponding INI values: default [w1][[56731,1],0][btl_openib_ini.c:170:ompi_btl_openib_ini_query] Querying INI files for vendor 0x02c9, part ID 26428 [w1][[56731,1],0][btl_openib_ini.c:189:ompi_btl_openib_ini_query] Found corresponding INI values: Mellanox Hermon [w1][[56731,1],0][btl_openib_ini.c:170:ompi_btl_openib_ini_query] Querying INI files for vendor 0x, part ID 0 [w1][[56731,1],0][btl_openib_ini.c:189:ompi_btl_openib_ini_query] Found corresponding INI values: default [w1][[56731,1],2][btl_openib_ini.c:170:ompi_btl_openib_ini_query] Querying INI files for vendor 0x02c9, part ID 26428 [w1][[56731,1],2][btl_openib_ini.c:189:ompi_btl_openib_ini_query] Found corresponding INI values: Mellanox Hermon [w1][[56731,1],2][btl_openib_ini.c:170:omp
Re: [OMPI users] OPENIB unknown transport errors
Hi Josh, Thanks for attempting to sort this out. In answer to your questions: 1. Node allocation is done by TORQUE, however we don't use the TM API to launch jobs (long story). Instead, we just pass a hostfile to mpirun, and mpirun uses the ssh launcher to actually communicate and launch the processes on remote nodes. 2. We have only one port per HCA (the HCA silicon is integrated with the motherboard on most of our nodes, including all that have this issue). They are all configured to use InfiniBand (no IPoIB or other protocols). 3. No, we don't explicitly ask for a device port pair. We will try your suggestion and report back. Thanks again! Tim On Thu, Jun 5, 2014 at 2:22 PM, Joshua Ladd wrote: > Strange indeed. This info (remote adapter info) is passed around in the > modex and the struct is locally populated during add procs. > > 1. How do you launch jobs? Mpirun, srun, or something else? > 2. How many active ports do you have on each HCA? Are they all configured > to use IB? > 3. Do you explicitly ask for a device:port pair with the "if include" mca > param? If not, can you please add "-mca btl_openib_if_include mlx4_0:1" > (assuming you have a ConnectX-3 HCA and port 1 is configured to run over > IB.) > > Josh > > > On Wed, Jun 4, 2014 at 12:47 PM, Tim Miller wrote: > >> Hi, >> >> I'd like to revive this thread, since I am still periodically getting >> errors of this type. I have built 1.8.1 with --enable-debug and run with >> -mca btl_openib_verbose 10. Unfortunately, this doesn't seem to provide any >> additional information that I can find useful. I've gone ahead and attached >> a dump of the output under 1.8.1. The key lines are: >> >> -- >> Open MPI detected two different OpenFabrics transport types in the same >> Infiniband network. >> Such mixed network trasport configuration is not supported by Open MPI. >> >> Local host:w1 >> Local adapter: mlx4_0 (vendor 0x2c9, part ID 26428) >> Local transport type: MCA_BTL_OPENIB_TRANSPORT_IB >> >> Remote host: w16 >> Remote Adapter:(vendor 0x2c9, part ID 26428) >> Remote transport type: MCA_BTL_OPENIB_TRANSPORT_UNKNOWN >> - >> >> Note that the vendor and part IDs are the same. If I immediately run on >> the same two nodes using MVAPICH2, everything is fine. >> >> I'm really very befuddled by this. OpenMPI sees that the two cards are >> the same and made by the same vendor, yet it thinks the transport types are >> different (and one is unknown). I'm hoping someone with some experience >> with how the OpenIB BTL works can shed some light on this problem... >> >> Tim >> >> >> On Fri, May 9, 2014 at 7:39 PM, Joshua Ladd wrote: >> >>> >>> Just wondering if you've tried with the latest stable OMPI, 1.8.1? I'm >>> wondering if this is an issue with the OOB. If you have a debug build, you >>> can run -mca btl_openib_verbose 10 >>> >>> Josh >>> >>> >>> On Fri, May 9, 2014 at 6:26 PM, Joshua Ladd >>> wrote: >>> >>>> Hi, Tim >>>> >>>> Run "ibstat" on each host: >>>> >>>> 1. Make sure the adapters are alive and active. >>>> >>>> 2. Look at the Link Layer settings for host w34. Does it match host >>>> w4's? >>>> >>>> >>>> Josh >>>> >>>> >>>> On Fri, May 9, 2014 at 1:18 PM, Tim Miller wrote: >>>> >>>>> Hi All, >>>>> >>>>> We're using OpenMPI 1.7.3 with Mellanox ConnectX InfiniBand adapters, >>>>> and periodically our jobs abort at start-up with the following error: >>>>> >>>>> === >>>>> Open MPI detected two different OpenFabrics transport types in the >>>>> same Infiniband network. 
>>>>> Such mixed network trasport configuration is not supported by Open MPI. >>>>> >>>>> Local host:w4 >>>>> Local adapter: mlx4_0 (vendor 0x2c9, part ID 26428) >>>>> Local transport type: MCA_BTL_OPENIB_TRANSPORT_IB >>>>> >>>>> Remote host: w34 >>>>> Remote Adapter:(vendor 0x2c9, part ID 26428) >>>>> Remote transport type: MCA_BTL_OPENIB_TRANSPORT_UNKNOWN >>>
Re: [OMPI users] OPENIB unknown transport errors
Hi Josh, I asked one of our more advanced users to add the "-mca btl_openib_if_include mlx4_0:1" argument to his job script. Unfortunately, the same error occurred as before. We'll keep digging on our end; if you have any other suggestions, please let us know. Tim On Thu, Jun 5, 2014 at 7:32 PM, Tim Miller wrote: > Hi Josh, > > Thanks for attempting to sort this out. In answer to your questions: > > 1. Node allocation is done by TORQUE, however we don't use the TM API to > launch jobs (long story). Instead, we just pass a hostfile to mpirun, and > mpirun uses the ssh launcher to actually communicate and launch the > processes on remote nodes. > 2. We have only one port per HCA (the HCA silicon is integrated with the > motherboard on most of our nodes, including all that have this issue). They > are all configured to use InfiniBand (no IPoIB or other protocols). > 3. No, we don't explicitly ask for a device port pair. We will try your > suggestion and report back. > > Thanks again! > > Tim > > > On Thu, Jun 5, 2014 at 2:22 PM, Joshua Ladd wrote: > >> Strange indeed. This info (remote adapter info) is passed around in the >> modex and the struct is locally populated during add procs. >> >> 1. How do you launch jobs? Mpirun, srun, or something else? >> 2. How many active ports do you have on each HCA? Are they all configured >> to use IB? >> 3. Do you explicitly ask for a device:port pair with the "if include" mca >> param? If not, can you please add "-mca btl_openib_if_include mlx4_0:1" >> (assuming you have a ConnectX-3 HCA and port 1 is configured to run over >> IB.) >> >> Josh >> >> >> On Wed, Jun 4, 2014 at 12:47 PM, Tim Miller wrote: >> >>> Hi, >>> >>> I'd like to revive this thread, since I am still periodically getting >>> errors of this type. I have built 1.8.1 with --enable-debug and run with >>> -mca btl_openib_verbose 10. Unfortunately, this doesn't seem to provide any >>> additional information that I can find useful. I've gone ahead and attached >>> a dump of the output under 1.8.1. The key lines are: >>> >>> >>> -- >>> Open MPI detected two different OpenFabrics transport types in the same >>> Infiniband network. >>> Such mixed network trasport configuration is not supported by Open MPI. >>> >>> Local host:w1 >>> Local adapter: mlx4_0 (vendor 0x2c9, part ID 26428) >>> Local transport type: MCA_BTL_OPENIB_TRANSPORT_IB >>> >>> Remote host: w16 >>> Remote Adapter:(vendor 0x2c9, part ID 26428) >>> Remote transport type: MCA_BTL_OPENIB_TRANSPORT_UNKNOWN >>> - >>> >>> Note that the vendor and part IDs are the same. If I immediately run on >>> the same two nodes using MVAPICH2, everything is fine. >>> >>> I'm really very befuddled by this. OpenMPI sees that the two cards are >>> the same and made by the same vendor, yet it thinks the transport types are >>> different (and one is unknown). I'm hoping someone with some experience >>> with how the OpenIB BTL works can shed some light on this problem... >>> >>> Tim >>> >>> >>> On Fri, May 9, 2014 at 7:39 PM, Joshua Ladd >>> wrote: >>> >>>> >>>> Just wondering if you've tried with the latest stable OMPI, 1.8.1? I'm >>>> wondering if this is an issue with the OOB. If you have a debug build, you >>>> can run -mca btl_openib_verbose 10 >>>> >>>> Josh >>>> >>>> >>>> On Fri, May 9, 2014 at 6:26 PM, Joshua Ladd >>>> wrote: >>>> >>>>> Hi, Tim >>>>> >>>>> Run "ibstat" on each host: >>>>> >>>>> 1. Make sure the adapters are alive and active. >>>>> >>>>> 2. Look at the Link Layer settings for host w34. Does it match host >>>>> w4's? 
>>>>> >>>>> >>>>> Josh >>>>> >>>>> >>>>> On Fri, May 9, 2014 at 1:18 PM, Tim Miller >>>>> wrote: >>>>> >>>>>> Hi All, >>>>>> >>>>>> We're using OpenMPI 1.7.3 with Mellanox ConnectX InfiniBand adapters, >
Re: [OMPI users] openmpi linking problem
On 6/9/2014 1:14 PM, Sergii Veremieiev wrote:

Dear Sir/Madam,

I'm trying to link a C/FORTRAN code on Cygwin with Open MPI 1.7.5 and GCC 4.8.2:

mpicxx ./lib/Multigrid.o ./lib/GridFE.o ./lib/Data.o ./lib/GridFD.o ./lib/Parameters.o ./lib/MtInt.o ./lib/MtPol.o ./lib/MtDob.o -o Test_cygwin_openmpi_gcc -L./external/MUMPS/lib -ldmumps_cygwin_openmpi_gcc -lmumps_common_cygwin_openmpi_gcc -lpord_cygwin_openmpi_gcc -L./external/ParMETIS -lparmetis_cygwin_openmpi_gcc -lmetis_cygwin_openmpi_gcc -L./external/SCALAPACK -lscalapack_cygwin_openmpi_gcc -L./external/BLACS/LIB -lblacs-0_cygwin_openmpi_gcc -lblacsF77init-0_cygwin_openmpi_gcc -lblacsCinit-0_cygwin_openmpi_gcc -lblacs-0_cygwin_openmpi_gcc -L./external/BLAS -lblas_cygwin_openmpi_gcc -lmpi -lgfortran

The following error messages are returned:

./external/MUMPS/lib/libdmumps_cygwin_openmpi_gcc.a(dmumps_part3.o): In function `dmumps_127_':
/cygdrive/d/Sergey/Research/Codes/Thinfilmsolver/external/MUMPS/src/dmumps_part3.F:6068: undefined reference to `mpi_send_'

You appear to need the MPI Fortran libraries (built with your version of gfortran) corresponding to mpif.h or 'use mpi'. If you can use mpifort to link, you would use -lstdc++ in place of -lmpi -lgfortran.

-- Tim Prince
Re: [OMPI users] OPENIB unknown transport errors
Aha ... looking at "ibv_devinfo -v" got me my first concrete hint of what's going on. On a node that's working fine (w2), under port 1 there is a line: LinkLayer: InfiniBand On a node that is having trouble (w3), that line is not present. The question is why this inconsistency occurs. I don't seem to have ofed_info installed on my system -- not sure what magical package Red Hat decided to hide that in. The InfiniBand stack I am running is stock with our version of Scientific Linux (6.2). I am beginning to wonder if this isn't some bug with the Red Hat/SL-provided InfiniBand stack. I'll do some more poking, but at least now I've got something semi-solid to poke at. Thanks for all of your help; I've attached the results of "ibv_devinfo -v" for both systems, so if you see anything else that jumps at you, please let me know. Tim On Sat, Jun 7, 2014 at 2:21 AM, Mike Dubman wrote: > could you please attach output of "ibv_devinfo -v" and "ofed_info -s" > Thx > > > On Sat, Jun 7, 2014 at 12:53 AM, Tim Miller wrote: > >> Hi Josh, >> >> I asked one of our more advanced users to add the "-mca btl_openib_if_include >> mlx4_0:1" argument to his job script. Unfortunately, the same error >> occurred as before. >> >> We'll keep digging on our end; if you have any other suggestions, please >> let us know. >> >> Tim >> >> >> On Thu, Jun 5, 2014 at 7:32 PM, Tim Miller wrote: >> >>> Hi Josh, >>> >>> Thanks for attempting to sort this out. In answer to your questions: >>> >>> 1. Node allocation is done by TORQUE, however we don't use the TM API to >>> launch jobs (long story). Instead, we just pass a hostfile to mpirun, and >>> mpirun uses the ssh launcher to actually communicate and launch the >>> processes on remote nodes. >>> 2. We have only one port per HCA (the HCA silicon is integrated with the >>> motherboard on most of our nodes, including all that have this issue). They >>> are all configured to use InfiniBand (no IPoIB or other protocols). >>> 3. No, we don't explicitly ask for a device port pair. We will try your >>> suggestion and report back. >>> >>> Thanks again! >>> >>> Tim >>> >>> >>> On Thu, Jun 5, 2014 at 2:22 PM, Joshua Ladd >>> wrote: >>> >>>> Strange indeed. This info (remote adapter info) is passed around in the >>>> modex and the struct is locally populated during add procs. >>>> >>>> 1. How do you launch jobs? Mpirun, srun, or something else? >>>> 2. How many active ports do you have on each HCA? Are they all >>>> configured to use IB? >>>> 3. Do you explicitly ask for a device:port pair with the "if include" >>>> mca param? If not, can you please add "-mca btl_openib_if_include mlx4_0:1" >>>> (assuming you have a ConnectX-3 HCA and port 1 is configured to run over >>>> IB.) >>>> >>>> Josh >>>> >>>> >>>> On Wed, Jun 4, 2014 at 12:47 PM, Tim Miller >>>> wrote: >>>> >>>>> Hi, >>>>> >>>>> I'd like to revive this thread, since I am still periodically getting >>>>> errors of this type. I have built 1.8.1 with --enable-debug and run with >>>>> -mca btl_openib_verbose 10. Unfortunately, this doesn't seem to provide >>>>> any >>>>> additional information that I can find useful. I've gone ahead and >>>>> attached >>>>> a dump of the output under 1.8.1. The key lines are: >>>>> >>>>> >>>>> -- >>>>> Open MPI detected two different OpenFabrics transport types in the >>>>> same Infiniband network. >>>>> Such mixed network trasport configuration is not supported by Open MPI. 
>>>>> >>>>> Local host:w1 >>>>> Local adapter: mlx4_0 (vendor 0x2c9, part ID 26428) >>>>> Local transport type: MCA_BTL_OPENIB_TRANSPORT_IB >>>>> >>>>> Remote host: w16 >>>>> Remote Adapter:(vendor 0x2c9, part ID 26428) >>>>> Remote transport type: MCA_BTL_OPENIB_TRANSPORT_UNKNOWN >>>>> >>>>> - >>>>> >>>>> Note tha
Re: [OMPI users] openMP and mpi problem
On 7/4/2014 11:22 AM, Timur Ismagilov wrote: 1. Intell mpi is located here: /opt/intel/impi/4.1.0/intel64/lib. I have added OMPI path at the start and got the same output. If you can't read your own thread due to your scrambling order of posts, I'll simply reiterate what was mentioned before: ifort has its own mpiexec in the compiler install path to support co-array (not true MPI), so your MPI path entries must precede the ifort ones. Thus, it remains important to try checks such as 'which mpiexec' and assure that you are running the intended components. ifort co-arrays will not cooperate with presence of OpenMPI. -- Tim Prince
Re: [OMPI users] Multiple threads for an mpi process
On 9/12/2014 6:14 AM, JR Cary wrote: This must be a very old topic. I would like to run mpi with one process per node, e.g., using -cpus-per-rank=1. Then I want to use openmp inside of that. But other times I will run with a rank on each physical core. Inside my code I would like to detect which situation I am in. Is there an openmpi api call to determine that? omp_get_num_threads() should work. Unless you want to choose a different non-parallel algorithm for this case, a single thread omp parallel region works fine. You should soon encounter cases where you want intermediate choices, such as 1 rank per CPU package and 1 thread per core, even if you stay away from platforms with more than 12 cores per CPU.
Re: [OMPI users] Multiple threads for an mpi process
On 9/12/2014 9:22 AM, JR Cary wrote: On 9/12/14, 7:27 AM, Tim Prince wrote: On 9/12/2014 6:14 AM, JR Cary wrote: This must be a very old topic. I would like to run mpi with one process per node, e.g., using -cpus-per-rank=1. Then I want to use openmp inside of that. But other times I will run with a rank on each physical core. Inside my code I would like to detect which situation I am in. Is there an openmpi api call to determine that? omp_get_num_threads() should work. Unless you want to choose a different non-parallel algorithm for this case, a single thread omp parallel region works fine. You should soon encounter cases where you want intermediate choices, such as 1 rank per CPU package and 1 thread per core, even if you stay away from platforms with more than 12 cores per CPU. I may not understand, so I will try to ask in more detail. Suppose I am running on a four-core processor (and my code likes one thread per core). In case 1 I do mpiexec -np 2 myexec and I want to know that each mpi process should use 2 threads. If instead I did mpiexec -np 4 myexec I want to know that each mpi process should use one thread. Will omp_get_num_threads() should return a different value for those two cases? Perhaps I am not invoking mpiexec correctly. I use MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &threadSupport), and regardless of what how I invoke mpiexec (-n 1, -n 2, -n 4), I see 2 openmp processes and 1 openmp threads (have not called omp_set_num_threads). When I run serial, I see 8 openmp processes and 1 openmp threads. So I must be missing an arg to mpiexec? This is a 4-core haswell with hyperthreading to get 8. Sorry, I assumed you were setting OMP_NUM_THREADS for your runs. If you don't do that, each instance of OpenMP will try to run 8 threads, where you probably want just 1 thread per core. I turn off hyperthreading in BIOS on my machines, as I never run anything which would benefit from it.
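A small sketch of the detection the original question asks about, assuming the threads-per-rank choice is communicated the usual way, via OMP_NUM_THREADS (e.g. mpiexec -x OMP_NUM_THREADS=2 -np 2 ./a.out with Open MPI's -x option). omp_get_max_threads() reports that value even outside a parallel region, whereas omp_get_num_threads() returns 1 there, which matches the behaviour observed above.

#include <mpi.h>
#include <omp.h>
#include <cstdio>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int maxthreads = omp_get_max_threads();   // honours OMP_NUM_THREADS

    printf("rank %d of %d: will use up to %d OpenMP threads\n",
           rank, nprocs, maxthreads);

    #pragma omp parallel
    {
        #pragma omp single
        printf("rank %d: parallel region actually has %d threads\n",
               rank, omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}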
Re: [OMPI users] Fortran and OpenMPI 1.8.3 compiled with Intel-15 does nothing silently
Check by ldd in case you didn't update .so path Sent via the ASUS PadFone X mini, an AT&T 4G LTE smartphone Original Message From:John Bray Sent:Mon, 17 Nov 2014 11:41:32 -0500 To:us...@open-mpi.org Subject:[OMPI users] Fortran and OpenMPI 1.8.3 compiled with Intel-15 does nothing silently >___ >users mailing list >us...@open-mpi.org >Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users >Link to this post: >http://www.open-mpi.org/community/lists/users/2014/11/25823.php
[OMPI users] OpenMPI installation issue or mpi4py compatibility problem
Hello, I have been having some issues with trying to get OpenMPI working with mpi4py. I've tried to break down my troubleshooting into a few chunks below, and I believe that there are a few distinct issues that need solving. Following some troubleshooting in the following link: https://bitbucket.org/mpi4py/mpi4py/issues/69/building-mpi4py-with-openmpi-gives-error - the mpi4py folks have suggested it is an issue that might be better answered here. In summary, I have attempted to install OpenMPI on Ubuntu 16.04 to the following prefix: /opt/openmpi/openmpi-2.1.0. I have also manually added the following to my .bashrc: export PATH="/opt/openmpi/openmpi-2.1.0/bin:$PATH" MPI_DIR=/opt/openmpi/openmpi-2.1.0 export LD_LIBRARY_PATH=$MPI_DIR/lib:$LD_LIBRARY_PATH I later became aware that Ubuntu may handle the LD_LIBRARY_PATH differently and instead added a new file containing the library path /opt/openmpi/openmpi-2.1.0/lib to /etc/ld.so.conf.d/openmpi-2-1-0.conf, in the style of everything else in that directory. I tried to run "mpicc helloworld.c -o hello.bin" as a test on a demo file (as instructed in the link) to check the installation, but I had permission issues, since it was installed in /opt. However, when I attempted to run the previous command with sudo, or sudo -E, in both cases mpicc could not be found. (Perhaps this is a separate issue with my sudo env.) To check that mpicc actually works, I have copied helloworld.c to a directory where I could execute mpicc without sudo. On running the above command, I receive the following error: mpicc: /usr/local/cuda-8.0/targets/x86_64-linux/lib/libOpenCL.so.1: no version information available (required by /opt/openmpi/openmpi-2.1.0/lib/libopen-pal.so.20) /opt/openmpi/openmpi-2.1.0/lib/libopen-pal.so.20: undefined reference to `clGetPlatformInfo@OPENCL_1.0' /opt/openmpi/openmpi-2.1.0/lib/libopen-pal.so.20: undefined reference to `clGetPlatformIDs@OPENCL_1.0' /opt/openmpi/openmpi-2.1.0/lib/libopen-pal.so.20: undefined reference to `clGetDeviceInfo@OPENCL_1.0' /opt/openmpi/openmpi-2.1.0/lib/libopen-pal.so.20: undefined reference to `clGetDeviceIDs@OPENCL_1.0' collect2: error: ld returned 1 exit status I am unsure if I have installation or permission issues, and I'd be grateful if anyone can shed some light based on the trials I've done so far. (I should add I also have a CUDA installation, which I'd like to leverage too, if possible.) I'm still fairly new to the ins and outs of this, so I may have missed something obvious. Please let me know if any other info is required. Many thanks and kind regards, Tim -- Timothy Jim, PhD Researcher in Aerospace, Creative Flow Research Division, Institute of Fluid Science, Tohoku University www.linkedin.com/in/timjim/
Re: [OMPI users] OpenMPI installation issue or mpi4py compatibility problem
Hello, Thanks for your message. I'm trying to get this to work on a single machine. How might you suggest getting OpenMPIworking without python and CUDA? I don't recall setting anything for either, as the only command I had run was "./configure --prefix=/opt/openmpi/openmpi-2.1.0" - did it possibly pick up the paths by accident? Regarding the lib directory, I checked that the path physically exists. Regarding the final part of the email, is it a problem that 'undefined reference' is appearing? Thanks and regards, Tim On 22 May 2017 at 06:54, Reuti wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > Hi, > > Am 18.05.2017 um 07:44 schrieb Tim Jim: > > > Hello, > > > > I have been having some issues with trying to get OpenMPI working with > mpi4py. I've tried to break down my troubleshooting into a few chunks > below, and I believe that there are a few, distinct issues that need > solving. > > Are you speaking here of a single machine or a cluster? > > > > Following some troubleshooting in the following link: > > https://bitbucket.org/mpi4py/mpi4py/issues/69/building- > mpi4py-with-openmpi-gives-error > > -the mpi4py folks have suggested it an issue that might be better > answered here. > > First approach would be to get Open MPI working, without CUDA and Python > being involved. > > > > In summary, I have attempted to install OpenMPI on Ubuntu 16.04 to the > following prefix: /opt/openmpi-openmpi-2.1.0. I have also manually added > the following to my .bashrc: > > export PATH="/opt/openmpi/openmpi-2.1.0/bin:$PATH" > > MPI_DIR=/opt/openmpi/openmpi-2.1.0 > > export LD_LIBRARY_PATH=$MPI_DIR/lib:$LD_LIBRARY_PATH > > This looks fine, although I don't recall setting MPI_DIR for Open MPI > itself. It might be a necessity for mpi4py though. > > One pitfall might be that "lib" is sometimes being created as "lib64" by > `libtool`. I forgot the details when this is happening, but it depends on > the version of `libtool` being used. > > > > I later became aware that Ubuntu may handle the LD_LIBRARY_PATH > differently > > I don't think that Ubuntu will do anything different than any other Linux. > > Did you compile Open MPI on your own, or did you install any repository? > > Are the CUDA application written by yourself or any freely available > applications? > > - -- Reuti > > > > and instead added a new file containing the library path > /opt/openmpi/openmpi-2.1.0/lib to /etc/ld.so.conf.d/openmpi-2-1-0.conf, > in the style of everything else in that directory. > > > > I tried to run "mpicc helloworld.c -o hello.bin" as a test on a demo > file (as instructed in the link) to check the installation but I had > permission issues, since it was installed win opt. However, when I > attempted to run the previous with sudo, or sudo -E, in both cases, mpicc > could not be found. (Perhaps this is a separate issue with my sudo env) > > > > To check that mpicc actually works, I have copied helloworld.c to a > directory where I could execute mpicc without sudo. 
On running the above > command, I receive the following error: > > > > mpicc: /usr/local/cuda-8.0/targets/x86_64-linux/lib/libOpenCL.so.1: no > version information available (required by /opt/openmpi/openmpi-2.1.0/ > lib/libopen-pal.so.20) > > /opt/openmpi/openmpi-2.1.0/lib/libopen-pal.so.20: undefined reference > to `clGetPlatformInfo@OPENCL_1.0' > > /opt/openmpi/openmpi-2.1.0/lib/libopen-pal.so.20: undefined reference > to `clGetPlatformIDs@OPENCL_1.0' > > /opt/openmpi/openmpi-2.1.0/lib/libopen-pal.so.20: undefined reference > to `clGetDeviceInfo@OPENCL_1.0' > > /opt/openmpi/openmpi-2.1.0/lib/libopen-pal.so.20: undefined reference > to `clGetDeviceIDs@OPENCL_1.0' > > collect2: error: ld returned 1 exit status > > > > I am unsure if I have an installation or permission issues, and I'd be > grateful if anyone can shed some light based on the trials I've done so > far. (I should add I also have a CUDA installation, which I'd like to > leverage too, if possible). I'm still fairly new to the ins and outs of > this, so I may have missed something obvious. Please let me know if any > other info is required. > > > > Many thanks and kind regards, > > Tim > > > > -- > > > > Timothy Jim > > PhD Researcher in Aerospace > > Creative Flow Research Division, > > Institute of Fluid Science, Tohoku University > > www.linkedin.com/in/timjim/ > > ___ > > users mailing list > > users@lists.open-m
Re: [OMPI users] OpenMPI installation issue or mpi4py compatibility problem
Dear Reuti, Thanks for the reply. What options do I have to test whether it has successfully built? Thanks and kind regards. Tim On 22 May 2017 at 19:39, Reuti wrote: > Hi, > > > Am 22.05.2017 um 07:22 schrieb Tim Jim : > > > > Hello, > > > > Thanks for your message. I'm trying to get this to work on a single > > machine. > > Ok. > > > > How might you suggest getting OpenMPIworking without python and > > CUDA? > > It looks like it's detected automatically. It should be possible to > disable it with the command line option: > > $ ./configure --without-cuda … > > At the end of the configure step out should liste some lines like: > > Miscellaneous > --- > CUDA support: no > > The mpi4py seems unrelated to the compilation of Open MPI itself AFAICS. > > > > I don't recall setting anything for either, as the only command I had > > run was "./configure --prefix=/opt/openmpi/openmpi-2.1.0" - did it > possibly > > pick up the paths by accident? > > > > Regarding the lib directory, I checked that the path physically exists. > > Regarding the final part of the email, is it a problem that 'undefined > > reference' is appearing? > > Yes, it tries to resolve missing symbols and didn't succeed. > > -- Reuti > > > > > > Thanks and regards, > > Tim > > > > On 22 May 2017 at 06:54, Reuti wrote: > > > >> -BEGIN PGP SIGNED MESSAGE- > >> Hash: SHA1 > >> > >> Hi, > >> > >> Am 18.05.2017 um 07:44 schrieb Tim Jim: > >> > >>> Hello, > >>> > >>> I have been having some issues with trying to get OpenMPI working with > >> mpi4py. I've tried to break down my troubleshooting into a few chunks > >> below, and I believe that there are a few, distinct issues that need > >> solving. > >> > >> Are you speaking here of a single machine or a cluster? > >> > >> > >>> Following some troubleshooting in the following link: > >>> https://bitbucket.org/mpi4py/mpi4py/issues/69/building- > >> mpi4py-with-openmpi-gives-error > >>> -the mpi4py folks have suggested it an issue that might be better > >> answered here. > >> > >> First approach would be to get Open MPI working, without CUDA and Python > >> being involved. > >> > >> > >>> In summary, I have attempted to install OpenMPI on Ubuntu 16.04 to the > >> following prefix: /opt/openmpi-openmpi-2.1.0. I have also manually added > >> the following to my .bashrc: > >>> export PATH="/opt/openmpi/openmpi-2.1.0/bin:$PATH" > >>> MPI_DIR=/opt/openmpi/openmpi-2.1.0 > >>> export LD_LIBRARY_PATH=$MPI_DIR/lib:$LD_LIBRARY_PATH > >> > >> This looks fine, although I don't recall setting MPI_DIR for Open MPI > >> itself. It might be a necessity for mpi4py though. > >> > >> One pitfall might be that "lib" is sometimes being created as "lib64" by > >> `libtool`. I forgot the details when this is happening, but it depends > on > >> the version of `libtool` being used. > >> > >> > >>> I later became aware that Ubuntu may handle the LD_LIBRARY_PATH > >> differently > >> > >> I don't think that Ubuntu will do anything different than any other > Linux. > >> > >> Did you compile Open MPI on your own, or did you install any repository? > >> > >> Are the CUDA application written by yourself or any freely available > >> applications? > >> > >> - -- Reuti > >> > >> > >>> and instead added a new file containing the library path > >> /opt/openmpi/openmpi-2.1.0/lib to /etc/ld.so.conf.d/openmpi-2-1-0.conf, > >> in the style of everything else in that directory. 
> >>> > >>> I tried to run "mpicc helloworld.c -o hello.bin" as a test on a demo > >> file (as instructed in the link) to check the installation but I had > >> permission issues, since it was installed win opt. However, when I > >> attempted to run the previous with sudo, or sudo -E, in both cases, > mpicc > >> could not be found. (Perhaps this is a separate issue with my sudo env) > >>> > >>> To check that mpicc actually works, I have copied helloworld.c to a > >> directory where I could execute mpicc without sudo. On running the above > >> command, I receive the following error:
Re: [OMPI users] OpenMPI installation issue or mpi4py compatibility problem
Thanks for the thoughts, I'll give it a go. For reference, I have installed it in the opt directory, as that is where I have kept my installs currently. Will this be a problem when calling mpi from other packages? Thanks, Tim On 24 May 2017 06:30, "Reuti" wrote: > Hi, > > Am 23.05.2017 um 05:03 schrieb Tim Jim: > > > Dear Reuti, > > > > Thanks for the reply. What options do I have to test whether it has > successfully built? > > LIke before: can you compile and run mpihello.c this time – all as > ordinary user in case you installed the Open MPI into something like > $HOME/local/openmpi-2.1.1 and set paths accordingly. There is no need to be > root to install a personal Open MPI version in your home directory. > > -- Reuti > > > > > > Thanks and kind regards. > > Tim > > > > On 22 May 2017 at 19:39, Reuti wrote: > > Hi, > > > > > Am 22.05.2017 um 07:22 schrieb Tim Jim : > > > > > > Hello, > > > > > > Thanks for your message. I'm trying to get this to work on a single > > > machine. > > > > Ok. > > > > > > > How might you suggest getting OpenMPIworking without python and > > > CUDA? > > > > It looks like it's detected automatically. It should be possible to > disable it with the command line option: > > > > $ ./configure --without-cuda … > > > > At the end of the configure step out should liste some lines like: > > > > Miscellaneous > > --- > > CUDA support: no > > > > The mpi4py seems unrelated to the compilation of Open MPI itself AFAICS. > > > > > > > I don't recall setting anything for either, as the only command I had > > > run was "./configure --prefix=/opt/openmpi/openmpi-2.1.0" - did it > possibly > > > pick up the paths by accident? > > > > > > Regarding the lib directory, I checked that the path physically exists. > > > Regarding the final part of the email, is it a problem that 'undefined > > > reference' is appearing? > > > > Yes, it tries to resolve missing symbols and didn't succeed. > > > > -- Reuti > > > > > > > > > > Thanks and regards, > > > Tim > > > > > > On 22 May 2017 at 06:54, Reuti wrote: > > > > > >> -BEGIN PGP SIGNED MESSAGE- > > >> Hash: SHA1 > > >> > > >> Hi, > > >> > > >> Am 18.05.2017 um 07:44 schrieb Tim Jim: > > >> > > >>> Hello, > > >>> > > >>> I have been having some issues with trying to get OpenMPI working > with > > >> mpi4py. I've tried to break down my troubleshooting into a few chunks > > >> below, and I believe that there are a few, distinct issues that need > > >> solving. > > >> > > >> Are you speaking here of a single machine or a cluster? > > >> > > >> > > >>> Following some troubleshooting in the following link: > > >>> https://bitbucket.org/mpi4py/mpi4py/issues/69/building- > > >> mpi4py-with-openmpi-gives-error > > >>> -the mpi4py folks have suggested it an issue that might be better > > >> answered here. > > >> > > >> First approach would be to get Open MPI working, without CUDA and > Python > > >> being involved. > > >> > > >> > > >>> In summary, I have attempted to install OpenMPI on Ubuntu 16.04 to > the > > >> following prefix: /opt/openmpi-openmpi-2.1.0. I have also manually > added > > >> the following to my .bashrc: > > >>> export PATH="/opt/openmpi/openmpi-2.1.0/bin:$PATH" > > >>> MPI_DIR=/opt/openmpi/openmpi-2.1.0 > > >>> export LD_LIBRARY_PATH=$MPI_DIR/lib:$LD_LIBRARY_PATH > > >> > > >> This looks fine, although I don't recall setting MPI_DIR for Open MPI > > >> itself. It might be a necessity for mpi4py though. > > >> > > >> One pitfall might be that "lib" is sometimes being created as "lib64" > by > > >> `libtool`. 
I forgot the details when this is happening, but it > depends on > > >> the version of `libtool` being used. > > >> > > >> > > >>> I later became aware that Ubuntu may handle the LD_LIBRARY_PATH > > >> differently > > >> > > >> I don't think that Ubuntu will do anything differ
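Since the thread keeps coming back to "compile and run mpihello.c as an ordinary user" but the file itself never appears in it, here is a typical stand-in (an assumption on my part, not Reuti's actual file) that exercises nothing but the Open MPI installation:
```
/* mpihello.c -- minimal sanity test for the Open MPI installation itself */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &len);

    printf("Hello from rank %d of %d on %s\n", rank, size, host);

    MPI_Finalize();
    return 0;
}
```
If "mpicc mpihello.c -o mpihello" followed by "mpiexec -n 2 ./mpihello" works from a prefix the user can write to (for example somewhere under $HOME, as Reuti suggests), the Open MPI build itself is fine, and the CUDA/OpenCL and mpi4py questions can be tackled separately.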
[OMPI users] Node failure handling
Hi! So I know from searching the archive that this is a repeated topic of discussion here, and apologies for that, but since it's been a year or so I thought I'd double-check whether anything has changed before really starting to tear my hair out too much. Is there a combination of MCA parameters or similar that will prevent ORTE from aborting a job when it detects a node failure? This is using the tcp btl, under slurm. The application, not written by us and too complicated to re-engineer at short notice, has a strictly master-slave communication pattern. The master never blocks on communication from individual slaves, and apparently can itself detect slaves that have silently disappeared and reissue the work to those remaining. So from an application standpoint I believe we should be able to handle this. However, in all my testing so far the job is aborted as soon as the runtime system figures out what is going on. If not, do any users know of another MPI implementation that might work for this use case? As far as I can tell, FT-MPI has been pretty quiet the last couple of years? Thanks in advance, Tim
Re: [OMPI users] Node failure handling
Hi Ralph, George, Thanks very much for getting back to me. Alas, neither of these options seem to accomplish the goal. Both in OpenMPI v2.1.1 and on a recent master (7002535), with slurm's "--no-kill" and openmpi's "--enable-recovery", once the node reboots one gets the following error: ``` -- ORTE has lost communication with a remote daemon. HNP daemon : [[58323,0],0] on node pnod0330 Remote daemon: [[58323,0],1] on node pnod0331 This is usually due to either a failure of the TCP network connection to the node, or possibly an internal failure of the daemon itself. We cannot recover from this failure, and therefore will terminate the job. -- [pnod0330:110442] [[58323,0],0] orted_cmd: received halt_vm cmd [pnod0332:56161] [[58323,0],2] orted_cmd: received halt_vm cmd ``` I haven't yet tried the hard reboot case with ULFM (these nodes take forever to come back up), but earlier experiments SIGKILLing the orted on a compute node led to a very similar message as above, so at this point I'm not optimistic... I think my next step is to try with several separate mpiruns and use mpi_comm_{connect,accept} to plumb everything together before the application starts. I notice this is the subject of some recent work on ompi master. Even though the mpiruns will all be associated to the same ompi-server, do you think this could be sufficient to isolate the failures? Cheers, Tim On 10 June 2017 at 00:56, r...@open-mpi.org wrote: > It has been awhile since I tested it, but I believe the --enable-recovery > option might do what you want. > >> On Jun 8, 2017, at 6:17 AM, Tim Burgess wrote: >> >> Hi! >> >> So I know from searching the archive that this is a repeated topic of >> discussion here, and apologies for that, but since it's been a year or >> so I thought I'd double-check whether anything has changed before >> really starting to tear my hair out too much. >> >> Is there a combination of MCA parameters or similar that will prevent >> ORTE from aborting a job when it detects a node failure? This is >> using the tcp btl, under slurm. >> >> The application, not written by us and too complicated to re-engineer >> at short notice, has a strictly master-slave communication pattern. >> The master never blocks on communication from individual slaves, and >> apparently can itself detect slaves that have silently disappeared and >> reissue the work to those remaining. So from an application >> standpoint I believe we should be able to handle this. However, in >> all my testing so far the job is aborted as soon as the runtime system >> figures out what is going on. >> >> If not, do any users know of another MPI implementation that might >> work for this use case? As far as I can tell, FT-MPI has been pretty >> quiet the last couple of years? >> >> Thanks in advance, >> >> Tim >> ___ >> users mailing list >> users@lists.open-mpi.org >> https://rfd.newmexicoconsortium.org/mailman/listinfo/users > > ___ > users mailing list > users@lists.open-mpi.org > https://rfd.newmexicoconsortium.org/mailman/listinfo/users ___ users mailing list users@lists.open-mpi.org https://rfd.newmexicoconsortium.org/mailman/listinfo/users
Re: [OMPI users] Node failure handling
Hi Ralph, Thanks for the quick response. Just tried again not under slurm, but the same result... (though I just did kill -9 orted on the remote node this time) Any ideas? Do you think my multiple-mpirun idea is worth trying? Cheers, Tim ``` [user@bud96 mpi_resilience]$ /d/home/user/2017/openmpi-master-20170608/bin/mpirun --mca plm rsh --host bud96,pnod0331 -np 2 --npernode 1 --enable-recovery --debug-daemons $(pwd)/test ( some output from job here ) ( I then do kill -9 `pgrep orted` on pnod0331 ) bash: line 1: 161312 Killed /d/home/user/2017/openmpi-master-20170608/bin/orted -mca orte_debug_daemons "1" -mca ess "env" -mca ess_base_jobid "581828608" -mca ess_base_vpid 1 -mca ess_base_num_procs "2" -mca orte_node_regex "bud[2:96],pnod[4:331]@0(2)" -mca orte_hnp_uri "581828608.0;tcp://172.16.251.96,172.31.1.254:58250" -mca plm "rsh" -mca rmaps_ppr_n_pernode "1" -mca orte_enable_recovery "1" -- ORTE has lost communication with a remote daemon. HNP daemon : [[8878,0],0] on node bud96 Remote daemon: [[8878,0],1] on node pnod0331 This is usually due to either a failure of the TCP network connection to the node, or possibly an internal failure of the daemon itself. We cannot recover from this failure, and therefore will terminate the job. -- [bud96:20652] [[8878,0],0] orted_cmd: received halt_vm cmd [bud96:20652] [[8878,0],0] orted_cmd: all routes and children gone - exiting ``` On 27 June 2017 at 12:19, r...@open-mpi.org wrote: > Ah - you should have told us you are running under slurm. That does indeed > make a difference. When we launch the daemons, we do so with "srun > --kill-on-bad-exit” - this means that slurm automatically kills the job if > any daemon terminates. We take that measure to avoid leaving zombies behind > in the event of a failure. > > Try adding “-mca plm rsh” to your mpirun cmd line. This will use the rsh > launcher instead of the slurm one, which gives you more control. > >> On Jun 26, 2017, at 6:59 PM, Tim Burgess wrote: >> >> Hi Ralph, George, >> >> Thanks very much for getting back to me. Alas, neither of these >> options seem to accomplish the goal. Both in OpenMPI v2.1.1 and on a >> recent master (7002535), with slurm's "--no-kill" and openmpi's >> "--enable-recovery", once the node reboots one gets the following >> error: >> >> ``` >> -- >> ORTE has lost communication with a remote daemon. >> >> HNP daemon : [[58323,0],0] on node pnod0330 >> Remote daemon: [[58323,0],1] on node pnod0331 >> >> This is usually due to either a failure of the TCP network >> connection to the node, or possibly an internal failure of >> the daemon itself. We cannot recover from this failure, and >> therefore will terminate the job. >> -- >> [pnod0330:110442] [[58323,0],0] orted_cmd: received halt_vm cmd >> [pnod0332:56161] [[58323,0],2] orted_cmd: received halt_vm cmd >> ``` >> >> I haven't yet tried the hard reboot case with ULFM (these nodes take >> forever to come back up), but earlier experiments SIGKILLing the orted >> on a compute node led to a very similar message as above, so at this >> point I'm not optimistic... >> >> I think my next step is to try with several separate mpiruns and use >> mpi_comm_{connect,accept} to plumb everything together before the >> application starts. I notice this is the subject of some recent work >> on ompi master. Even though the mpiruns will all be associated to the >> same ompi-server, do you think this could be sufficient to isolate the >> failures? 
>> >> Cheers, >> Tim >> >> >> >> On 10 June 2017 at 00:56, r...@open-mpi.org wrote: >>> It has been awhile since I tested it, but I believe the --enable-recovery >>> option might do what you want. >>> >>>> On Jun 8, 2017, at 6:17 AM, Tim Burgess wrote: >>>> >>>> Hi! >>>> >>>> So I know from searching the archive that this is a repeated topic of >>>> discussion here, and apologies for that, but since it's been a year or >>>> so I thought I'd double-check whether anything has changed before >>>> really starting to tear my hair out too much. >>>> >>>> Is there a combination of MCA parameters or similar that will prevent >>>
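For anyone wanting to try the several-separate-mpiruns idea mentioned above, the plumbing Tim refers to is the standard MPI dynamic-process API. Below is a rough sketch of my own (not code from the thread); the hard parts -- getting the port string from master to workers (a shared file, or ompi-server with MPI_Publish_name/MPI_Lookup_name) and actually surviving a peer's disappearance -- are deliberately left out, and whether failures really stay isolated this way depends on the runtime, as discussed above.
```
/* Sketch: plumbing two independently launched mpiruns together with
 * MPI_Open_port / MPI_Comm_accept / MPI_Comm_connect.  Transport of the
 * port string and all error/fault handling are omitted.               */
#include <stdio.h>
#include <mpi.h>

/* Master side: open a port and accept one worker mpirun. */
static MPI_Comm accept_worker(void)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm worker;
    MPI_Open_port(MPI_INFO_NULL, port);
    printf("master listening on: %s\n", port);  /* hand this string to the worker */
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &worker);
    MPI_Close_port(port);
    return worker;               /* intercommunicator to the worker job */
}

/* Worker side: connect using the port string obtained from the master. */
static MPI_Comm connect_to_master(char *port)
{
    MPI_Comm master;
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &master);
    return master;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    /* Pass the master's port string on the command line to act as a worker. */
    MPI_Comm peer = (argc > 1) ? connect_to_master(argv[1]) : accept_worker();
    /* ... master/slave traffic over 'peer' goes here ... */
    MPI_Comm_disconnect(&peer);
    MPI_Finalize();
    return 0;
}
```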
Re: [OMPI users] Help on the big picture..
On 7/22/2010 4:11 PM, Gus Correa wrote: Hi Cristobal Cristobal Navarro wrote: yes, I was aware of the big difference hehe. now that OpenMP and OpenMPI are in the discussion, I've always wondered if it's a good idea to model a solution in the following way, using both OpenMP and OpenMPI. suppose you have n nodes, each node has a quadcore (so you have n*4 processors). launch n processes according to the n nodes available. set a resource manager like SGE to fill the n*4 slots using round robin. on each process, make use of the other cores available on the node, with OpenMP. if this is possible, then each one could make use of the shared memory model locally at each node, avoiding unnecessary I/O through the network. what do you think? Before asking what we think about this, please check the many references posted on this subject over the last decade. Then refine your question to what you are interested in hearing about; evidently you have no interest in much of this topic. Yes, it is possible, and many of the atmosphere/oceans/climate codes that we run are written with this capability. In other areas of science and engineering this is probably the case too. However, this is not necessarily better/faster/simpler than dedicating all the cores to MPI processes. In my view, this is due to: 1) OpenMP has a different scope than MPI, and to some extent is limited by more stringent requirements than MPI; 2) Most modern MPI implementations (and OpenMPI is an example) use shared memory mechanisms to communicate between processes that reside in a single physical node/computer. The shared memory communication of several MPI implementations does greatly improve the efficiency of message passing among ranks assigned to the same node. However, these ranks also communicate with ranks on other nodes, so there is a large potential advantage for hybrid MPI/OpenMP as the number of cores in use increases. If you aren't interested in running on more than 8 nodes or so, perhaps you won't care about this. 3) Writing hybrid code with MPI and OpenMP requires more effort, and much care so as not to let the two forms of parallelism step on each other's toes. The MPI standard specifies the use of MPI_Init_thread to indicate which combination of MPI and threading you intend to use, and to inquire whether that model is supported by the active MPI. In the case where there is only 1 MPI process per node (possibly using several cores via OpenMP threading) there is no requirement for special affinity support. If there is more than 1 FUNNELED rank per multiple-CPU node, it becomes important to maintain cache locality for each rank. OpenMP operates mostly through compiler directives/pragmas interspersed in the code. For instance, you can parallelize inner loops in no time, granted that there are no data dependencies across the commands within the loop. All it takes is to write one or two directive/pragma lines. More than loop parallelization can be done with OpenMP, of course, although not as much as can be done with MPI. Still, with OpenMP, you are restricted to working in a shared memory environment. By contrast, MPI requires more effort to program, but it takes advantage of shared memory and networked environments (and perhaps extended grids too). snipped tons of stuff rather than attempt to reconcile top postings -- Tim Prince
Re: [OMPI users] OpenMPI Run-Time "Freedom" Question
On 8/12/2010 3:27 PM, Ralph Castain wrote: Ick - talk about confusing! I suppose there must be -some- rational reason why someone would want to do this, but I can't imagine what it would be I'm no expert on compiler vs lib confusion, but some of my own experience would say that this is a bad idea regardless of whether or not OMPI is involved. Compiler version interoperability is usually questionable, depending upon how far apart the rev levels are. Only answer I can offer is that you would have to try it. It will undoubtedly be a case-by-case basis: some combinations might work, others might fail. On Aug 12, 2010, at 3:53 PM, Michael E. Thomadakis wrote: Hello OpenMPI, we have deployed OpenMPI 1.4.1 and 1.4.2 on our Intel Nehalem cluster using Intel compilers V 11.1.059 and 11.1.072 respectively, and one user has the following request: Can we build OpenMPI version say O.1 against Intel compilers version say I.1 but then built an application with OpenMPI O.1 BUT then use a DIFFERENT Intel compiler version say I.2 to built and run this MPI application? I suggested to him to 1) simply try to built and run the application with O.1 but use Intel compilers version I.X whatever this X is and see if it has any issues. OR 2) If the above does not work, I would build OpenMPI O.1 against Intel version I.X so he can use THIS combination for his hypothetical application. He insists that I build OpenMPI O.1 with some version of Intel compilers I.Y but then at run time he would like to use *different* Intel run time libs at will I.Z <> I.X. Can you provide me with a suggestion for a sane solution to this ? :-) Best regards Michael Guessing at what is meant here, if you build MPI with a given version of Intel compilers, it ought to work when the application is built with a similar or more recent Intel compiler, or when the run-time LD_LIBRARY_PATH refers to a similar or newer library (within reason). There are similar constraints on glibc version. "Within reason" works over a more restricted range when C++ is involved. Note that the Intel linux compilers link to the gcc and glibc libraries as well as those which come with the compiler, and the MPI could be built with a combination of gcc and ifort to work with icc or gcc and ifort. gfortran and ifort libraries, however, are incompatible, except that libgomp calls can be supported by libiomp5. The "rational" use I can see is that an application programmer would likely wish to test a range of compilers without rebuilding MPI. Intel documentation says there is forward compatibility testing of libraries, at least to the extent that a build made with 10.1 would work with 11.1 libraries. The most recent Intel library compatibility break was between MKL 9 and 10. -- Tim Prince
Re: [OMPI users] OpenMPI Run-Time "Freedom" Question
On 8/12/2010 6:04 PM, Michael E. Thomadakis wrote: On 08/12/10 18:59, Tim Prince wrote: On 8/12/2010 3:27 PM, Ralph Castain wrote: Ick - talk about confusing! I suppose there must be -some- rational reason why someone would want to do this, but I can't imagine what it would be I'm no expert on compiler vs lib confusion, but some of my own experience would say that this is a bad idea regardless of whether or not OMPI is involved. Compiler version interoperability is usually questionable, depending upon how far apart the rev levels are. Only answer I can offer is that you would have to try it. It will undoubtedly be a case-by-case basis: some combinations might work, others might fail. On Aug 12, 2010, at 3:53 PM, Michael E. Thomadakis wrote: Hello OpenMPI, we have deployed OpenMPI 1.4.1 and 1.4.2 on our Intel Nehalem cluster using Intel compilers V 11.1.059 and 11.1.072 respectively, and one user has the following request: Can we build OpenMPI version say O.1 against Intel compilers version say I.1 but then built an application with OpenMPI O.1 BUT then use a DIFFERENT Intel compiler version say I.2 to built and run this MPI application? I suggested to him to 1) simply try to built and run the application with O.1 but use Intel compilers version I.X whatever this X is and see if it has any issues. OR 2) If the above does not work, I would build OpenMPI O.1 against Intel version I.X so he can use THIS combination for his hypothetical application. He insists that I build OpenMPI O.1 with some version of Intel compilers I.Y but then at run time he would like to use *different* Intel run time libs at will I.Z <> I.X. Can you provide me with a suggestion for a sane solution to this ? :-) Best regards Michael Guessing at what is meant here, if you build MPI with a given version of Intel compilers, it ought to work when the application is built with a similar or more recent Intel compiler, or when the run-time LD_LIBRARY_PATH refers to a similar or newer library (within reason). There are similar constraints on glibc version. "Within reason" works over a more restricted range when C++ is involved. Note that the Intel linux compilers link to the gcc and glibc libraries as well as those which come with the compiler, and the MPI could be built with a combination of gcc and ifort to work with icc or gcc and ifort. gfortran and ifort libraries, however, are incompatible, except that libgomp calls can be supported by libiomp5. The "rational" use I can see is that an application programmer would likely wish to test a range of compilers without rebuilding MPI. Intel documentation says there is forward compatibility testing of libraries, at least to the extent that a build made with 10.1 would work with 11.1 libraries. The most recent Intel library compatibility break was between MKL 9 and 10. Dear Tim, I offered to provide myself the combination of OMPI+ Intel compilers so that application can use it in stable fashion. When I inquired about this application so I can look into this I was told that "there is NO application yet (!) that fails but just in case it fails ..." I was asked to hack into the OMPI building process to let OMPI use one run-time but then the MPI application using this OMPI ... use another! Thanks for the information on this. We indeed use Intel Compiler set 11.1.XXX + OMPI 1.4.1 and 1.4.2. 
The basic motive in this hypothetical situation is to build the MPI application ONCE and then swap run-time libs as newer compilers come out. I am certain that even if one can get away with it with nearby run-time versions, there is no guarantee of stability ad infinitum. I end up having to spend more time on technically "awkward" requests than on the reasonable ones. Reminds me of when I was a teacher: I had to spend more time with all the people trying to avoid doing the work than with the good students... hmmm :-) According to my understanding, your application (or MPI) built with an Intel 11.1 compiler should continue working with future Intel 11.1 and 12.x libraries. I don't expect Intel to test or support this compatibility beyond that. You will likely want to upgrade your OpenMPI earlier than the time when Intel compiler changes require a new MPI build. If the interest is in getting performance benefits of future hardware simply by installing new dynamic libraries without rebuilding an application, Intel MKL is the most likely favorable scenario. The MKL with optimizations for AVX is already in beta test, and should work as a direct replacement for the MKL in current releases. -- Tim Prince
Re: [OMPI users] send and receive buffer the same on root
On 9/16/2010 9:58 AM, David Zhang wrote: It's compiler specific, I think. I've done this with OpenMPI no problem, however on another cluster with ifort I've gotten error messages about not using MPI_IN_PLACE. So I think if it compiles, it should work fine. On Thu, Sep 16, 2010 at 10:01 AM, Tom Rosmond wrote: I am working with a Fortran 90 code with many MPI calls like this: call mpi_gatherv(x, nsize(rank+1), mpi_real, x, nsize, nstep, mpi_real, root, mpi_comm_world, mstat) The compiler can't affect what happens here (unless maybe you use x again somewhere). Maybe you mean the MPI library? Intel MPI probably checks this at run time and issues an error. I've dealt with run-time errors (which surfaced along with an ifort upgrade) which caused silent failure (incorrect numerics) on openmpi but a fatal diagnostic from the Intel MPI run-time, due to multiple uses of the same buffer. Moral: even if it works for you now with openmpi, you could be setting up for unexpected failure in the future. -- Tim Prince
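For readers hitting the same diagnostic: the fix the Intel MPI error message is pointing at is MPI_IN_PLACE at the root, rather than passing x as both the send and receive buffer. Here is a sketch of my own in C (the thread's code is Fortran, where the same MPI_IN_PLACE argument replaces the send buffer on the root):
```
/* Aliasing-safe gatherv: the root passes MPI_IN_PLACE as the send buffer;
 * its own contribution must already sit at displs[root] in the receive
 * buffer.  Non-root ranks pass their chunk as usual.                    */
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, nranks;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const int root = 0, n = 4;                       /* n elements per rank */
    int *counts = malloc(nranks * sizeof(int));
    int *displs = malloc(nranks * sizeof(int));
    for (int i = 0; i < nranks; ++i) { counts[i] = n; displs[i] = i * n; }

    float *x = calloc((size_t)nranks * n, sizeof(float));
    for (int i = 0; i < n; ++i) x[displs[rank] + i] = (float)rank;  /* this rank's chunk */

    if (rank == root)
        MPI_Gatherv(MPI_IN_PLACE, 0, MPI_FLOAT,      /* send count/type ignored at root */
                    x, counts, displs, MPI_FLOAT, root, MPI_COMM_WORLD);
    else
        MPI_Gatherv(&x[displs[rank]], n, MPI_FLOAT,
                    x, counts, displs, MPI_FLOAT, root, MPI_COMM_WORLD);

    free(x); free(counts); free(displs);
    MPI_Finalize();
    return 0;
}
```
In the Fortran code quoted above, the equivalent change is to pass MPI_IN_PLACE (with a dummy count and type) as the first argument on the root instead of x.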
Re: [OMPI users] Memory affinity
On 9/27/2010 9:01 AM, Gabriele Fatigati wrote: if OpenMPI is numa-compiled, is memory affinity enabled by default? Because I didn't find a memory-affinity-alone (or similar) parameter to set to 1. The FAQ http://www.open-mpi.org/faq/?category=tuning#using-paffinity has a useful introduction to affinity. It's available in a default build, but not enabled by default. If you mean something other than this, explanation is needed as part of your question. taskset or numactl might be relevant, if you require more detailed control. -- Tim Prince
Re: [OMPI users] Memory affinity
On 9/27/2010 12:21 PM, Gabriele Fatigati wrote: Hi Tim, I have read that link, but I haven't understood whether enabling processor affinity also enables memory affinity, because it is written that: "Note that memory affinity support is enabled only when processor affinity is enabled" Can I set processor affinity without memory affinity? This is my question.. 2010/9/27 Tim Prince On 9/27/2010 9:01 AM, Gabriele Fatigati wrote: if OpenMPI is numa-compiled, is memory affinity enabled by default? Because I didn't find a memory-affinity-alone (or similar) parameter to set to 1. The FAQ http://www.open-mpi.org/faq/?category=tuning#using-paffinity has a useful introduction to affinity. It's available in a default build, but not enabled by default. Memory affinity is implied by processor affinity. Your system libraries are set up so as to cause any memory allocated to be made local to the processor, if possible. That's one of the primary benefits of processor affinity. Not being an expert in openmpi, I assume, in the absence of further easily accessible documentation, there's no useful explicit way to disable maffinity while using paffinity on platforms other than the specified legacy platforms. -- Tim Prince
Re: [OMPI users] Memory affinity
On 9/27/2010 2:50 PM, David Singleton wrote: On 09/28/2010 06:52 AM, Tim Prince wrote: On 9/27/2010 12:21 PM, Gabriele Fatigati wrote: HI Tim, I have read that link, but I haven't understood if enabling processor affinity are enabled also memory affinity because is written that: "Note that memory affinity support is enabled only when processor affinity is enabled" Can i set processory affinity without memory affinity? This is my question.. 2010/9/27 Tim Prince On 9/27/2010 9:01 AM, Gabriele Fatigati wrote: if OpenMPI is numa-compiled, memory affinity is enabled by default? Because I didn't find memory affinity alone ( similar) parameter to set at 1. The FAQ http://www.open-mpi.org/faq/?category=tuning#using-paffinity has a useful introduction to affinity. It's available in a default build, but not enabled by default. Memory affinity is implied by processor affinity. Your system libraries are set up so as to cause any memory allocated to be made local to the processor, if possible. That's one of the primary benefits of processor affinity. Not being an expert in openmpi, I assume, in the absence of further easily accessible documentation, there's no useful explicit way to disable maffinity while using paffinity on platforms other than the specified legacy platforms. Memory allocation policy really needs to be independent of processor binding policy. The default memory policy (memory affinity) of "attempt to allocate to the NUMA node of the cpu that made the allocation request but fallback as needed" is flawed in a number of situations. This is true even when MPI jobs are given dedicated access to processors. A common one is where the local NUMA node is full of pagecache pages (from the checkpoint of the last job to complete). For those sites that support suspend/resume based scheduling, NUMA nodes will generally contain pages from suspended jobs. Ideally, the new (suspending) job should suffer a little bit of paging overhead (pushing out the suspended job) to get ideal memory placement for the next 6 or whatever hours of execution. An mbind (MPOL_BIND) policy of binding to the one local NUMA node will not work in the case of one process requiring more memory than that local NUMA node. One scenario is a master-slave where you might want: master (rank 0) bound to processor 0 but not memory bound slave (rank i) bound to processor i and memory bound to the local memory of processor i. They really are independent requirements. Cheers, David ___ interesting; I agree with those of your points on which I have enough experience to have an opinion. However, the original question was not whether it would be desirable to have independent memory affinity, but whether it is possible currently within openmpi to avoid memory placements being influenced by processor affinity. I have seen the case you mention, where performance of a long job suffers because the state of memory from a previous job results in an abnormal number of allocations falling over to other NUMA nodes, but I don't know the practical solution. -- Tim Prince
Re: [OMPI users] hdf5 build error using openmpi and Intel Fortran
On 10/6/2010 12:09 AM, Götz Waschk wrote: libtool: link: mpif90 -shared .libs/H5f90global.o .libs/H5fortran_types.o .libs/H5_ff.o .libs/H5Aff.o .libs/H5Dff.o .libs/H5Eff.o .libs/H5Fff.o .libs/H5Gff.o .libs/H5Iff.o .libs/H5Lff.o .libs/H5Off.o .libs/H5Pff.o .libs/H5Rff.o .libs/H5Sff.o .libs/H5Tff.o .libs/H5Zff.o .libs/H5_DBLE_InterfaceInclude.o .libs/H5f90kit.o .libs/H5_f.o .libs/H5Af.o .libs/H5Df.o .libs/H5Ef.o .libs/H5Ff.o .libs/H5Gf.o .libs/H5If.o .libs/H5Lf.o .libs/H5Of.o .libs/H5Pf.o .libs/H5Rf.o .libs/H5Sf.o .libs/H5Tf.o .libs/H5Zf.o .libs/H5FDmpiof.o .libs/HDF5mpio.o .libs/H5FDmpioff.o-lmpi -lsz -lz -lm -m64 -mtune=generic -rpath=/usr/lib64/openmpi/1.4-icc/lib -soname libhdf5_fortran.so.6 -o .libs/libhdf5_fortran.so.6.0.4 ifort: command line warning #10156: ignoring option '-r'; no argument required ifort: command line warning #10156: ignoring option '-s'; no argument required ld: libhdf5_fortran.so.6: No such file: No such file or directory Do -Wl,-rpath and -Wl,-soname= work any better? -- Tim Prince
Re: [OMPI users] link problem on 64bit platform
On 11/1/2010 5:24 AM, Jeff Squyres wrote: On Nov 1, 2010, at 5:20 AM, jody wrote: jody@aim-squid_0 ~/progs $ mpiCC -g -o HelloMPI HelloMPI.cpp /usr/lib/gcc/x86_64-pc-linux-gnu/4.4.4/../../../../x86_64-pc-linux-gnu/bin/ld: skipping incompatible /opt/openmpi-1.4.2/lib/libmpi_cxx.so when searching for -lmpi_cxx This is the key message -- it found libmpi_cxx.so, but the linker deemed it incompatible, so it skipped it. Typically, it means that the cited library is a 32-bit one, to which the 64-bit ld will react in this way. You could have verified this by file /opt/openmpi-1.4.2/lib/* By normal linux conventions a directory named /lib/ as opposed to /lib64/ would contain only 32-bit libraries. If gentoo doesn't conform with those conventions, maybe you should do your learning on a distro which does. -- Tim Prince
Re: [OMPI users] Help!!!!!!!!!!!!Openmpi instal for ubuntu 64 bits
On 11/29/2010 11:31 AM, Gus Correa wrote: Hi Mauricio Check if you have icc (in the Intel compiler bin directory/subdirectories). Check also if it is in your PATH environment variable. "which icc" will tell. If not, add it to PATH. Actually, the right way to do it is to run the Intel scripts to set the whole compiler environment, not only PATH. The scripts should be called something like iccvars.csh iccvars.sh for C/C++ and ifortvars.csh ifortvars.sh for Fortran, and are also in the Intel bin directory. You can source these scripts in your .cshrc/.bashrc file, using the correct shell (.sh if you use [ba]sh, .csh if you use [t]csh). This is in the Intel compiler documentation, take a look. For the icc version mentioned, there is a compilervars.[c]sh which takes care of both C++ and Fortran (if present), as do either of the iccvars or ifortvars, when the compilers are installed in the same directory. Also, you can compile OpenMPI with gcc,g++ and gfortran, if you want. If they are not yet installed in your Ubuntu, you can get them with apt-get, or whatever Ubuntu uses to get packages. icc ought to work interchangeably with gcc, provided the same g++ version is always on PATH. icc doesn't work without the g++. Thus, it is entirely reasonable to build openmpi with gcc and use either gcc or icc to build the application. gfortran and ifort, however, involve incompatible run-time libraries, and the openmpi fortran libraries won't be interchangeable. You must take care not to mix 32- and 64-bit compilers/libraries. Normally you would build everything 64-bit, both openmpi and the application. Ubuntu doesn't follow the standard scheme for location of 32-bit vs. 64-bit compilers and libraries, but the Intel compiler version you mentioned should resolve this automatically. -- Tim Prince
Re: [OMPI users] Help!!!!!!!!!!!!Openmpi instal for ubuntu 64 bits
On 11/29/2010 3:03 PM, Gus Correa wrote: Jeff Squyres wrote: 1- ./configure FC=ifort F77=ifort CC=icc CXX=icpc 2- make all 3- sudo make install All of steps 1 and 2 work normally, but when I use the make install command I get an error that I cannot solve. You say only step 3 above fails. You could try "sudo -E make install". I take it that sudo -E should copy over the environment variable settings. I haven't been able to find any documentation of this option, and I don't currently have an Ubuntu installation to check it. Not being aware of such an option, I used to do: sudo, then source .. compilervars.sh, then make install. -- Tim Prince
Re: [OMPI users] Scalability issue
On 12/5/2010 3:22 PM, Gustavo Correa wrote: I would just rebuild OpenMPI withOUT the compiler flags that change the standard sizes of "int" and "float" (do a "make distclean" first!), then recompile your program, and see how it goes. I don't think you are gaining anything by trying to change the standard "int/integer" and "real/float" sizes, and most likely they are inviting trouble, making things more confusing. In the worst case, you will at least be sure that the bug is somewhere else, not in a mismatch of basic type sizes. If you need to pass 8-byte real buffers, use MPI_DOUBLE_PRECISION, or MPI_REAL8 in your (Fortran) MPI calls, and declare them in the Fortran code accordingly (double precision or real(kind=8)). If I remember right, there is no 8-byte integer support in the Fortran MPI bindings, only in the C bindings, but some OpenMPI expert could clarify this. Hence, if you are passing 8-byte integers in your MPI calls this may also be problematic. My colleagues routinely use 8-byte integers with Fortran, but I agree it's not done by changing openmpi build parameters. They do use Fortran compile line options for the application to change the default integer and real to 64-bit. I wasn't aware of any reluctance to use MPI_INTEGER8. -- Tim Prince
Re: [OMPI users] MPI_Send doesn't work if the data >= 2GB
On 12/5/2010 7:13 PM, 孟宪军 wrote: Hi, I ran into a problem recently when I tested the MPI_Send and MPI_Recv functions. When I run the following code, the processes hang and I found there was no data transmission on my network at all. BTW: I ran this test on two x86-64 computers with 16 GB of memory, running Linux.

/* Note: the header names were stripped in the archived post; these are the
   likely originals needed for malloc() and the MPI calls. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char** argv)
{
    int localID;
    int numOfPros;
    size_t Gsize = (size_t)2 * 1024 * 1024 * 1024;

    char* g = (char*)malloc(Gsize);

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numOfPros);
    MPI_Comm_rank(MPI_COMM_WORLD, &localID);

    MPI_Datatype MPI_Type_lkchar;
    MPI_Type_contiguous(2048, MPI_BYTE, &MPI_Type_lkchar);
    MPI_Type_commit(&MPI_Type_lkchar);

    if (localID == 0)
    {
        MPI_Send(g, 1024*1024, MPI_Type_lkchar, 1, 1, MPI_COMM_WORLD);
    }

    if (localID != 0)
    {
        MPI_Status status;
        MPI_Recv(g, 1024*1024, MPI_Type_lkchar, 0, 1, MPI_COMM_WORLD, &status);
    }

    MPI_Finalize();

    return 0;
}

You supplied all your constants as 32-bit signed data, so, even if the count for MPI_Send() and MPI_Recv() were a larger data type, you would see this limit. Did you look at your ? -- Tim Prince
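A common workaround for Open MPI builds of that era (my own sketch, not something proposed in the thread) is simply to keep every individual message below 2^31 bytes by splitting the buffer, so no internal 32-bit byte count can overflow:
```
/* Workaround sketch: split a large buffer so each MPI_Send/MPI_Recv moves
 * at most 1 GB, keeping all internal byte counts well below 2^31.        */
#include <stdlib.h>
#include <mpi.h>

static void send_large(char *buf, size_t total, int dest, int tag, MPI_Comm comm)
{
    const size_t chunk = (size_t)1 << 30;                /* 1 GB per message */
    for (size_t off = 0; off < total; off += chunk) {
        size_t n = (total - off < chunk) ? (total - off) : chunk;
        MPI_Send(buf + off, (int)n, MPI_BYTE, dest, tag, comm);
    }
}

static void recv_large(char *buf, size_t total, int src, int tag, MPI_Comm comm)
{
    const size_t chunk = (size_t)1 << 30;
    for (size_t off = 0; off < total; off += chunk) {
        size_t n = (total - off < chunk) ? (total - off) : chunk;
        MPI_Recv(buf + off, (int)n, MPI_BYTE, src, tag, comm, MPI_STATUS_IGNORE);
    }
}

int main(int argc, char **argv)
{
    int rank;
    size_t Gsize = (size_t)2 * 1024 * 1024 * 1024;       /* 2 GB, as in the post */
    char *g = malloc(Gsize);

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)      send_large(g, Gsize, 1, 1, MPI_COMM_WORLD);
    else if (rank == 1) recv_large(g, Gsize, 0, 1, MPI_COMM_WORLD);
    MPI_Finalize();
    free(g);
    return 0;
}
```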
Re: [OMPI users] meaning of MPI_THREAD_*
On 12/6/2010 3:16 AM, Hicham Mouline wrote: Hello, 1. MPI_THREAD_SINGLE: Only one thread will execute. Does this really mean the process cannot have any other threads at all, even if they don't deal with MPI at all? I'm curious as to how this case affects the openmpi implementation? Essentially, what is the difference between MPI_THREAD_SINGLE and MPI_THREAD_FUNNELED? 2. In my case, I'm interested in MPI_THREAD_SERIALIZED. However if it's available, I can use MPI_THREAD_FUNNELED. What cmake flags do I need to enable to allow this mode? 3. Assume I assign only 1 thread in my program to deal with MPI. What is the difference between int MPI::Init_thread(MPI_THREAD_SINGLE) int MPI::Init_thread(MPI_THREAD_FUNNELED) int MPI::Init() Your question is too broad; perhaps you didn't intend it that way. Are you trying to do something which may work only with a specific version of openmpi, or are you willing to adhere to portable practice? I tend to believe what it says at http://www.mcs.anl.gov/research/projects/mpi/mpi-standard/mpi-report-2.0/node165.htm including: A call to MPI_INIT has the same effect as a call to MPI_INIT_THREAD with a required = MPI_THREAD_SINGLE You would likely use one of those if all your MPI calls are from a single thread, and you don't perform any threading inside MPI. MPI implementations vary on the extent to which a higher level of threading than what is declared can be used successfully (there's no guarantee of bad results if you exceed what was set by MPI_INIT). There shouldn't be any bad effect from setting a higher level of thread support which you never use. I would think your question about cmake flags would apply only once you chose a compiler. I have never seen anyone try mixing auto-parallelization with MPI; that would require MPI_THREAD_MULTIPLE but still appears unpredictable. MPI_THREAD_FUNNELED is used often with OpenMP parallelization inside MPI.
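To make question 3 above concrete, here is a sketch of my own of the usual portable pattern: request the level you would like (SERIALIZED here, per question 2) and branch on what the library reports back in provided. The thread-level constants are ordered by the standard (SINGLE < FUNNELED < SERIALIZED < MULTIPLE), so the comparison below is legal.
```
/* Request MPI_THREAD_SERIALIZED and fall back based on what is provided. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_SERIALIZED, &provided);

    if (provided >= MPI_THREAD_SERIALIZED) {
        /* any thread may make MPI calls, as long as they take turns */
    } else if (provided == MPI_THREAD_FUNNELED) {
        /* only the thread that called MPI_Init_thread may make MPI calls */
    } else {
        /* MPI_THREAD_SINGLE: run as a purely single-threaded MPI program */
    }
    printf("provided thread level = %d\n", provided);

    MPI_Finalize();
    return 0;
}
```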
Re: [OMPI users] Mac Ifort and gfortran together
On 12/15/2010 8:22 PM, Jeff Squyres wrote: Sorry for the ginormous delay in replying here; I blame SC'10, Thanksgiving, and the MPI Forum meeting last week... On Nov 29, 2010, at 2:12 PM, David Robertson wrote: I'm noticing a strange problem with Open MPI 1.4.2 on Mac OS X 10.6. We use both Intel Ifort 11.1 and gfortran 4.3 on the same machine and switch between them to test and debug code. I had runtime problems when I compiled openmpi in my usual way of no shared libraries so I switched to shared and it runs now. What problems did you have? OMPI should work fine when compiled statically. However, in order for it to work with ifort I ended up needing to add the location of my intel compiled Open MPI libraries (/opt/intelsoft/openmpi/lib) to my DYLD_LIBRARY_PATH environment variable to to get codes to compile and/or run with ifort. Is this what Intel recommends for anything compiled with ifort on OS X, or is this unique to OMPI-compiled MPI applications? The problem is that adding /opt/intelsoft/openmpi/lib to DYLD_LIBRARY_PATH broke my Open MPI for gfortran. Now when I try to compile with mpif90 for gfortran it thinks it's actually trying to compile with ifort still. As soon as I take the above path out of DYLD_LIBRARY_PATH everything works fine. Also, when I run ompi_info everything looks right except prefix. It says /opt/intelsoft/openmpi rather than /opt/gfortransoft/openmpi like it should. It should be noted that having /opt/intelsoft/openmpi in LD_LIBRARY_PATH does not produce the same effect. I'm not quite clear on your setup, but it *sounds* like you're somehow mixing up 2 different installations of OMPI -- one in /opt/intelsoft and the other in /opt/gfortransoft. Can you verify that you're using the "right" mpif77 (and friends) when you intend to, and so on? Well, yes, he has to use the MPI Fortran libraries compiled by ifort with his ifort application build, and the ones compiled by gfortran with a gfortran application build. There's nothing "strange" about it; the PATH for mpif90 and DYLD_LIBRARY_PATH for the Fortran library have to be set correctly for each case. If linking statically with the MPI Fortran library, you still must choose the one built with the compatible Fortran. gfortran and ifort can share C run-time libraries but not the Fortran ones. It's the same as on linux (and, likely, Windows). -- Tim Prince
Re: [OMPI users] Call to MPI_Test has large time-jitter
On 12/17/2010 6:43 PM, Sashi Balasingam wrote: Hi, I recently started on an MPI-based, 'real-time', pipelined-processing application, and the application fails due to large time-jitter in sending and receiving messages. Here are related info - 1) Platform: a) Intel Box: Two Hex-core, Intel Xeon, 2.668 GHz (...total of 12 cores), b) OS: SUSE Linux Enterprise Server 11 (x86_64) - Kernel \r (\l) c) MPI Rev: (OpenRTE) 1.4, (...Installed OFED package) d) HCA: InfiniBand: Mellanox Technologies MT26428 [ConnectX IB QDR, PCIe 2.0 5GT/s] (rev a0) 2) Application detail a) Launching 7 processes, for pipelined processing, where each process waits for a message (sizes vary between 1 KBytes to 26 KBytes), then process the data, and outputs a message (sizes vary between 1 KBytes to 26 KBytes), to next process. b) MPI transport functions used : "MPI_Isend", MPI_Irecv, MPI_Test. i) For Receiving messages, I first make an MPI_Irecv call, followed by a busy-loop on MPI_Test, waiting for message ii) For Sending message, there is a busy-loop on MPI_Test to ensure prior buffer was sent, then use MPI_Isend. c) When the job starts, all these 7 process are put in High priority mode ( SCHED_FIFO policy, with priority setting of 99). The Job entails an input data packet stream (and a series of MPI messages), continually at 40 micro-sec rate, for a few minutes. 3) The Problem: Most calls to MPI_Test (...which is non-blocking) takes a few micro-sec, but around 10% of the job, it has a large jitter, that vary from 1 to 100 odd millisec. This causes some of the application input queues to fill-up and cause a failure. Any suggestions to look at on the MPI settings or OS config/issues will be much appreciated. I didn't see anything there about your -mca affinity settings. Even if the defaults don't choose optimum mapping, it's way better than allowing them to float as you would with multiple independent jobs running. -- Tim Prince
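Following up on the affinity point: a quick way to see what the -mca/binding settings are actually doing is to have every rank print which CPU it is currently running on. This is a Linux-specific sketch of my own (sched_getcpu() is a glibc call, not an MPI one), to be run with and without the binding options your Open MPI version supports (the 1.4/1.5 series had --bind-to-core and rankfile/paffinity options, if I remember correctly), so the placements can be compared.
```
/* Each rank reports the CPU it is running on, to verify process binding. */
#define _GNU_SOURCE
#include <stdio.h>
#include <sched.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(host, &len);

    printf("rank %d on %s is running on cpu %d\n", rank, host, sched_getcpu());

    MPI_Finalize();
    return 0;
}
```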
Re: [OMPI users] Running OpenMPI on SGI Altix with 4096 cores : very poor performance
On 1/7/2011 6:49 AM, Jeff Squyres wrote: My understanding is that hyperthreading can only be activated/deactivated at boot time -- once the core resources are allocated to hyperthreads, they can't be changed while running. Whether disabling the hyperthreads or simply telling Linux not to schedule on them makes a difference performance-wise remains to be seen. I've never had the time to do a little benchmarking to quantify the difference. If someone could rustle up a few cycles (get it?) to test out what the real-world performance difference is between disabling hyperthreading in the BIOS vs. telling Linux to ignore the hyperthreads, that would be awesome. I'd love to see such results. My personal guess is that the difference is in the noise. But that's a guess. Applications which depend on availability of full size instruction lookaside buffer would be candidates for better performance with hyperthreads completely disabled. Many HPC applications don't stress ITLB, but some do. Most of the important resources are allocated dynamically between threads, but the ITLB is an exception. We reported results of an investigation on Intel Nehalem 4-core hyperthreading where geometric mean performance of standard benchmarks for certain commercial applications was 2% better with hyperthreading disabled at boot time, compared with best 1 rank per core scheduling with hyperthreading enabled. Needless to say, the report wasn't popular with marketing. I haven't seen an equivalent investigation for the 6-core CPUs, where various strange performance effects have been noted, so, as Jeff said, the hyperthreading effect could be "in the noise." -- Tim Prince
Re: [OMPI users] What's wrong with this code?
On 2/22/2011 1:41 PM, Prentice Bisbal wrote: One of the researchers I support is writing some Fortran code that uses Open MPI. The code is being compiled with the Intel Fortran compiler. This one line of code: integer ierr,istatus(MPI_STATUS_SIZE) leads to these errors: $ mpif90 -o simplex simplexmain579m.for simplexsubs579 /usr/local/openmpi-1.2.8/intel-11/x86_64/include/mpif-config.h(88): error #6406: Conflicting attributes or multiple declaration of name. [MPI_STATUS_SIZE] parameter (MPI_STATUS_SIZE=5) -^ simplexmain579m.for(147): error #6591: An automatic object is invalid in a main program. [ISTATUS] integer ierr,istatus(MPI_STATUS_SIZE) -^ simplexmain579m.for(147): error #6219: A specification expression object must be a dummy argument, a COMMON block object, or an object accessible through host or use association [MPI_STATUS_SIZE] integer ierr,istatus(MPI_STATUS_SIZE) -^ /usr/local/openmpi-1.2.8/intel-11/x86_64/include/mpif-common.h(211): error #6756: A COMMON block data object must not be an automatic object. [MPI_STATUS_IGNORE] integer MPI_STATUS_IGNORE(MPI_STATUS_SIZE) --^ /usr/local/openmpi-1.2.8/intel-11/x86_64/include/mpif-common.h(211): error #6591: An automatic object is invalid in a main program. [MPI_STATUS_IGNORE] integer MPI_STATUS_IGNORE(MPI_STATUS_SIZE) Any idea how to fix this? Is this a bug in the Intel compiler, or the code? I can't see the code from here. The first failure to recognize the PARAMETER definition apparently gives rise to the others. According to the message, you already used the name MPI_STATUS_SIZE in mpif-config.h and now you are trying to give it another usage (not case sensitive) in the same scope. If so, it seems good that the compiler catches it. -- Tim Prince
Re: [OMPI users] What's wrong with this code?
On 2/23/2011 6:41 AM, Prentice Bisbal wrote:
Tim Prince wrote:
On 2/22/2011 1:41 PM, Prentice Bisbal wrote:
One of the researchers I support is writing some Fortran code that uses Open MPI. The code is being compiled with the Intel Fortran compiler. This one line of code:
integer ierr,istatus(MPI_STATUS_SIZE)
leads to these errors:
$ mpif90 -o simplex simplexmain579m.for simplexsubs579
/usr/local/openmpi-1.2.8/intel-11/x86_64/include/mpif-config.h(88): error #6406: Conflicting attributes or multiple declaration of name. [MPI_STATUS_SIZE]
parameter (MPI_STATUS_SIZE=5)
-^
simplexmain579m.for(147): error #6591: An automatic object is invalid in a main program. [ISTATUS]
integer ierr,istatus(MPI_STATUS_SIZE)
-^
simplexmain579m.for(147): error #6219: A specification expression object must be a dummy argument, a COMMON block object, or an object accessible through host or use association [MPI_STATUS_SIZE]
integer ierr,istatus(MPI_STATUS_SIZE)
-^
/usr/local/openmpi-1.2.8/intel-11/x86_64/include/mpif-common.h(211): error #6756: A COMMON block data object must not be an automatic object. [MPI_STATUS_IGNORE]
integer MPI_STATUS_IGNORE(MPI_STATUS_SIZE)
--^
/usr/local/openmpi-1.2.8/intel-11/x86_64/include/mpif-common.h(211): error #6591: An automatic object is invalid in a main program. [MPI_STATUS_IGNORE]
integer MPI_STATUS_IGNORE(MPI_STATUS_SIZE)
Any idea how to fix this? Is this a bug in the Intel compiler, or the code?
I can't see the code from here. The first failure to recognize the PARAMETER definition apparently gives rise to the others. According to the message, you already used the name MPI_STATUS_SIZE in mpif-config.h and now you are trying to give it another usage (not case sensitive) in the same scope. If so, it seems good that the compiler catches it.
I agree with your logic, but the problem is where the code containing the error is coming from: it's coming from a header file that is part of Open MPI, which makes me think this is a compiler error, since I'm sure there are plenty of people using the same header file in their code.
Are you certain that they all find it necessary to re-define identifiers from that header file, rather than picking parameter names which don't conflict? -- Tim Prince
Re: [OMPI users] What's wrong with this code?
On 2/23/2011 8:27 AM, Prentice Bisbal wrote: Jeff Squyres wrote: On Feb 23, 2011, at 9:48 AM, Tim Prince wrote: I agree with your logic, but the problem is where the code containing the error is coming from - it's comping from a header files that's a part of Open MPI, which makes me think this is a cmpiler error, since I'm sure there are plenty of people using the same header file. in their code. Are you certain that they all find it necessary to re-define identifiers from that header file, rather than picking parameter names which don't conflict? Without seeing the code, it sounds like Tim might be right: someone is trying to re-define the MPI_STATUS_SIZE parameter that is being defined by OMPI's mpif-config.h header file. Regardless of include file/initialization ordering (i.e., regardless of whether mpif-config.h is the first or Nth entity to try to set this parameter), user code should never set this parameter value. Or any symbol that begins with MPI_, for that matter. The entire "MPI_" namespace is reserved for MPI. I understand that, and I checked the code to make sure the programmer didn't do anything stupid like that. The entire code is only a few hundred lines in two different files. In the entire program, there is only 1 include statement: include 'mpif.h' and MPI_STATUS_SIZE appears only once: integer ierr,istatus(MPI_STATUS_SIZE) I have limited knowledge of Fortran programming, but based on this, I don't see how MPI_STATUS_SIZE could be getting overwritten. Earlier, you showed a preceding PARAMETER declaration setting a new value for that name, which would be required to make use of it in this context. Apparently, you intend to support only compilers which violate the Fortran standard by supporting a separate name space for PARAMETER identifiers, so that you can violate the MPI standard by using MPI_ identifiers in a manner which I believe is called shadowing in C. -- Tim Prince
Re: [OMPI users] Open MPI access the same file in parallel ?
On 3/9/2011 8:57 PM, David Zhang wrote: Under my programming environment, FORTRAN, it is possible to do a parallel read (using the native read function instead of MPI's parallel read function), although you'll run into problems when you try to do a parallel write to the same file.
If your Fortran compiler/library are reasonably up to date, you will need to specify action='read', since opening the file with the default readwrite action will lock out the other processes. -- Tim Prince
Re: [OMPI users] Open MPI access the same file in parallel ?
On 3/9/2011 11:05 PM, Jack Bryan wrote: Thanks. I am using the GNU mpic++ compiler. Can it automatically support accessing a file by many parallel processes?
It should follow the gcc manual, e.g. http://www.gnu.org/s/libc/manual/html_node/Opening-Streams.html. I think you want *opentype to evaluate to 'r' (read-only). -- Tim Prince
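As an illustration of the read-only open being suggested, here is a minimal C sketch in which every rank opens the same file with mode "r". The file name and its contents are hypothetical; this is not code from the thread.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* "r" asks for read-only access, so many processes can open the file at once */
    FILE *fp = fopen("input.dat", "r");
    if (fp == NULL) {
        fprintf(stderr, "rank %d: cannot open input.dat\n", rank);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    double first;
    if (fscanf(fp, "%lf", &first) == 1)
        printf("rank %d read %g\n", rank, first);

    fclose(fp);
    MPI_Finalize();
    return 0;
}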
Re: [OMPI users] intel compiler linking issue and issue of environment variable on remote node, with open mpi 1.4.3
On 3/21/2011 5:21 AM, ya...@adina.com wrote: I am trying to compile our codes with open mpi 1.4.3, by intel compilers 8.1. (1) For open mpi 1.4.3 installation on linux beowulf cluster, I use: ./configure --prefix=/home/yiguang/dmp-setup/openmpi-1.4.3 CC=icc CXX=icpc F77=ifort FC=ifort --enable-static LDFLAGS="-i-static - static-libcxa" --with-wrapper-ldflags="-i-static -static-libcxa" 2>&1 | tee config.log and make all install 2>&1 | tee install.log The issue is that I am trying to build open mpi 1.4.3 with intel compiler libraries statically linked to it, so that when we run mpirun/orterun, it does not need to dynamically load any intel libraries. But what I got is mpirun always asks for some intel library(e.g. libsvml.so) if I do not put intel library path on library search path($LD_LIBRARY_PATH). I checked the open mpi user archive, it seems only some kind user mentioned to use "-i-static"(in my case) or "-static-intel" in ldflags, this is what I did, but it seems not working, and I did not get any confirmation whether or not this works for anyone else from the user archive. could anyone help me on this? thanks! If you are to use such an ancient compiler (apparently a 32-bit one), you must read the docs which come with it, rather than relying on comments about a more recent version. libsvml isn't included automatically at link time by that 32-bit compiler, unless you specify an SSE option, such as -xW. It's likely that no one has verified OpenMPI with a compiler of that vintage. We never used the 32-bit compiler for MPI, and we encountered run-time library bugs for the ifort x86_64 which weren't fixed until later versions. -- Tim Prince
Re: [OMPI users] Shared Memory Performance Problem.
On 3/27/2011 2:26 AM, Michele Marena wrote: Hi, my application performs well without shared memory, but with shared memory I get worse performance than without it. Am I making a mistake? Is there something I'm not paying attention to? I know OpenMPI uses the /tmp directory to allocate shared memory, and it is in the local filesystem.
I guess you mean shared-memory message passing. Among the relevant parameters may be the message size at which your implementation switches from a cached copy to a non-temporal one (if you are on a platform where that terminology is used). If built with Intel compilers, for example, the copy may be performed by intel_fast_memcpy, with a default setting which uses non-temporal stores when the message exceeds some preset size, e.g. 50% of the smallest L2 cache for that architecture. A quick search of past posts seems to indicate that OpenMPI doesn't itself invoke non-temporal copies, but there appear to be several useful articles not connected with OpenMPI. In case guesses aren't sufficient, it's often necessary to profile (gprof, oprofile, VTune, ...) to pin this down. If shared-memory messaging slows your application down, the question is whether this is due to excessive eviction of data from cache; not a simple question, as most recent CPUs have 3 levels of cache, and your application may require more or less of the data which was in use prior to the message receipt, and may immediately use only a small piece of a large message. -- Tim Prince
Re: [OMPI users] Shared Memory Performance Problem.
On 3/28/2011 3:44 AM, Jeff Squyres (jsquyres) wrote: Ah, I didn't catch before that there were more variables than just tcp vs. shmem. What happens with 2 processes on the same node with tcp? Eg, when both procs are on the same node, are you thrashing caches or memory? In fact, I made the guess that the performance difference under discussion referred to a single node. -- Tim Prince
Re: [OMPI users] Shared Memory Performance Problem.
On 3/28/2011 3:29 AM, Michele Marena wrote: Each node has two processors (no dual-core).
...which seems to imply that the two processors share memory space and a single memory bus, and the question is not about what I originally guessed. -- Tim Prince
Re: [OMPI users] Shared Memory Performance Problem.
On 3/30/2011 10:08 AM, Eugene Loh wrote:
Michele Marena wrote:
I've launched my app with mpiP both when the two processes are on different nodes and when the two processes are on the same node. Process 0 is the manager (gathers the results only); processes 1 and 2 are workers (compute). This is the case where processes 1 and 2 are on different nodes (runs in 162s).
@--- MPI Time (seconds) ---
Task  AppTime  MPITime   MPI%
   0      162     162   99.99
   1      162     30.2   18.66
   2      162     14.7    9.04
   *      486     207    42.56
This is the case where processes 1 and 2 are on the same node (runs in 260s).
@--- MPI Time (seconds) ---
Task  AppTime  MPITime   MPI%
   0      260     260    99.99
   1      260     39.7   15.29
   2      260     26.4   10.17
   *      779     326    41.82
I think there's a contention problem on the memory bus.
Right. Process 0 spends all its time in MPI, presumably waiting on the workers. The workers spend about the same amount of time in MPI regardless of whether they're placed together or not. The big difference is that the workers are much slower in non-MPI tasks when they're located on the same node. The issue has little to do with MPI. The workers are hogging local resources and work faster when placed on different nodes.
However, the message size is 4096 * sizeof(double). Maybe I am wrong on this point. Is the message size too huge for shared memory?
No. That's not very large at all. Not even large enough to expect the non-temporal storage issue about cache eviction to arise. -- Tim Prince
Re: [OMPI users] OMPI monitor each process behavior
On 4/12/2011 8:55 PM, Jack Bryan wrote: I need to monitor the memory usage of each parallel process on a linux Open MPI cluster. But, top, ps command cannot help here because they only show the head node information. I need to follow the behavior of each process on each cluster node. Did you consider ganglia et al? I cannot use ssh to access each node. How can MPI run? The program takes 8 hours to finish. -- Tim Prince
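If installing a monitoring tool such as ganglia is not an option, one alternative is to have each rank report its own memory use. Below is a hedged sketch (Linux-specific, and not from the thread) that reads VmRSS from /proc/self/status and prints it per rank; in a real run it would be called periodically from the application's main loop.

#include <mpi.h>
#include <stdio.h>

/* return the resident set size of the calling process in kB, or -1 on failure */
static long rss_kb(void)
{
    FILE *fp = fopen("/proc/self/status", "r");
    char line[256];
    long kb = -1;
    if (!fp) return -1;
    while (fgets(line, sizeof line, fp))
        if (sscanf(line, "VmRSS: %ld kB", &kb) == 1)
            break;
    fclose(fp);
    return kb;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("rank %d: VmRSS = %ld kB\n", rank, rss_kb());
    MPI_Finalize();
    return 0;
}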
Re: [OMPI users] Problem compiling OpenMPI on Ubuntu 11.04
On 04/19/2011 01:24 PM, Sergiy Bubin wrote:
/usr/include/c++/4.5/iomanip(64): error: expected an expression { return { __mask }; } ^
/usr/include/c++/4.5/iomanip(94): error: expected an expression { return { __mask }; } ^
/usr/include/c++/4.5/iomanip(125): error: expected an expression { return { __base }; } ^
/usr/include/c++/4.5/iomanip(193): error: expected an expression { return { __n }; } ^
/usr/include/c++/4.5/iomanip(223): error: expected an expression { return { __n }; } ^
/usr/include/c++/4.5/iomanip(163): error: expected an expression { return { __c }; } ^
If you're using icpc, this seeming incompatibility between icpc and g++ 4.5 has been discussed on the icpc forum http://software.intel.com/en-us/forums/showthread.php?t=78677&wapkw=%28iomanip%29 where you should see that you must take care to set the option -std=c++0x when using the current <iomanip> under icpc, as it is treated as a C++0x feature. You might try adding the option to the CXXFLAGS (or whatever they are called) in the Open MPI build, or to the icpc.cfg in your icpc installation. -- Tim Prince
[OMPI users] Mixing the FORTRAN and C APIs.
Hi, I'm trying to use PARPACK in a C++ app I have written. This is a FORTRAN MPI routine used to calculate SVDs. The simplest way I found to do this is to use f2c to convert it to C, and then call the resulting functions from my C++ code. However PARPACK requires that I write some user-defined operations to be parallel using MPI. So far I have just been calling the FORTRAN versions of the MPI functions from C, because I wasn't sure whether you can mix the APIs. I.e. I've been doing this:
-8<-
extern "C" {
int mpi_init__(integer *);
int mpi_comm_rank__(integer *, integer *, integer *);
int mpi_comm_size__(integer *, integer *, integer *);
int mpi_finalize__(integer *);
int mpi_allgatherv__(doublereal *, integer *, integer *, doublereal *, integer *, integer *, integer *, integer *);
// OpenMPI version.
const integer MPI_DOUBLE_PRECISION = 17;
}
bool MPI__Init() {
  integer ierr = 0;
  mpi_init__(&ierr);
  return ierr == 0;
}
8<
It works so far, but is getting quite tedious and seems like the wrong way to do it. Also I don't know if it's related, but when I use allgatherv it gives me a segfault:
[panic:20659] *** Process received signal ***
[panic:20659] Signal: Segmentation fault (11)
[panic:20659] Signal code: Address not mapped (1)
[panic:20659] Failing at address: 0x7f4effe8
[panic:20659] [ 0] /lib/libc.so.6(+0x33af0) [0x7f4f8fd62af0]
[panic:20659] [ 1] /usr/lib/libstdc++.so.6(_ZNSolsEi+0x3) [0x7f4f905ec0c3]
[panic:20659] [ 2] ./TDLSM() [0x510322]
[panic:20659] [ 3] ./TDLSM() [0x50ec8d]
[panic:20659] [ 4] ./TDLSM() [0x404ee7]
[panic:20659] [ 5] /lib/libc.so.6(__libc_start_main+0xfd) [0x7f4f8fd4dc4d]
[panic:20659] [ 6] ./TDLSM() [0x404c19]
[panic:20659] *** End of error message ***
So my question is: Can I intermix the C and FORTRAN APIs within one program? Oh and also I think the cluster I will eventually run this on (cx1.hpc.ic.ac.uk, if anyone is from Imperial) doesn't use OpenMP, so what about other MPI implementations? Many thanks, Tim
Re: [OMPI users] Mixing the FORTRAN and C APIs.
On 5/6/2011 7:58 AM, Tim Hutt wrote: Hi, I'm trying to use PARPACK in a C++ app I have written. This is an FORTRAN MPI routine used to calculate SVDs. The simplest way I found to do this is to use f2c to convert it to C, and then call the resulting functions from my C++ code. However PARPACK requires that I write some user-defined operations to be parallel using MPI. So far I have just been calling the FORTRAN versions of the MPI functions from C, because I wasn't sure whether you can mix the APIs. I.e. I've been doing this: -8<- extern "C" { int mpi_init__(integer *); int mpi_comm_rank__(integer *, integer *, integer *); int mpi_comm_size__(integer *, integer *, integer *); int mpi_finalize__(integer *); int mpi_allgatherv__(doublereal *, integer *, integer *, doublereal *, integer *, integer *, integer *, integer *); // OpenMPI version. const integer MPI_DOUBLE_PRECISION = 17; } bool MPI__Init() { integer ierr = 0; mpi_init__(&ierr); return ierr == 0; } 8< It works so far, but is getting quite tedious and seems like the wrong way to do it. Also I don't know if it's related but when I use allgatherv it gives me a segfault: [panic:20659] *** Process received signal *** [panic:20659] Signal: Segmentation fault (11) [panic:20659] Signal code: Address not mapped (1) [panic:20659] Failing at address: 0x7f4effe8 [panic:20659] [ 0] /lib/libc.so.6(+0x33af0) [0x7f4f8fd62af0] [panic:20659] [ 1] /usr/lib/libstdc++.so.6(_ZNSolsEi+0x3) [0x7f4f905ec0c3] [panic:20659] [ 2] ./TDLSM() [0x510322] [panic:20659] [ 3] ./TDLSM() [0x50ec8d] [panic:20659] [ 4] ./TDLSM() [0x404ee7] [panic:20659] [ 5] /lib/libc.so.6(__libc_start_main+0xfd) [0x7f4f8fd4dc4d] [panic:20659] [ 6] ./TDLSM() [0x404c19] [panic:20659] *** End of error message *** So my question is: Can I intermix the C and FORTRAN APIs within one program? Oh and also I think the cluster I will eventually run this on (cx1.hpc.ic.ac.uk, if anyone is from Imperial) doesn't use OpenMP, so what about other MPI implementations? If you want to use the MPI Fortran library, don't convert your Fortran to C. It's difficult to understand why you would consider f2c a "simplest way," but at least it should allow you to use ordinary C MPI function calls. The MPI Fortran library must be built against the same Fortran run-time libraries which you use for your own Fortran code. The header files for the Fortran MPI calls probably don't work in C. It would be a big struggle to get them to work with f2c, since f2c doesn't have much ability to deal with headers other than its own. There's no reason you can't make both C and Fortran MPI calls in the same application. If you mean mixing a send from one language with a receive in another, I think most would avoid that. Whether someone uses OpenMP has little to do with choice of MPI implementation. Some of us still may be cursing the choice of OpenMPI for the name of an MPI implementation. -- Tim Prince
Re: [OMPI users] Mixing the FORTRAN and C APIs.
On 6 May 2011 16:27, Tim Prince wrote: > If you want to use the MPI Fortran library, don't convert your Fortran to C. > It's difficult to understand why you would consider f2c a "simplest way," > but at least it should allow you to use ordinary C MPI function calls. Sorry, maybe I wasn't clear. Just to clarify, all of *my* code is written in C++ (because I don't actually know Fortran), but I want to use some function from PARPACK which is written in Fortran. I think I originally used f2c because it was a massive pain linking to a Fortran library. I suppose I could give it another go, but I don't think it affects the problem of mixing APIs (since the f2c version still uses the Fortran API). > The MPI Fortran library must be built against the same Fortran run-time > libraries which you use for your own Fortran code. The header files for the > Fortran MPI calls probably don't work in C. It would be a big struggle to > get them to work with f2c, since f2c doesn't have much ability to deal with > headers other than its own. Yeah I've had to manually recreate C versions of the Fortran headers (mpif.h), which is a pain and the main reason I want to try a different method. > There's no reason you can't make both C and Fortran MPI calls in the same > application. If you mean mixing a send from one language with a receive in > another, I think most would avoid that. I'm fairly sure that there wouldn't ever be a send in one language and a receive in another, but I would be doing independent sends/receives with different languages (one after another), something like this: MPI_Init(argc, argv); // C. call mpi_send(..., MPI_DOUBLE_PRECISION, ...); // Fortan. // ... MPI_AllGatherV(..., MPI_DOUBLE, ...); // C, but completely separate from previous communications. MPI_Finalize(); // C. > Whether someone uses OpenMP has little to do with choice of MPI > implementation. Some of us still may be cursing the choice of OpenMPI for > the name of an MPI implementation. Oops, that was a typo - I meant OpenMPI! I'm not actually using OpenMP at all. Thanks for the help! Tim
Re: [OMPI users] Mixing the FORTRAN and C APIs.
On 6 May 2011 16:45, Tim Hutt wrote: > On 6 May 2011 16:27, Tim Prince wrote: >> If you want to use the MPI Fortran library, don't convert your Fortran to C. >> It's difficult to understand why you would consider f2c a "simplest way," >> but at least it should allow you to use ordinary C MPI function calls. > > Sorry, maybe I wasn't clear. Just to clarify, all of *my* code is > written in C++ (because I don't actually know Fortran), but I want to > use some function from PARPACK which is written in Fortran. Hmm I converted my C++ code to use the C OpenMPI interface instead, and now I get link errors (undefined references). I remembered I've been linking with -lmpi -lmpi_f77, so maybe I need to also link with -lmpi_cxx or -lmpi++ ... what exactly do each of these libraries contain? Also I have run into the problem that the communicators are of type "MPI_Comm" in C, and "integer" in Fortran... I am using MPI_COMM_WORLD in each case so I assume that will end up referring to the same thing... but maybe you really can't mix Fortran and C. Expert opinion would be very very welcome! Tim
Re: [OMPI users] Mixing the FORTRAN and C APIs.
On 5/6/2011 10:22 AM, Tim Hutt wrote: On 6 May 2011 16:45, Tim Hutt wrote: On 6 May 2011 16:27, Tim Prince wrote: If you want to use the MPI Fortran library, don't convert your Fortran to C. It's difficult to understand why you would consider f2c a "simplest way," but at least it should allow you to use ordinary C MPI function calls. Sorry, maybe I wasn't clear. Just to clarify, all of *my* code is written in C++ (because I don't actually know Fortran), but I want to use some function from PARPACK which is written in Fortran. Hmm I converted my C++ code to use the C OpenMPI interface instead, and now I get link errors (undefined references). I remembered I've been linking with -lmpi -lmpi_f77, so maybe I need to also link with -lmpi_cxx or -lmpi++ ... what exactly do each of these libraries contain? Also I have run into the problem that the communicators are of type "MPI_Comm" in C, and "integer" in Fortran... I am using MPI_COMM_WORLD in each case so I assume that will end up referring to the same thing... but maybe you really can't mix Fortran and C. Expert opinion would be very very welcome! If you use your OpenMPI mpicc wrapper to compile and link, the MPI libraries should be taken care of. Style usage in an f2c translation is debatable, but you have an #include "f2c.h" or "g2c.h" which translates the Fortran data types to legacy C equivalent. By legacy I mean that in the f2c era, the inclusion of C data types in Fortran via USE iso_c_binding had not been envisioned. One would think that you would use the MPI header data types on both the Fortran and the C side, even though you are using legacy interfaces. Slip-ups in MPI data types often lead to run-time errors. If you have an error-checking MPI library such as the Intel MPI one, you get a little better explanation at the failure point. -- Tim Prince
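To make the suggestion above concrete, here is a small sketch of what ordinary C MPI calls look like from the C++ side, including MPI_Comm_c2f for handing MPI_COMM_WORLD to a Fortran routine as an INTEGER. This is only an illustration; the gathered values are placeholders, and any Fortran routine that would receive fcomm (e.g. a PARPACK driver) is assumed, not shown.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* plain C API: MPI_DOUBLE and MPI_COMM_WORLD come from mpi.h,
       so no hand-coded Fortran constants are needed */
    double mine = (double)rank;
    double *all = malloc(size * sizeof(double));
    int *counts = malloc(size * sizeof(int));
    int *displs = malloc(size * sizeof(int));
    for (int i = 0; i < size; ++i) { counts[i] = 1; displs[i] = i; }

    MPI_Allgatherv(&mine, 1, MPI_DOUBLE, all, counts, displs, MPI_DOUBLE, MPI_COMM_WORLD);

    /* if a Fortran routine needs the communicator, convert the C handle
       to the Fortran INTEGER form explicitly rather than guessing its value */
    MPI_Fint fcomm = MPI_Comm_c2f(MPI_COMM_WORLD);
    if (rank == 0)
        printf("gathered %d values; Fortran communicator handle = %d\n", size, (int)fcomm);

    free(all); free(counts); free(displs);
    MPI_Finalize();
    return 0;
}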
Re: [OMPI users] USE mpi
On 5/7/2011 2:35 PM, Dmitry N. Mikushin wrote: didn't find the icc compiler Jeff, on 1.4.3 I saw the same issue, even more generally: "make install" cannot find the compiler, if it is an alien compiler (i.e. not the default gcc) - same situation for intel or llvm, for example. The workaround is to specify full paths to compilers with CC=... FC=... in ./configure params. Could it be "make install" breaks some env paths? Most likely reason for not finding an installed icc is that the icc environment (source the compilervars script if you have a current version) wasn't set prior to running configure. Setting up the compiler in question in accordance with its own instructions is a more likely solution than the absolute path choice. OpenMPI configure, for good reason, doesn't search your system to see where a compiler might be installed. What if you had 2 versions of the same named compiler? -- Tim Prince
Re: [OMPI users] MPI_COMM_DUP freeze with OpenMPI 1.4.1
On 5/10/2011 6:43 AM, francoise.r...@obs.ujf-grenoble.fr wrote: Hi, I compile a parallel program with OpenMPI 1.4.1 (compiled with intel compilers 12 from composerxe package) . This program is linked to MUMPS library 4.9.2, compiled with the same compilers and link with intel MKL. The OS is linux debian. No error in compiling or running the job, but the program freeze inside a call to "zmumps" routine, when the slaves process call MPI_COMM_DUP routine. The program is executed on 2 nodes of 12 cores each (westmere processors) with the following command : mpirun -np 24 --machinefile $OAR_NODE_FILE -mca plm_rsh_agent "oarsh" --mca btl self,openib -x LD_LIBRARY_PATH ./prog We have 12 process running on each node. We submit the job with OAR batch scheduler (the $OAR_NODE_FILE variable and "oarsh" command are specific to this scheduler and are usually working well with openmpi ) via gdb, on the slaves, we can see that they are blocked in MPI_COMM_DUP : (gdb) where #0 0x2b32c1533113 in poll () from /lib/libc.so.6 #1 0x00adf52c in poll_dispatch () #2 0x00adcea3 in opal_event_loop () #3 0x00ad69f9 in opal_progress () #4 0x00a34b4e in mca_pml_ob1_recv () #5 0x009b0768 in ompi_coll_tuned_allreduce_intra_recursivedoubling () #6 0x009ac829 in ompi_coll_tuned_allreduce_intra_dec_fixed () #7 0x0097e271 in ompi_comm_allreduce_intra () #8 0x0097dd06 in ompi_comm_nextcid () #9 0x0097be01 in ompi_comm_dup () #10 0x009a0785 in PMPI_Comm_dup () #11 0x0097931d in pmpi_comm_dup__ () #12 0x00644251 in zmumps (id=...) at zmumps_part1.F:144 #13 0x004c0d03 in sub_pbdirect_init (id=..., matrix_build=...) at sub_pbdirect_init.f90:44 #14 0x00628706 in fwt2d_elas_v2 () at fwt2d_elas.f90:1048 the master wait further : (gdb) where #0 0x2b9dc9f3e113 in poll () from /lib/libc.so.6 #1 0x00adf52c in poll_dispatch () #2 0x00adcea3 in opal_event_loop () #3 0x00ad69f9 in opal_progress () #4 0x0098f294 in ompi_request_default_wait_all () #5 0x00a06e56 in ompi_coll_tuned_sendrecv_actual () #6 0x009ab8e3 in ompi_coll_tuned_barrier_intra_bruck () #7 0x009ac926 in ompi_coll_tuned_barrier_intra_dec_fixed () #8 0x009a0b20 in PMPI_Barrier () #9 0x00978c93 in pmpi_barrier__ () #10 0x004c0dc4 in sub_pbdirect_init (id=..., matrix_build=...) at sub_pbdirect_init.f90:62 #11 0x00628706 in fwt2d_elas_v2 () at fwt2d_elas.f90:1048 Remark : The same code compiled and run well with intel MPI library, from the same intel package, on the same nodes. Did you try compiling with equivalent options in each compiler? For example, (supposing you had gcc 4.6) gcc -O3 -funroll-loops --param max-unroll-times=2 -march=corei7 would be equivalent (as closely as I know) to icc -fp-model source -msse4.2 -ansi-alias As you should be aware, default settings in icc are more closely equivalent to gcc -O3 -ffast-math -fno-cx-limited-range -funroll-loops --param max-unroll-times=2 -fnostrict-aliasing The options I suggest as an upper limit are probably more aggressive than most people have used successfully with OpenMPI. As to run-time MPI options, Intel MPI has affinity with Westmere awareness turned on by default. I suppose testing without affinity settings, particularly when banging against all hyperthreads, is a more severe test of your application. Don't you get better results at 1 rank per core? -- Tim Prince
Re: [OMPI users] OpenMPI vs Intel Efficiency question
On 7/12/2011 7:45 PM, Mohan, Ashwin wrote: Hi, I noticed that the exact same code took 50% more time to run on OpenMPI than Intel. I use the following syntax to compile and run: Intel MPI Compiler: (Redhat Fedora Core release 3 (Heidelberg), Kernel version: Linux 2.6.9-1.667smp x86_64** mpiicpc -o .cpp -lmpi OpenMPI 1.4.3: (Centos 5.5 w/ python 2.4.3, Kernel version: Linux 2.6.18-194.el5 x86_64)** mpiCC .cpp -o **Other hardware specs** processor : 0 vendor_id : GenuineIntel cpu family : 15 model : 3 model name : Intel(R) Xeon(TM) CPU 3.60GHz stepping: 4 cpu MHz : 3591.062 cache size : 1024 KB physical id : 0 siblings: 2 core id : 0 cpu cores : 1 apicid : 0 fpu : yes fpu_exception : yes cpuid level : 5 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lmconstant_tsc pni monitor ds_cpl est tm2 cid xtpr bogomips: 7182.12 clflush size: 64 cache_alignment : 128 address sizes : 36 bits physical, 48 bits virtual power management: Can the issue of efficiency be deciphered from the above info? Does the compiler flags have an effect on the efficiency of the simulation. If so, what flags maybe useful to check to be included for Open MPI. The default options for icpc are roughly equivalent to the quite aggressive choice g++ -fno-strict-aliasing -ffast-math -fnocx-limited-range -O3 -funroll-loops --param max-unroll-times=2 while you apparently used default -O0 for your mpiCC (if it is g++), neither of which is a very good initial choice for performance analysis. So, if you want a sane comparison but aren't willing to study the compiler manuals, you might use (if your source code doesn't violate the aliasing rules) mpiicpc -prec-div -prec-sqrt -ansi-alias and at least (if your linux compiler is g++) mpiCC -O2 possibly with some of the other options I mentioned earlier. If you have as ancient a g++ as your indication of FC3 implies, it really isn't fair to compare it with a currently supported compiler. Then, Intel MPI, by default, would avoid using HyperThreading, even though you have it enabled on your CPU, so, I suppose, if you are running on a single core, it will be rotating among your 4 MPI processes 1 at a time. The early Intel HyperThread CPUs typically took 15% longer to run MPI jobs when running 2 processes per core. Will including MPICH2 increase efficiency in running simulations using OpenMPI? You have to choose a single MPI. Having MPICH2 installed shouldn't affect performance of OpenMPI or Intel MPI, except to break your installation if you don't keep things sorted out. OpenMPI and Intel MPI normally perform very close, if using equivalent settings, when working within the environments for which both are suited. -- Tim Prince
Re: [OMPI users] OpenMPI vs Intel Efficiency question
On 7/12/2011 11:06 PM, Mohan, Ashwin wrote: Tim, Thanks for your message. I was however not clear about your suggestions. Would appreciate if you could clarify. You say," So, if you want a sane comparison but aren't willing to study the compiler manuals, you might use (if your source code doesn't violate the aliasing rules) mpiicpc -prec-div -prec-sqrt -ansi-alias and at least (if your linux compiler is g++) mpiCC -O2 possibly with some of the other options I mentioned earlier." ###From your response above, I understand to use, for Intel, this syntax: "mpiicpc -prec-div -prec-sqrt -ansi-alias" and for OPENMPI use "mpiCC -O2". I am not certain about the other options you mention. ###Also, I presently use a hostfile while submitting my mpirun. Each node has four slots and my hostfile was "nodename slots=4". My compile code is mpiCC -o xxx.xpp. If you have as ancient a g++ as your indication of FC3 implies, it really isn't fair to compare it with a currently supported compiler. ###Do you suggest upgrading the current installation of g++? Would that help? How much it would help would depend greatly on your source code. It won't help much anyway if you don't choose appropriate options. Current g++ is nearly as good at auto-vectorization as icpc, unless you dive into the pragmas and cilk stuff provided with icpc. You really need to look at the gcc manual to understand those options; going into it in any more depth here would try the patience of the list. ###How do I ensure that all 4 slots are active when i submit a mpirun -np 4 command. When I do "top", I notice that all 4 slots are active. I noticed this when I did "top" with the Intel machine too, that is, it showed four slots active. Thank you..ashwin. I was having trouble inferring what platform you are running on, I guessed a single core HyperThread, which doesn't seem to agree with your "4 slots" terminology. If you have 2 single core hyperthread CPUs, it would be a very unusual application to find a gain for running 2 MPI processes per core, but if the sight of 4 processes running on your graph was your goal, I won't argue against it. You must be aware that most clusters running CPUs of the past have HT disabled in BIOS setup. -- Tim Prince
Re: [OMPI users] How could OpenMPI (or MVAPICH) affect floating-point results?
On 9/20/2011 7:25 AM, Reuti wrote: Hi, Am 20.09.2011 um 00:41 schrieb Blosch, Edwin L: I am observing differences in floating-point results from an application program that appear to be related to whether I link with OpenMPI 1.4.3 or MVAPICH 1.2.0. Both packages were built with the same installation of Intel 11.1, as well as the application program; identical flags passed to the compiler in each case. I’ve tracked down some differences in a compute-only routine where I’ve printed out the inputs to the routine (to 18 digits) ; the inputs are identical. The output numbers are different in the 16th place (perhaps a few in the 15th place). These differences only show up for optimized code, not for –O0. My assumption is that some optimized math intrinsic is being replaced dynamically, but I do not know how to confirm this. Anyone have guidance to offer? Or similar experience? yes, I face it often but always at a magnitude where it's not of any concern (and not related to any MPI). Due to the limited precision in computers, a simple reordering of operation (although being equivalent in a mathematical sense) can lead to different results. Removing the anomalies with -O0 could proof that. The other point I heard especially for the x86 instruction set is, that the internal FPU has still 80 bits, while the presentation in memory is only 64 bit. Hence when all can be done in the registers, the result can be different compared to the case when some interim results need to be stored to RAM. For the Portland compiler there is a switch -Kieee -pc64 to force it to stay always in 64 bit, and a similar one for Intel is -mp (now -fltconsistency) and -mp1. Diagnostics below indicate that ifort 11.1 64-bit is in use. The options aren't the same as Reuti's "now" version (a 32-bit compiler which hasn't been supported for 3 years or more?). With ifort 10.1 and more recent, you would set at least -assume protect_parens -prec-div -prec-sqrt if you are interested in numerical consistency. If you don't want auto-vectorization of sum reductions, you would use instead -fp-model source -ftz (ftz sets underflow mode back to abrupt, while "source" sets gradual). It may be possible to expose 80-bit x87 by setting the ancient -mp option, but such a course can't be recommended without additional cautions. Quoted comment from OP seem to show a somewhat different question: Does OpenMPI implement any operations in a different way from MVAPICH? I would think it probable that the answer could be affirmative for operations such as allreduce, but this leads well outside my expertise with respect to specific MPI implementations. It isn't out of the question to suspect that such differences might be aggravated when using excessively aggressive ifort options such as -fast. libifport.so.5 => /opt/intel/Compiler/11.1/072/lib/intel64/libifport.so.5 (0x2b6e7e081000) libifcoremt.so.5 => /opt/intel/Compiler/11.1/072/lib/intel64/libifcoremt.so.5 (0x2b6e7e1ba000) libimf.so => /opt/intel/Compiler/11.1/072/lib/intel64/libimf.so (0x2b6e7e45f000) libsvml.so => /opt/intel/Compiler/11.1/072/lib/intel64/libsvml.so (0x2b6e7e7f4000) libintlc.so.5 => /opt/intel/Compiler/11.1/072/lib/intel64/libintlc.so.5 (0x2b6e7ea0a000) -- Tim Prince
Re: [OMPI users] EXTERNAL: Re: How could OpenMPI (or MVAPICH) affect floating-point results?
On 9/20/2011 10:50 AM, Blosch, Edwin L wrote: It appears to be a side effect of linkage that is able to change a compute-only routine's answers. I have assumed that max/sqrt/tiny/abs might be replaced, but some other kind of corruption may be going on. Those intrinsics have direct instruction set translations which shouldn't vary from -O1 on up nor with linkage options nor be affected by MPI or insertion of WRITEs. -- Tim Prince
Re: [OMPI users] Building with thread support on Windows?
On 9/21/2011 11:18 AM, Björn Regnström wrote: Hi, I am trying to build Open MPI 1.4.3 with thread support on Windows. A trivial test program runs if it calls MPI_Init or MPI_Init_thread(int *argc, char ***argv, int required, int *provided) with required=0, but hangs if required>0. ompi_info for my build reports that there is no thread support, but MPI_Init_thread returns provided==required. The only change in the CMake configuration was to check OMPI_ENABLE_MPI_THREADS. Is there anything else that needs to be done with the configuration? I have built 1.4.3 with thread support on several Linuxes and a Mac and it works fine there.
Not all Windows compilers work well enough with all threading models that you could expect satisfactory results; in particular, the compilers and thread libraries you use on Linux may not be adequate for Windows thread support. -- Tim Prince
Re: [OMPI users] Question about compilng with fPIC
On 9/21/2011 11:44 AM, Blosch, Edwin L wrote: Follow-up to a mislabeled thread: "How could OpenMPI (or MVAPICH) affect floating-point results?" I have found a solution to my problem, but I would like to understand the underlying issue better. To rehash: An Intel-compiled executable linked with MVAPICH runs fine; linked with OpenMPI fails. The earliest symptom I could see was some strange difference in numerical values of quantities that should be unaffected by MPI calls. Tim's advice guided me to assume memory corruption. Eugene's advice guided me to explore the detailed differences in compilation. I observed that the MVAPICH mpif90 wrapper adds -fPIC. I tried adding -fPIC and -mcmodel=medium to the compilation of the OpenMPI-linked executable. Now it works fine. I haven't tried without -mcmodel=medium, but my guess is -fPIC did the trick. Does anyone know why compiling with -fPIC has helped? Does it suggest an application problem or an OpenMPI problem? To note: This is an Infiniband-based cluster. The application does pretty basic MPI-1 operations: send, recv, bcast, reduce, allreduce, gather, gather, isend, irecv, waitall. There is one task that uses iprobe with MPI_ANY_TAG, but this task is only involved in certain cases (including this one). Conversely, cases that do not call iprobe have not yet been observed to crash. I am deducing that this function is the problem. If you are making a .so, the included .o files should be built with -fPIC or similar. Ideally, the configure and build tools would enforce this. -- Tim Prince
Re: [OMPI users] EXTERNAL: Re: Question about compilng with fPIC
On 9/21/2011 12:22 PM, Blosch, Edwin L wrote: Thanks Tim. I'm compiling source units and linking them into an executable. Or perhaps you are talking about how OpenMPI itself is built? Excuse my ignorance... The source code units are compiled like this: /usr/mpi/intel/openmpi-1.4.3/bin/mpif90 -D_GNU_SOURCE -traceback -align -pad -xHost -falign-functions -fpconstant -O2 -I. -I/usr/mpi/intel/openmpi-1.4.3/include -c ../code/src/main/main.f90 The link step is like this: /usr/mpi/intel/openmpi-1.4.3/bin/mpif90 -D_GNU_SOURCE -traceback -align -pad -xHost -falign-functions -fpconstant -static-intel -o ../bin/ -lstdc++ OpenMPI itself was configured like this: ./configure --prefix=/release/cfd/openmpi-intel --without-tm --without-sge --without-lsf --without-psm --without-portals --without-gm --without-elan --without-mx --without-slurm --without-loadleveler --enable-mpirun-prefix-by-default --enable-contrib-no-build=vt --enable-mca-no-build=maffinity --disable-per-user-config-files --disable-io-romio --with-mpi-f90-size=small --enable-static --disable-shared CXX=/appserv/intel/Compiler/11.1/072/bin/intel64/icpc CC=/appserv/intel/Compiler/11.1/072/bin/intel64/icc 'CFLAGS= -O2' 'CXXFLAGS= -O2' F77=/appserv/intel/Compiler/11.1/072/bin/intel64/ifort 'FFLAGS=-D_GNU_SOURCE -traceback -O2' FC=/appserv/intel/Compiler/11.1/072/bin/intel64/ifort 'FCFLAGS=-D_GNU_SOURCE -traceback -O2' 'LDFLAGS= -static-intel' ldd output on the final executable gives: linux-vdso.so.1 => (0x7fffb77e7000) libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x2b2e2b652000) libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x2b2e2b95e000) libdl.so.2 => /lib64/libdl.so.2 (0x2b2e2bb6d000) libnsl.so.1 => /lib64/libnsl.so.1 (0x2b2e2bd72000) libutil.so.1 => /lib64/libutil.so.1 (0x2b2e2bf8a000) libm.so.6 => /lib64/libm.so.6 (0x2b2e2c18d000) libpthread.so.0 => /lib64/libpthread.so.0 (0x2b2e2c3e4000) libc.so.6 => /lib64/libc.so.6 (0x2b2e2c60) libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x2b2e2c959000) /lib64/ld-linux-x86-64.so.2 (0x2b2e2b433000) Do you see anything that suggests I should have been compiling the application and/or OpenMPI with -fPIC? If you were building any OpenMPI shared libraries, those should use -fPIC. configure may have made the necessary additions. If your application had shared libraries, you would require -fPIC, but apparently you had none. The shared libraries you show presumably weren't involved in your MPI or application build, and you must have linked in static versions of your MPI libraries, where -fPIC wouldn't be required. -- Tim Prince
[OMPI users] Run Time problem: Program hangs when utilizing multiple nodes.
Hello, The problem that I have been having is running my application across multiple nodes. Here are the details of what I have debugged thus far. I am going to follow the numbered list from the getting help page: ( http://www.open-mpi.org/community/help/) 1 ) I checked for a solution to this problem throughout the FAQ as well as the mailing list, but was unsuccessful in resolving the issue. 2) Version of openmpi: openmpi v1.4.4 3) I found the config.log, but it is very large, so I was unable to attach it. If you would like me to I can upload it and provide a link. 4) ompi_info --all output: see attached file 'ompi_info_all.txt' 5)'ompi_info -v ompi full --parsable' (ran using: 'mpirun --bynode *--hostfile my_hostfile* --tag-output ompi_info -v ompi full --parsable' [1,0]:package:Open MPI root@intel16 Distribution [1,0]:ompi:version:full:1.4.4 [1,0]:ompi:version:svn:r25188 [1,0]:ompi:version:release_date:Sep 27, 2011 [1,0]:orte:version:full:1.4.4 [1,0]:orte:version:svn:r25188 [1,0]:orte:version:release_date:Sep 27, 2011 [1,0]:opal:version:full:1.4.4 [1,0]:opal:version:svn:r25188 [1,0]:opal:version:release_date:Sep 27, 2011 [1,0]:ident:1.4.4 [1,1]:package:Open MPI root@intel16Distribution [1,1]:ompi:version:full:1.4.4 [1,1]:ompi:version:svn:r25188 [1,1]:ompi:version:release_date:Sep 27, 2011 [1,1]:orte:version:full:1.4.4 [1,1]:orte:version:svn:r25188 [1,1]:orte:version:release_date:Sep 27, 2011 [1,1]:opal:version:full:1.4.4 [1,1]:opal:version:svn:r25188 [1,1]:opal:version:release_date:Sep 27, 2011 [1,1]:ident:1.4.4 6) Detailed description: I have a fortran90 application that solves a system of linear equations using LU Decomposition. The application has three components. matrix_fill , matrix_decomp, and matrix_solve. The application has a make option for compiling the application using MPI. I have successfully compiled the application using openmpi v1.4.4, and can run the application. I utilize the '--hostfile' parameter when executing mpirun. For testing purposes I modified this file to see if I could narrow down the problem. I am able to run the program locally (on the same node that mpirun is being executed on) when utilizing 1 or greater than 1 slots (i was able to run with 12 slots on a single node). I am also able to mpirun on 1 or 2 slots on a single remote node as well. The problem occurs when I try to have two nodes work together, such that I specify two separate nodes in the hostfile and use -np 2 when executing mpirun). Here is an example of the my_hostfile (when the problem occurs) intel15 intel16 and this is an example of the command used: [intel15] > mpirun --hostfile my_hostfile -np 2 matrix_fill The problem occurs at a second call to MPI_BARRIER. The first MPI_BARRIER call is successful, but on the second one it hangs. Here is a basic outline of the code for up to the point of where the program hangs: [code] CALL MPI_INIT(ierr) CALL MPI_COMM_RANK(MPI_COMM_WORLD, my_rank, ierr) CALL MPI_COMM_SIZE(MPI_COMM_WORLD, group_size, ierr) ! creates buffers for each image !synchronize buffers CALL MPI_BARRIER(MPI_COMM_WORLD, ierr) WRITE(6, *) 'Initializing I/O for image #', my_image CALL flushio ! At this barrier the program hangs and must be killed using CTRL+C CALL MPI_BARRIER(MPI_COMM_WORLD, ierr) [/code] The hang only occurs when trying to use -np 2 (or larger) and on multiple nodes that are networked together. At first I thought it was a firewall issue, so i ran 'service iptables stop' as root, but sadly this did not fix the problem. 
I am able to ssh between these nodes without a password, and the nodes are apart of a cluster of approximately 20 nodes at University of Maryland *B*altimore County. 7) Network info: see attached network_info.txt file: I have been trying to determine the root of this error for the past week, but with no success. Any help would be greatly appreciated. Thank you, Tim Package: Open MPI root@intel16 Distribution Open MPI: 1.4.4 Open MPI SVN revision: r25188 Open MPI release date: Sep 27, 2011 Open RTE: 1.4.4 Open RTE SVN revision: r25188 Open RTE release date: Sep 27, 2011 OPAL: 1.4.4 OPAL SVN revision: r25188 OPAL release date: Sep 27, 2011 Ident string: 1.4.4 MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.4.4) MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.4.4) MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.4.4) MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.4.4) MCA carto: file (MCA v2.0, API v2.0, Component v1.4.4) MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.4.4) MCA timer: linux (MCA v2.0, API v2.0, Component v1.4.4) MCA installdirs: env (MCA v2.0, API v2.0, Component v1.4.4)
Re: [OMPI users] How to justify the use MPI codes on multicore systems/PCs?
On 12/11/2011 12:16 PM, Andreas Schäfer wrote: Hey, on an SMP box threaded codes CAN always be faster than their MPI equivalents. One reason why MPI sometimes turns out to be faster is that with MPI every process actually initializes its own data. Therefore it'll end up in the NUMA domain to which the core running that process belongs. A lot of threaded codes are not NUMA aware. So, for instance the initialization is done sequentially (because it may not take a lot of time), and Linux' first touch policy makes all memory pages belong to a single domain. In essence, those codes will use just a single memory controller (and its bandwidth). Many applications require significant additional RAM and message passing communication per MPI rank. Where those are not adverse issues, MPI is likely to out-perform pure OpenMP (Andreas just quoted some of the reasons), and OpenMP is likely to be favored only where it is an easier development model. The OpenMP library also should implement a first-touch policy, but it's very difficult to carry out fully in legacy applications. OpenMPI has had effective shared memory message passing from the beginning, as did its predecessor (LAM) and all current commercial MPI implementations I have seen, so you shouldn't have to beat on an issue which was dealt with 10 years ago. If you haven't been watching this mail list, you've missed some impressive reporting of new support features for effective pinning by CPU, cache, etc. When you get to hundreds of nodes, depending on your application and interconnect performance, you may need to consider "hybrid" (OpenMP as the threading model for MPI_THREAD_FUNNELED mode), if you are running a single application across the entire cluster. The biggest cluster in my neighborhood, which ranked #54 on the recent Top500, gave best performance in pure MPI mode for that ranking. It uses FDR infiniband, and ran 16 ranks per node, for 646 nodes, with DGEMM running in 4-wide vector parallel. Hybrid was tested as well, with each multiple-thread rank pinned to a single L3 cache. All 3 MPI implementations which were tested have full shared memory message passing and pinning to local cache within each node (OpenMPI and 2 commercial MPIs). -- Tim Prince
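As a concrete illustration of the hybrid mode mentioned above (OpenMP threads inside each MPI rank, with only the thread outside the parallel region calling MPI), here is a minimal sketch; the work done by the OpenMP threads is a placeholder, not anything from the benchmark described.

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank;
    /* FUNNELED: only the main thread of each rank will make MPI calls */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local = 0.0;
    #pragma omp parallel reduction(+:local)
    {
        /* each thread does its share of the node-local work */
        local += omp_get_thread_num() + 1;
    }

    /* back on the main thread, combine the per-rank results */
    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("provided=%d, global sum=%g\n", provided, global);

    MPI_Finalize();
    return 0;
}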
Re: [OMPI users] openmpi - gfortran and ifort conflict
On 12/14/2011 9:49 AM, Micah Sklut wrote: I have installed OpenMPI for gfortran, but am now attempting to install OpenMPI with ifort. I have run the following configuration:
./configure --prefix=/opt/openmpi/intel CC=gcc CXX=g++ F77=ifort FC=ifort
The install works successfully, but when I run /opt/openmpi/intel/bin/mpif90, it runs as gfortran. Oddly, when I am the root user, the same mpif90 runs as ifort. Can someone please alleviate my confusion as to why mpif90 is not running as ifort?
You might check your configure logs to be certain that ifort was found before gfortran at all stages (did you set paths according to sourcing the ifortvars or compilervars scripts which come with ifort?). 'which mpif90' should tell you whether you are executing the one from your installation. You may have another mpif90 coming first on your PATH. You won't be able to override your PATH and LD_LIBRARY_PATH correctly simply by specifying an absolute path to mpif90. -- Tim Prince
Re: [OMPI users] openmpi - gfortran and ifort conflict
On 12/14/2011 1:20 PM, Fernanda Oliveira wrote: Hi Micah, I do not know if it is exactly what you need but I know that there are environment variables to use with intel mpi. They are: I_MPI_CC, I_MPI_CXX, I_MPI_F77, I_MPI_F90. So, you can set this using 'export' for bash, for instance or directly when you run. I use in my bashrc: export I_MPI_CC=icc export I_MPI_CXX=icpc export I_MPI_F77=ifort export I_MPI_F90=ifort Let me know if it helps. Fernanda Oliveira I didn't see any indication that Intel MPI was in play here. Of course, that's one of the first thoughts, as under Intel MPI, mpif90 uses gfortran mpiifort uses ifort mpicc uses gcc mpiCC uses g++ mpiicc uses icc mpiicpc uses icpc and all the Intel compilers use g++ to find headers and libraries. The advice to try 'which mpif90' would show whether you fell into this bunker. If you use Intel cluster checker, you will see noncompliance if anyone's MPI is on the default paths. You must set paths explicitly according to the MPI you want. Admittedly, that tool didn't gain a high level of adoption. -- Tim Prince
Re: [OMPI users] openmpi - gfortran and ifort conflict
On 12/14/2011 12:52 PM, Micah Sklut wrote: Hi Gustavo, here is the output of:
barells@ip-10-17-153-123:~> /opt/openmpi/intel/bin/mpif90 -showme
gfortran -I/usr/lib64/mpi/gcc/openmpi/include -pthread -I/usr/lib64/mpi/gcc/openmpi/lib64 -L/usr/lib64/mpi/gcc/openmpi/lib64 -lmpi_f90 -lmpi_f77 -lmpi -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl -lutil -lm -ldl
This points to gfortran. I do see what you are saying about the 1.4.2 and 1.4.4 components. I'm not sure why that is, but there seems to be some conflict between the existing OpenMPI install and the recently installed 1.4.4 being built with ifort.
This is one of the reasons for recommending complete removal (rpm -e if need be) of any MPI which is on a default path (and setting a clean path) before building a new one, as well as choosing a unique install path for the new one. -- Tim Prince
Re: [OMPI users] Latest Intel Compilers (ICS, version 12.1.0.233 Build 20110811) issues ...
mpiler release, or perhaps has not tried it yet. Thanks, rbw
Richard Walsh, Parallel Applications and Systems Manager, CUNY HPC Center, Staten Island, NY, W: 718-982-3319, M: 612-382-4620
Right, as the world goes, is only in question between equals in power, while the strong do what they can and the weak suffer what they must. -- Thucydides, 400 BC
From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of Richard Walsh [richard.wa...@csi.cuny.edu]
Sent: Friday, December 16, 2011 3:12 PM
To: Open MPI Users
Subject: [OMPI users] Latest Intel Compilers (ICS, version 12.1.0.233 Build 20110811) issues ...
All, Working through a stock rebuild of OpenMPI 1.5.4 and 1.4.4 with the most current compiler suites from both PGI and Intel:
1. PGI, Version 11.10
2. Intel, Version 12.1.0.233 Build 20110811
My 1.5.4 'config.log' header looks like this for Intel:
./configure CC=icc CXX=icpc F77=ifort FC=ifort --with-openib --prefix=/share/apps/openmpi-intel/1.5.4 --with-tm=/share/apps/pbs/11.1.0.111761
and this for PGI:
./configure CC=pgcc CXX=pgCC F77=pgf77 FC=pgf90 --with-openib --prefix=/share/apps/openmpi-pgi/1.5.4 --with-tm=/share/apps/pbs/11.1.0.111761
This configure line has been used successfully before. Configuration, build, and install for BOTH compilers seem to work OK; however, my 'mpicc' build of my basic test program ONLY works with the PGI-built version of 'mpicc' (either the 1.4.4 or the 1.5.4 will compile the code). The Intel 1.4.4 and 1.5.4 'mpicc' wrapper-compilers produce an immediate segmentation fault:
[richard.walsh@bob pbs]$ ./compile_it
./compile_it: line 10: 19163 Segmentation fault /share/apps/openmpi-intel/1.5.4/bin/mpicc -o ./hello_mpi.exe hello_mpi.c
[richard.walsh@bob pbs]$ ./compile_it
./compile_it: line 10: 19515 Segmentation fault /share/apps/openmpi-intel/1.4.4/bin/mpicc -o ./hello_mpi.exe hello_mpi.c
This Intel stack is from the most recent release of their ICS, released in late October before SC11:
[richard.walsh@bob pbs]$ icc -V
Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.1.0.233 Build 20110811
[richard.walsh@bob pbs]$ ifort -V
Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.1.0.233 Build 20110811
Has anyone else encountered this problem ... ?? Suggestions ... ?? Thanks, rbw
Richard Walsh, Parallel Applications and Systems Manager, CUNY HPC Center, Staten Island, NY
Re: [OMPI users] parallelising ADI
On 03/06/2012 03:59 PM, Kharche, Sanjay wrote: Hi I am working on a 3D ADI solver for the heat equation. I have implemented it as serial. Would anybody be able to indicate the best and more straightforward way to parallelise it. Apologies if this is going to the wrong forum. If it's to be implemented in parallelizable fashion (not SSOR style where each line uses updates from the previous line), it should be feasible to divide the outer loop into an appropriate number of blocks, or decompose the physical domain and perform ADI on individual blocks, then update and repeat. -- Tim Prince
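A hedged sketch of the first option suggested above (block-partitioning the outer loop of independent line solves across MPI ranks). The grid size and the solve_line routine are placeholders, not part of the original post, and the inter-sweep data exchange is only indicated in a comment.

#include <mpi.h>
#include <stdio.h>

#define NY 512   /* number of independent lines per sweep (assumed value) */

/* stand-in for one tridiagonal (Thomas-algorithm) solve along a single line */
static void solve_line(int j) { (void)j; /* real line solver goes here */ }

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* each rank takes a contiguous block of lines */
    int chunk = (NY + size - 1) / size;
    int jlo = rank * chunk;
    int jhi = (jlo + chunk < NY) ? jlo + chunk : NY;

    for (int j = jlo; j < jhi; ++j)
        solve_line(j);            /* lines within one sweep are independent */

    /* before the next direction's sweep, the decomposed data must be
       exchanged (transpose or halo exchange, not shown), then the loop repeats */
    MPI_Barrier(MPI_COMM_WORLD);

    if (rank == 0) printf("sweep done on %d ranks\n", size);
    MPI_Finalize();
    return 0;
}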
Re: [OMPI users] [EXTERNAL] Possible to build ompi-1.4.3 or 1.4.5 without a C++ compiler?
On 03/20/2012 08:35 AM, Gunter, David O wrote: I wish it were that easy. When I go that route, I get error messages like the following when trying to compile the parallel code with Intel: libmpi.so: undefined reference to `__intel_sse2_strcpy' and other messages for every single Intel-implemented standard C-function. -david -- There was a suggestion in the snipped portion which suggested you use gcc/g++ together with ifort; that doesn't appear to be what you mean by "that route." (unless you forgot to recompile your .c files by gcc) You have built some objects with an Intel compiler (either ifort or icc/icpc) which is referring to this Intel library function, but you apparently didn't link against the library which provides it. If you use one of those Intel compilers to drive the link, and your environment paths are set accordingly, the Intel libraries would be linked automatically. There was a single release of the compiler several years ago (well out of support now) where that sse2 library was omitted, although the sse3 version was present. -- Tim Prince
Re: [OMPI users] redirecting output
On 03/30/2012 10:41 AM, tyler.bal...@huskers.unl.edu wrote: I am using the command mpirun -np nprocs -machinefile machines.arch Pcrystal, and my output scrolls across my terminal. I would like to send this output to a file and I cannot figure out how to do so. I have tried the general > FILENAME and > log &; these generate files, however they are empty. Any help would be appreciated.
If you run under screen, your terminal output should be collected in screenlog. Beats me why some sysadmins don't see fit to install screen. -- Tim Prince
Re: [OMPI users] OpenMPI fails to run with -np larger than 10
This may or may not be related, but I've had similar issues on RHEL 6.x and clones when using the SSH job launcher and running more than 10 processes per node. It sounds like you're only distributing 6 processes per node, so it doesn't sound like your problem, but you might want to check your hostfile and make sure you're not oversubscribing one of the nodes.

The trick I've found to launch more than 10 processes per node via SSH is to set MaxSessions to some number higher than 10 in /etc/ssh/sshd_config (I choose 100, somewhat arbitrarily). Assuming you're using the SSH launcher on an RHEL 6 derivative, you might give this a try. It's an SSH issue, not an OpenMPI one.

Regards, Tim

On Thu, Apr 12, 2012 at 9:04 AM, Seyyed Mohtadin Hashemi wrote:
> Hello,
>
> I have a very peculiar problem: I have a micro cluster with three nodes (18 cores total); the nodes are clones of each other and connected to a frontend via Ethernet, with Debian Squeeze as the OS on all nodes. When I run parallel jobs I can use up to "-np 10"; if I go further the job crashes. I have primarily done tests with GROMACS (because that is what I will be running) but have also used OSU Micro-Benchmarks 3.5.2.
>
> For a simple parallel job I use: "path/mpirun -hostfile path/hostfile -np XX -d -display-map path/mdrun_mpi -s path/topol.tpr -o path/output.trr"
>
> (path is global.) For -np XX smaller than or equal to 10 it works; however, as soon as I use 11 or larger the whole thing crashes. The terminal dump is attached to this mail: when_working.txt is for "-np 10", when_crash.txt is for "-np 12", and OpenMPI_info.txt is the output from "path/mpirun --bynode --hostfile path/hostfile --tag-output ompi_info -v ompi full --parsable"
>
> I have tried OpenMPI v1.4.2 all the way up to beta v1.5.5, and all yield the same result.
>
> The output files are from a new install I did today: I formatted all nodes, started from a fresh minimal install of Squeeze, used "apt-get install gromacs gromacs-openmpi", and installed all dependencies. Then I ran two jobs using the parameters described above; I also did one with the OSU benchmarks (data is not included), which also crashed with "-np" larger than 10.
>
> I hope somebody can help figure out what is wrong and how I can fix it.
>
> Best regards,
> Mohtadin
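For reference (not from the original thread), the sshd change Tim describes would look something like this on each node; the value 100 follows his example, and the restart command differs by distribution:

  # as root on every compute node
  grep -i maxsessions /etc/ssh/sshd_config        # see what is currently in effect
  echo "MaxSessions 100" >> /etc/ssh/sshd_config  # or edit the existing line in place
  /etc/init.d/ssh restart                         # Debian/Ubuntu; on RHEL: service sshd restart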
Re: [OMPI users] Cannot compile code with gfortran + OpenMPI when OpenMPI was built with latest intl compilers
On 5/19/2012 2:20 AM, Sergiy Bubin wrote:

I built OpenMPI with that set of Intel compilers. Everything seems to be fine and I can compile my Fortran+MPI code with no problem when I invoke ifort. I should say that I do not actually invoke the "wrapper" MPI compiler. I normally just add flags as MPICOMPFLAGS=$(shell mpif90 --showme:compile) and MPILINKFLAGS=$(shell mpif90 --showme:link) in my makefile. I know it is not the recommended way of doing things, but the reason I do that is that I absolutely need to be able to use different Fortran compilers to build my Fortran code.

Avoiding the use of mpif90 accomplishes nothing for changing between incompatible Fortran compilers. Run-time libraries are incompatible among ifort, gfortran, and Oracle Fortran, so you can't link a mixture of objects compiled by incompatible Fortran compilers except in limited circumstances. This includes the MPI Fortran library. I don't see how it is too great an inconvenience for your Makefile to set PATH and LD_LIBRARY_PATH to include the mpif90 corresponding to the chosen Fortran compiler. You may need to build your own mpif90 for gfortran as well as the other compilers, so as to configure it to keep it off the default PATHs (e.g. --prefix=/opt/ompi1.4gf/), if you can't move the Ubuntu ompi. Surely most of this is implied in the OpenMPI instructions.

-- Tim Prince
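A sketch of that suggestion (not from the original message; the prefix follows Tim's /opt/ompi1.4gf example and the file names are placeholders): build one Open MPI tree per Fortran compiler, keep it off the default PATH, and select it in the environment the Makefile runs in:

  # one-time build of a gfortran flavour (repeat with FC=ifort, FC=pgf90, ... for the others)
  ./configure --prefix=/opt/ompi1.4gf CC=gcc CXX=g++ F77=gfortran FC=gfortran
  make all install

  # before building the gfortran version of the application
  export PATH=/opt/ompi1.4gf/bin:$PATH
  export LD_LIBRARY_PATH=/opt/ompi1.4gf/lib:$LD_LIBRARY_PATH
  mpif90 -c mycode.f90 && mpif90 -o mycode mycode.o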
Re: [OMPI users] undefined reference to `netcdf_mp_nf90_open_'
On 6/26/2012 9:20 AM, Jeff Squyres wrote:

Sorry, this looks like an application issue -- i.e., the linker error you're getting doesn't look like it's coming from Open MPI. Perhaps it's a missing application/middleware library. More specifically, you can take the mpif90 command that is being used to generate these errors and add "--showme" to the end of it, and you'll see what underlying compiler command is being executed under the covers. That might help you understand exactly what is going on.

On Jun 26, 2012, at 7:13 AM, Syed Ahsan Ali wrote:

Dear All, I am getting the following error while compiling an application. It seems like something related to netcdf and mpif90. Although I have compiled netcdf with the mpif90 option, I don't know why this error is happening. Any hint would be highly appreciated.

/home/pmdtest/cosmo/source/cosmo_110525_4.18/obj/src_obs_proc_cdf.o: In function `src_obs_proc_cdf_mp_obs_cdf_read_org_':
/home/pmdtest/cosmo/source/cosmo_110525_4.18/src/src_obs_proc_cdf.f90:(.text+0x17aa): undefined reference to `netcdf_mp_nf90_open_'

If your mpif90 is properly built and set up with the same Fortran compiler you are using, it appears that either you didn't build the netcdf Fortran 90 modules with that compiler, or you didn't set the include path for the netcdf modules. This would work the same with mpif90 as with the underlying Fortran compiler.

-- Tim Prince
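Following Jeff's suggestion, a quick check plus a hedged example of what the fix usually looks like (the paths and library names below are placeholders; depending on how netCDF was built, the Fortran interface is in libnetcdff or folded into libnetcdf):

  mpif90 --showme                                               # shows the underlying ifort command and flags
  mpif90 -I/opt/netcdf/include -c src_obs_proc_cdf.f90          # netcdf module path at compile time
  mpif90 -o cosmo_exe *.o -L/opt/netcdf/lib -lnetcdff -lnetcdf  # older netCDF builds may need only -lnetcdf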
Re: [OMPI users] compilation on windows 7 64-bit
On 07/27/2012 12:23 PM, Sayre, Alan N wrote:

During compilation I get warning messages such as:

c:\program files (x86)\openmpi_v1.6-x64\include\openmpi/ompi/mpi/cxx/op_inln.h(148): warning C4800: 'int' : forcing value to bool 'true' or 'false' (performance warning) cmsolver.cpp

which indicates that the OpenMPI version "openmpi_v1.6-x64" is 64-bit. And I'm sure that I installed the 64-bit version. I am compiling on a 64-bit version of Windows 7.

Are you setting the x64 compiler project options?

-- Tim Prince
Re: [OMPI users] mpi.h incorrect format error?
On 08/06/2012 07:35 AM, PattiMichelle wrote:

mpicc -DFSEEKO64_OK -w -O3 -c -DLANDREAD_STUB -DDM_PARALLEL -DMAX_HISTORY=25 -c buf_for_proc.c

You might need to examine the preprocessed source (mpicc -E buf_for_proc.c > buf_for_proc.i) to see what went wrong in preprocessing at the point where the compiler (gcc?) complains. I suppose you must have built mpicc yourself; you would need to assure that the mpicc on PATH is the one built with the C compiler on PATH.

-- Tim Prince
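Two quick checks along the lines Tim suggests (not from the original message): confirm which wrapper and underlying compiler are actually being used, then inspect the preprocessed source near the reported line:

  which mpicc
  mpicc --showme                              # prints the underlying compiler command the wrapper runs
  mpicc -E buf_for_proc.c > buf_for_proc.i    # preprocess only, then look around the failing line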
Re: [OMPI users] Fwd: lwkmpi
On 8/28/2012 5:11 AM, 清风 wrote:

-- Original Message --
From: "295187383" <295187...@qq.com>
Sent: Tuesday, August 28, 2012, 4:13 PM
To: "users"
Subject: lwkmpi

Hi everybody,

I'm trying to compile OpenMPI with Intel compiler 11.1.07 on Ubuntu. I have compiled OpenMPI many times and I could always find a problem. But the error that I'm getting now gives me no clues where to even search for the problem. It seems I have succeeded in configuring. When I try "make all", it always shows the problems below:

make[7]: Entering directory `/mnt/Software/openmpi-1.6.1/ompi/contrib/vt/vt/tools/opari/tool'
/opt/intel/Compiler/11.1/072/bin/ia32/icpc -DHAVE_CONFIG_H -I. -I../../.. -DINSIDE_OPENMPI -I/home/lwk/桌面/mnt/Software/openmpi-1.6.1/opal/mca/hwloc/hwloc132/hwloc/include -I/usr/include/infiniband -I/usr/include/infiniband -DOPARI_VT -O3 -DNDEBUG -finline-functions -pthread -MT opari-ompragma_c.o -MD -MP -MF .deps/opari-ompragma_c.Tpo -c -o opari-ompragma_c.o `test -f 'ompragma_c.cc' || echo './'`ompragma_c.cc
/usr/include/c++/4.5/iomanip(64): error: expected an expression
  { return { __mask }; }
           ^

Looks like your icpc is too old to work with your g++. If you want to build with C++ support, you'll need better-matched versions of icpc and g++. icpc support for g++ 4.7 is expected to release within the next month; icpc 12.1 should be fine with g++ 4.5 and 4.6.

-- Tim Prince
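Not part of the original reply, but since the failure is in the VampirTrace contrib code, a common workaround (assuming you cannot update icpc and do not need VT) is to configure without the C++ pieces; the prefix here is a placeholder:

  ./configure CC=icc CXX=icpc F77=ifort FC=ifort --prefix=/opt/openmpi-1.6.1-intel --disable-vt
  # if the MPI C++ bindings trip over the same header problem, add --disable-mpi-cxx as well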
Re: [OMPI users] Compiling 1.6.1 with cygwin 1.7 and gcc
On 9/24/2012 1:02 AM, Roy Hogan wrote:

I'm trying to build version 1.6.1 on Cygwin (1.7), using the gcc 4.5.3 compilers. I need to use the Cygwin linux environment specifically, so I'm not interested in the cmake option on the Windows side. I've searched the archives, but don't find much on the Cygwin build option over the last couple of years. I've attached the logs for my "configure" and "make all" steps. Our email filter will not allow me to send zipped files, so I've attached the two log files. I'd appreciate any advice.

Perhaps you mean the Cygwin POSIX environment. Evidently, your Microsoft-specific macros required in windows.c aren't handled by configury under Cygwin, at least not if you don't specify that you want them. As you hinted, Cygwin supports a more Linux-like environment, although many of those macros should be handled by #include "windows.h". Do you have a reason for withholding information such as which Windows version you want to support, and your configure commands?

-- Tim Prince
Re: [OMPI users] mpivars.sh - Intel Fortran 13.1 conflict with OpenMPI 1.6.3
On 01/24/2013 12:40 PM, Michael Kluskens wrote:

This is for reference and suggestions, as this took me several hours to track down and the previous discussion on "mpivars.sh" failed to cover this point (nothing in the FAQ). I successfully built and installed OpenMPI 1.6.3 using the following on Debian Linux:

./configure --prefix=/opt/openmpi/intel131 --disable-ipv6 --with-mpi-f90-size=medium --with-f90-max-array-dim=4 --disable-vt F77=/opt/intel/composer_xe_2013.1.117/bin/intel64/ifort FC=/opt/intel/composer_xe_2013.1.117/bin/intel64/ifort CXXFLAGS=-m64 CFLAGS=-m64 CC=gcc CXX=g++

(disable-vt was required because of an error finding -lz, which I gave up on). My .tcshrc file HAD the following:

set path = (/opt/openmpi/intel131/bin $path)
setenv LD_LIBRARY_PATH /opt/openmpi/intel131/lib:$LD_LIBRARY_PATH
setenv MANPATH /opt/openmpi/intel131/share/man:$MANPATH
alias mpirun "mpirun --prefix /opt/openmpi/intel131 "
source /opt/intel/composer_xe_2013.1.117/bin/compilervars.csh intel64

For years I have used these procedures on Debian Linux and OS X with earlier versions of OpenMPI and Intel Fortran. However, at some point Intel Fortran started including "mpirt", including /opt/intel/composer_xe_2013.1.117/mpirt/bin/intel64/mpirun. So even though I have the alias set for mpirun, I got the following error:

mpirun -V
.: 131: Can't open /opt/intel/composer_xe_2013.1.117/mpirt/bin/intel64/mpivars.sh

Part of the confusion is that the OpenMPI source does include a reference to "mpivars" in "contrib/dist/linux/openmpi.spec". The solution only occurred to me as I was writing this up: source the Intel setup first:

source /opt/intel/composer_xe_2013.1.117/bin/compilervars.csh intel64
set path = (/opt/openmpi/intel131/bin $path)
setenv LD_LIBRARY_PATH /opt/openmpi/intel131/lib:$LD_LIBRARY_PATH
setenv MANPATH /opt/openmpi/intel131/share/man:$MANPATH
alias mpirun "mpirun --prefix /opt/openmpi/intel131 "

Now I finally get:

mpirun -V
mpirun (Open MPI) 1.6.3

The MPI runtime should be in the redistributable for their MPI compiler, not in the base compiler. The question is how much of /opt/intel/composer_xe_2013.1.117/mpirt can I eliminate safely, and should I (multi-user machine where each user has their own Intel license, so I don't wish to troubleshoot this in the future)?

ifort's mpirt is a run-time to support co-arrays, but not full MPI. This version of the compiler checks in its path-setting scripts whether Intel MPI is already on LD_LIBRARY_PATH, and so there is a conditional setting of the internal mpivars. I assume the co-array feature would be incompatible with OpenMPI, and you would want to find a way to avoid any reference to that library, possibly by avoiding sourcing that part of ifort's compilervars. If you want a response on this subject from the Intel support team, their HPC forum might be a place to bring it up: http://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology

-- Tim Prince
Re: [OMPI users] memory per core/process
On 03/30/2013 06:36 AM, Duke Nguyen wrote:

On 3/30/13 5:22 PM, Duke Nguyen wrote:
On 3/30/13 3:13 PM, Patrick Bégou wrote:
I do not know about your code, but: 1) did you check stack limitations? Typically Intel Fortran codes need a large amount of stack when the problem size increases. Check ulimit -a.

First time I heard of stack limitations. Anyway, ulimit -a gives:

$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 127368
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

So stack size is 10MB??? Does this create a problem? How do I change this?

I did "$ ulimit -s unlimited" to set the stack size to unlimited, and the job ran fine!!! So it looks like the stack limit was the problem. Questions are:
* how do I set this automatically (and permanently)?
* should I set all other ulimits to be unlimited?

In our environment, the only solution we found is to have mpirun run a script on each node which sets ulimit (as well as environment variables, which are more convenient to set there than in the mpirun command) before starting the executable. We had expert recommendations against this, but no other working solution. It seems unlikely that you would want to remove any limits which work at default. Stack size "unlimited" in reality is not unlimited; it may be capped by a system limit or the implementation. As we run up to 120 threads per rank, and many applications have threadprivate data regions, the ability to run without considering the stack limit is the exception rather than the rule.

-- Tim Prince
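A sketch of the wrapper-script approach Tim describes (not from the original message; the file names are examples). The script, made executable and visible on every node, raises the limit and then execs the real binary:

  #!/bin/sh
  ulimit -s unlimited    # or a finite value appropriate for your system
  exec "$@"

and the job is then launched through it:

  mpirun -np 16 --hostfile hosts ./run_with_ulimit.sh ./a.out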
Re: [OMPI users] Configuration with Intel C++ Composer 12.0.2 on OSX 10.7.5
On 5/16/2013 2:16 PM, Geraldine Hochman-Klarenberg wrote:

Maybe I should add that my Intel C++ and Fortran compilers are different versions. C++ is 12.0.2 and Fortran is 13.0.2. Could that be an issue? Also, when I check for the location of ifort, it seems to be in usr/bin - which is different than the C compiler (even though I have folders /opt/intel/composer_xe_2013 and /opt/intel/composer_xe_2013.3.171 etc.). And I have tried "source /opt/intel/bin/ifortvars.sh intel64" too. Geraldine

On May 16, 2013, at 11:57 AM, Geraldine Hochman-Klarenberg wrote:

I am having trouble configuring OpenMPI-1.6.4 with the Intel C/C++ composer (12.0.2). My OS is OSX 10.7.5. I am not a computer whizz so I hope I can explain what I did properly:

1) In bash, I did "source /opt/intel/bin/compilervars.sh intel64" and then "echo PATH" showed:
/opt/intel/composerxe-2011.2.142/bin/intel64:/opt/intel/composerxe-2011.2.142/mpirt/bin/intel64:/opt/intel/composerxe-2011.2.142/bin:/Library/Frameworks/EPD64.framework/Versions/Current/bin:/Library/Frameworks/Python.framework/Versions/Current/bin:.:/Library/Frameworks/EPD64.framework/Versions/Current/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin

2) "which icc" and "which icpc" showed:
/opt/intel/composerxe-2011.2.142/bin/intel64/icc and /opt/intel/composerxe-2011.2.142/bin/intel64/icpc

So that all seems okay to me. Still, when I do "./configure CC=icc CXX=icpc F77=ifort FC=ifort --prefix=/opt/openmpi-1.6.4" from the folder in which the extracted OpenMPI files sit, I get:

== Configuring Open MPI

*** Startup tests
checking build system type... x86_64-apple-darwin11.4.2
checking host system type... x86_64-apple-darwin11.4.2
checking target system type... x86_64-apple-darwin11.4.2
checking for gcc... icc
checking whether the C compiler works... no
configure: error: in `/Users/geraldinehochman-klarenberg/Projects/openmpi-1.6.4':
configure: error: C compiler cannot create executables
See `config.log' for more details

You do need to examine config.log and show it to us if you don't understand it. Attempting to use the older C compiler and libraries to link .o files made by the newer Fortran is likely to fail. If you wish to attempt this, assuming the Intel compilers are installed in default directories, I would suggest you source the environment setting for the older compiler, then the newer one, so that the newer libraries will be found first and the older ones used only when they aren't duplicated by the newer ones. You also need the 64-bit g++ active.

-- Tim Prince
Re: [OMPI users] Configuration with Intel C++ Composer 12.0.2 on OSX 10.7.5
On 05/16/2013 10:13 PM, Tim Prince wrote:
[earlier message quoted in full -- snipped]

It's probably unnecessary to use icpc at all when building OpenMPI. icpc is compatible with gcc/g++ built objects.

-- Tim Prince
Re: [OMPI users] basic questions about compiling OpenMPI
On 5/22/2013 11:34 AM, Paul Kapinos wrote:

On 05/22/13 17:08, Blosch, Edwin L wrote:
Apologies for not exploring the FAQ first.

No comments =)

If I want to use Intel or PGI compilers but link against the OpenMPI that ships with RedHat Enterprise Linux 6 (compiled with g++ I presume), are there any issues to watch out for during linking?

At least, the Fortran-90 bindings ("use mpi") won't work at all (they're compiler-dependent). So our way is to compile a version of Open MPI with each compiler. I think this is recommended. Note also that the version of Open MPI shipped with Linux is usually a bit dusty.

The gfortran build of the Fortran library, as well as the .mod USE files, won't work with the ifort or PGI compilers. g++-built libraries ought to work with sufficiently recent versions of icpc. As noted above, it's worthwhile to rebuild it yourself, even if you use a (preferably more up-to-date version of) gcc, which you can use along with one of the commercial Fortran compilers for Linux.

-- Tim Prince
Re: [OMPI users] EXTERNAL: Re: basic questions about compiling OpenMPI
On 5/25/2013 8:26 AM, Jeff Squyres (jsquyres) wrote:

On May 23, 2013, at 9:50 AM, "Blosch, Edwin L" wrote:
Excellent. Now I've read the FAQ and noticed that it doesn't mention the issue with the Fortran 90 .mod signatures. Our applications are Fortran, so your replies are very helpful -- now I know it really isn't practical for us to use the default OpenMPI shipped with RHEL6, since we use both Intel and PGI compilers and have several applications to accommodate. Presumably if all the applications did INCLUDE 'mpif.h' instead of 'USE MPI' then we could get things working, but it's not a great workaround.

No, not even if they use mpif.h. Here's a chunk of text from the v1.6 README:

- While it is possible -- on some platforms -- to configure and build Open MPI with one Fortran compiler and then build MPI applications with a different Fortran compiler, this is not recommended. Subtle problems can arise at run time, even if the MPI application compiled and linked successfully. Specifically, the following two cases may not be portable between different Fortran compilers:

1. The C constants MPI_F_STATUS_IGNORE and MPI_F_STATUSES_IGNORE will only compare properly to Fortran applications that were created with Fortran compilers that use the same name-mangling scheme as the Fortran compiler with which Open MPI was configured.

2. Fortran compilers may have different values for the logical .TRUE. constant. As such, any MPI function that uses the Fortran LOGICAL type may only get .TRUE. values back that correspond to the .TRUE. value of the Fortran compiler with which Open MPI was configured. Note that some Fortran compilers allow forcing .TRUE. to be 1 and .FALSE. to be 0. For example, the Portland Group compilers provide the "-Munixlogical" option, and Intel compilers (version >= 8.) provide the "-fpscomp logicals" option.

You can use the ompi_info command to see the Fortran compiler with which Open MPI was configured.

Even when the name-mangling obstacle doesn't arise (it shouldn't for the cited case of gfortran vs. ifort), run-time library function usage is likely to conflict between the compiler used to build the MPI Fortran library and the compiler used to build the application. So there really isn't a good incentive to retrogress away from the USE files simply to avoid one aspect of mixing incompatible compilers.

-- Tim Prince
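As a concrete illustration of the last point (not from the original message), ompi_info reports the Fortran compiler an installation was configured with, which is worth checking before mixing compilers:

  ompi_info | grep -i "fort"    # e.g. "Fortran90 compiler: ifort" (the label varies by Open MPI version)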
Re: [OMPI users] Support for CUDA and GPU-direct with OpenMPI 1.6.5 and 1.7.2
On Mon, 8 Jul 2013, Elken, Tom wrote:

It isn't quite so easy. Out of the box, there is no gcc on the Phi card. You can use the cross compiler on the host, but you don't get gcc on the Phi by default. See this post: http://software.intel.com/en-us/forums/topic/382057 I really think you would need to build and install gcc on the Phi first. My first pass at doing a cross-compile with the GNU compilers failed to produce something with OFED support (not surprising):

export PATH=/usr/linux-k1om-4.7/bin:$PATH
./configure --build=x86_64-unknown-linux-gnu --host=x86_64-k1om-linux \
    --disable-mpi-f77
checking if MCA component btl:openib can compile... no

Tim

Thanks Tom, that sounds good. I will give it a try as soon as our Phi host here gets installed. I assume that all the prerequisite libs and bins on the Phi side are available when we download the Phi s/w stack from Intel's site, right?

[Tom] Right. When you install Intel's MPSS (Manycore Platform Software Stack), including following the section on "OFED Support" in the readme file, you should have all the prerequisite libs and bins. Note that I have not built Open MPI for Xeon Phi for your interconnect, but it seems to me that it should work. -Tom

Cheers, Michael

On Mon, Jul 8, 2013 at 12:10 PM, Elken, Tom wrote:

Do you guys have any plan to support Intel Phi in the future? That is, running MPI code on the Phi cards or across the multicore and Phi, as Intel MPI does?

[Tom] Hi Michael, Because a Xeon Phi card acts a lot like a Linux host with an x86 architecture, you can build your own Open MPI libraries to serve this purpose. Our team has used existing (an older 1.4.3 version of) Open MPI source to build an Open MPI for running MPI code on Intel Xeon Phi cards over Intel's (formerly QLogic's) True Scale InfiniBand fabric, and it works quite well. We have not released a pre-built Open MPI as part of any Intel software release. But I think if you have a compiler for Xeon Phi (Intel Compiler or GCC) and an interconnect for it, you should be able to build an Open MPI that works on Xeon Phi.

Cheers,
Tom Elken

thanks... Michael

On Sat, Jul 6, 2013 at 2:36 PM, Ralph Castain wrote:

Rolf will have to answer the question on level of support. The CUDA code is not in the 1.6 series as it was developed after that series went "stable". It is in the 1.7 series, although the level of support will likely be incrementally increasing as that "feature" series continues to evolve.

On Jul 6, 2013, at 12:06 PM, Michael Thomadakis wrote:

> Hello OpenMPI,
>
> I am wondering what level of support there is for CUDA and GPUdirect on OpenMPI 1.6.5 and 1.7.2.
>
> I saw the ./configure --with-cuda=CUDA_DIR option in the FAQ. However, it seems that with configure v1.6.5 it was ignored.
>
> Can you identify GPU memory and send messages from it directly without copying to host memory first?
>
> Or in general, what level of CUDA support is there on 1.6.5 and 1.7.2? Do you support SDK 5.0 and above?
>
> Cheers ...
> Michael
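For reference (not from the thread): per Ralph's note that the CUDA support lives in the 1.7 series, a CUDA-aware build is configured along these lines, with the prefix and CUDA toolkit path as examples:

  ./configure --prefix=/opt/openmpi-1.7.2-cuda --with-cuda=/usr/local/cuda
  make all install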