Re: [OMPI users] Configure fails with icc 10.1.008
A quick reading of this thread makes it sound as if you are using icc to compile C++ code. The correct compiler to use is icpc; this has been the case since at least the version 9 release of the Intel compilers. icc will not compile C++ code. Hope this is useful.

-david

--
David Gunter
HPC-3: Parallel Tools Team
Los Alamos National Laboratory

On Dec 6, 2007, at 9:25 PM, Eric Thibodeau wrote:

> Hello all,
>
> I am unable to get past ./configure, as ICC fails on the C++ tests
> (see attached ompi-output.tar.gz). Configure was called both with and
> without sourcing `/opt/intel/cc/10.1.xxx/bin/iccvars.sh`, as per one
> of the invocation options in icc's documentation. I was unable to
> find the relevant (well... intelligible for me, that is ;P) cause of
> the failure in config.log. Any help would be appreciated.
>
> Thanks,
> Eric Thibodeau
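For context, this is roughly what the suggested fix looks like on the command line (a sketch only: the iccvars.sh path and version directory are taken from the report above and will differ per system; CC and CXX are the standard configure variables for selecting compilers):

  source /opt/intel/cc/10.1.xxx/bin/iccvars.sh
  ./configure CC=icc CXX=icpc

Pointing CXX at icpc ensures that configure's C++ tests run the actual Intel C++ compiler rather than icc.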
Re: [OMPI users] Open MPI 1.2.4 verbosity w.r.t. osc pt2pt
On Oct 16, 2007, at 11:20 AM, Brian Granger wrote:

> Wow, that is quite a study of the different options. I will spend
> some time looking over things to better understand the (complex)
> situation. I will also talk with Lisandro Dalcin about what he
> thinks the best approach is for mpi4py.

Brian / Lisandro --

I don't think that I heard back from you on this issue. Would you have major heartburn if I remove all linking of our components against libmpi (etc.)?

(for a nicely-formatted refresher of the issues, check out https://svn.open-mpi.org/trac/ompi/wiki/Linkers)

Thanks.

> One question though. You said that nothing had changed in this
> respect from 1.2.3 to 1.2.4, but 1.2.3 doesn't show the problem.
> Does this make sense?
>
> Brian
>
> On 10/16/07, Jeff Squyres wrote:
>> On Oct 12, 2007, at 3:5 PM, Brian Granger wrote:
>>
>>>> My guess is that Rmpi is dynamically loading libmpi.so, but not
>>>> specifying the RTLD_GLOBAL flag. This means that libmpi.so is not
>>>> available to the components the way it should be, and all goes
>>>> downhill from there. It only mostly works because we do something
>>>> silly with how we link most of our components, and Linux is just
>>>> smart enough to cover our rears (thankfully).
>>>
>>> In mpi4py, libmpi.so is linked in at compile time, not loaded
>>> using dlopen. Granted, the resulting mpi4py binary is loaded into
>>> python using dlopen.
>>
>> I believe that means that libmpi.so will be loaded as an indirect
>> dependency of mpi4py. See the table below.
>>
>>>> The pt2pt component (rightly) does not have a -lmpi in its link
>>>> line. The other components that use symbols in libmpi.so
>>>> (wrongly) do have a -lmpi in their link line. This can cause some
>>>> problems on some platforms (Linux tends to do dynamic linking /
>>>> dynamic loading better than most). That's why only the pt2pt
>>>> component fails.
>>>
>>> Did this change from 1.2.3 to 1.2.4?
>>
>> No:
>>
>> % diff openmpi-1.2.3/ompi/mca/osc/pt2pt/Makefile.am openmpi-1.2.4/ompi/mca/osc/pt2pt/Makefile.am
>> %
>>
>>>> Solutions:
>>>>
>>>> - Someone could make the pt2pt osc component link in libmpi.so
>>>>   like the rest of the components and hope that no one ever tries
>>>>   this on a non-friendly platform.
>>>
>>> Shouldn't the openmpi build system be able to figure this stuff
>>> out on a per-platform basis?
>>
>> I believe that this would not be useful -- see the tables and
>> conclusions below.
>>
>>>> - Debian (and all Rmpi users) could configure Open MPI with the
>>>>   --disable-dlopen flag and ignore the problem.
>>>
>>> Are there disadvantages to this approach?
>>
>> You won't be able to add more OMPI components to your existing
>> installation (e.g., 3rd party components). But that's probably ok,
>> at least for now -- not many people are distributing 3rd party OMPI
>> components.
>>
>>>> - Someone could fix Rmpi to dlopen libmpi.so with the RTLD_GLOBAL
>>>>   flag and fix the problem properly.
>>>
>>> Again, my main problem with this solution is that it means I must
>>> both link to libmpi at compile time and load it dynamically using
>>> dlopen. This doesn't seem right. Also, it makes it impossible on
>>> OS X to avoid setting LD_LIBRARY_PATH (OS X doesn't have rpath).
>>> Being able to use openmpi without setting LD_LIBRARY_PATH is
>>> important.
>>
>> This is a very complex issue. Here are the possibilities that I
>> see... (prepare for confusion!)
>>
>> ===================================================================
>>
>> This first table represents what happens in the following scenarios:
>>
>> - compile an application against Open MPI's libmpi, or
>> - compile an "application" DSO that is dlopen'ed with RTLD_GLOBAL, or
>> - explicitly dlopen Open MPI's libmpi with RTLD_GLOBAL
>>
>>                   libmpi        OMPI DSO     OMPI DSO comps.
>>     App linked    includes      components   depend on
>>     against       components?   available?   libmpi.so?      Result
>>     -----------   -----------   ----------   -------------   ----------
>>  1. libmpi.so     no            no           NA              won't run
>>  2. libmpi.so     no            yes          no              yes
>>  3. libmpi.so     no            yes          yes             yes (*1*)
>>  4. libmpi.so     yes           no           NA              yes
>>  5. libmpi.so     yes           yes          no              maybe (*2*)
>>  6. libmpi.so     yes           yes          yes             maybe (*3*)
>>     -----------   -----------   ----------   -------------   ----------
>>  7. libmpi.a      no            no           NA              won't run
>>  8. libmpi.a      no            yes          no              yes (*4*)
>>  9. libmpi.a      no            yes          yes             no (*5*)
>> 10. libmpi.a      yes           no           NA              yes
>> 11. libmpi.a      yes           yes          no              maybe (*6*)
>> 12. libmpi.a      yes           yes          yes             no (*7*)
>>     -----------   -----------   ----------   -------------   ----------
>>
>> All libmpi.a scenarios assume that libmpi.so is also available. In
>> the OMPI v1.2 series, most components link against libmpi.so, but
>> some do not (it's our mistake for not being uniform).
>>
>> (*1*) As far as we know, this works on al
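For readers unfamiliar with the fix being discussed, here is a minimal C sketch of what "dlopen libmpi.so with the RTLD_GLOBAL flag" means (the library name/path is illustrative and must match your installation; link with -ldl on Linux):

  #include <dlfcn.h>   /* dlopen(), dlerror(), RTLD_NOW, RTLD_GLOBAL */
  #include <stdio.h>

  int main(void)
  {
      /* RTLD_GLOBAL exports libmpi.so's symbols to libraries loaded
       * afterwards -- in particular, Open MPI's dlopen'ed components. */
      void *handle = dlopen("libmpi.so", RTLD_NOW | RTLD_GLOBAL);
      if (handle == NULL) {
          fprintf(stderr, "dlopen failed: %s\n", dlerror());
          return 1;
      }
      /* ... initialize and use MPI here ... */
      return 0;
  }

Without RTLD_GLOBAL (the default is RTLD_LOCAL), the components cannot resolve symbols in libmpi.so, which is the failure mode described in the thread above.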
[OMPI users] Question about issue with use of multiple IB ports
I just built OpenMPI-1.2.4 to work on my system (IB, OFED-1.2). When I run a job, I am getting the following message:

  WARNING: There are more than one active ports on host 'w74', but the
  default subnet GID prefix was detected on more than one of these
  ports. If these ports are connected to different physical IB
  networks, this configuration will fail in Open MPI. This version of
  Open MPI requires that every physically separate IB subnet that is
  used between connected MPI processes must have different subnet ID
  values.

I went to the FAQ to read about the message. My code does complete successfully because both nodes are connected by both meshes. My question is: how can I tell mpirun that I only want to use one of the ports? I specifically want to use either port 1 or port 2, but not bond both together. Can this be done?

Thanks,
Craig

--
Craig Tierney (craig.tier...@noaa.gov)
Re: [OMPI users] Open MPI 1.2.4 verbosity w.r.t. osc pt2pt
I don't think this will be a problem. We are now setting the flags correctly and doing a dlopen, which should enable the components to find everything in libmpi.so. If I remember correctly, this new change would simply make all components compiled in a consistent way. I will run this by Lisandro and see what he thinks, though. If you don't hear back from us within a day, assume everything is fine.

Brian

On Dec 10, 2007 10:13 AM, Jeff Squyres wrote:

> Brian / Lisandro --
>
> I don't think that I heard back from you on this issue. Would you
> have major heartburn if I remove all linking of our components
> against libmpi (etc.)?
>
> (for a nicely-formatted refresher of the issues, check out
> https://svn.open-mpi.org/trac/ompi/wiki/Linkers)
>
> Thanks.
Re: [OMPI users] Question about issue with use of multiple IB ports
On Dec 10, 2007, at 3:06 PM, Craig Tierney wrote:

> I just built OpenMPI-1.2.4 to work on my system (IB, OFED-1.2).
> When I run a job, I am getting the following message:
>
> WARNING: There are more than one active ports on host 'w74', but the
> default subnet GID prefix was detected on more than one of these
> ports. [...]
>
> I went to the FAQ to read about the message. My code does complete
> successfully because both nodes are connected by both meshes.

You can also assign a different subnet ID to each of the two fabrics. OMPI will then be able to tell the two networks apart, and you won't get this warning message. We only treat the default subnet ID specially because most people don't change it, and if they have multiple fabrics, they could run into problems because OMPI won't be able to tell them apart.

> My question is, how can I tell mpirun that I only want to use one of
> the ports? I specifically want to use either port 1 or port 2, but
> not bond both together.

The OMPI v1.2 series has fairly lame controls for this - you can limit how many IB ports an MPI process will use on each machine (via the btl_openib_max_btls MCA parameter), but not which ones. OMPI will use the first btl_openib_max_btls ports (the default is infinite).

In OMPI v1.3, there are specific MCA parameters for controlling exactly which NICs and/or ports you want to use or not use. Specifically:

- btl_openib_if_include: a comma-delimited list of interface names and/or ports to use
- btl_openib_if_exclude: a comma-delimited list of interface names and/or ports to exclude (i.e., use all others)

For example:

  mpirun --mca btl_openib_if_include mthca0,mthca1:1 ...

meaning "use all ports on mthca0" and "use port 1 on mthca1".

--
Jeff Squyres
Cisco Systems
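To make the v1.2 advice above concrete, limiting each MPI process to a single IB port would look something like this (the process count and executable name are placeholders):

  mpirun --mca btl_openib_max_btls 1 -np 16 ./my_mpi_app

Note that this only caps how many ports are used (OMPI takes the first one it finds); it does not select a specific port, per the limitation described above.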
Re: [OMPI users] Open MPI 1.2.4 verbosity w.r.t. osc pt2pt
Ok. I was planning to do this for OMPI v1.3 and above; I was not really planning to do this for the OMPI v1.2 series. We don't have an exact timeframe for OMPI v1.3 yet -- our best guess at this point is that it'll be somewhere in 1HCY08.

On Dec 10, 2007, at 5:03 PM, Brian Granger wrote:

> I don't think this will be a problem. We are now setting the flags
> correctly and doing a dlopen, which should enable the components to
> find everything in libmpi.so. If I remember correctly, this new
> change would simply make all components compiled in a consistent
> way. I will run this by Lisandro and see what he thinks, though. If
> you don't hear back from us within a day, assume everything is fine.
>
> Brian