Thanks for the detailed responses! I've included some stuff inline below:

On Jan 2, 2008 1:56 PM, Jeff Squyres <jsquy...@cisco.com> wrote:
> On Dec 31, 2007, at 12:50 AM, Jim Kusznir wrote:
> > The rpm build errored out near the end with a missing file. It was trying to find /opt/openmpi-gcc/1.2.4/opt/share/openmpi-gcc (IIRC), but the last part was actually openmpi on disk. I ended up correcting it by changing line 182 (configuration logic) to:
> >
> > %define _datadir /opt/%{name}/%{version}/share/%{name}
> >
> > (I changed _pkgdatadir to _datadir). Your later directive if _pkgdatadir is undefined took care of _pkgdatadir. I must admit, I still don't fully understand where rpm was getting the idea to look for that file... I tried manually configuring _pkgdatadir to the path that existed, but that changed nothing. If I didn't rename the package, it all worked fine.
>
> Hmm. This is actually symptomatic of a larger problem -- Open MPI's configure/build process is apparently not getting the _pkgdatadir value, probably because there's no way to pass it on the configure command line (i.e., there's no standard AC --pkgdatadir option). Instead, the "$datadir/openmpi" location is hard-coded in the Open MPI code base (in opal/mca/installdirs/config, if you care). As such, when you re-defined %{_name}, the specfile didn't agree with where OMPI actually installed the files, resulting in the error you saw. Yuck.
>
> Well, there are other reasons you can't have multiple OMPI installations share a single installation tree (e.g., they'll all try to install their own "mpirun" executable -- per a prior thread, the --program-prefix/suffix stuff also doesn't work; see https://svn.open-mpi.org/trac/ompi/ticket/1168 for details). So this isn't making OMPI any worse than it already is. :-\
>
> So I think the best solution for the moment is to just fix the specfile's %_pkgdatadir to use the hard-coded name "openmpi" instead of %{name}.

I actually tried this first, but it failed to accomplish anything (got the same error). However, now that %_datadir is defined, it works with the name directive just fine.

> I committed these changes (and some other small fixes for things I found while testing the _name and multi-package stuff) to the OMPI SVN trunk in r17036 (see https://svn.open-mpi.org/trac/ompi/changeset/17036) -- could you give it a whirl and see if it works for you?
>
> And another from an off-list mail:
>
> > In the preamble for the separate rpm files, the -devel and -docs reference openmpi-runtime statically rather than using %{name}-runtime, which breaks dependencies if you build under a different name as I am.
>
> Doh. I tried replacing the Requires: with %{_name}-runtime, but then rpmbuild complained:
>
> error: line 300: Dependency tokens must begin with alpha-numeric, '_' or '/': Requires: %{_name}-runtime

Huh... this is strange. Here's the relevant chunk from my spec file and my rpm version. I've now built 3 sets of multi-rpm openmpi, each with a different name, and it's worked flawlessly:

[root@aeolus ~]# rpmbuild --version
RPM version 4.3.3
[root@aeolus ~]# grep Requires /usr/src/redhat/SPECS/openmpi.spec
Requires: %{modules_rpm_name}
Requires: %{mpi_selector_rpm_name}
Requires: %{modules_rpm_name}
Requires: %{name}-runtime
Requires: %{name}-runtime

Perhaps it's the difference between _name and name.

> So it looks like Requires: will only take a hard-coded name, not a variable (I have no comments in the specfile about this issue, but perhaps that's why Greg/I hard-coded it in the first place...?). Yuck. :-(
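By the way, if it does come down to protecting the Requires statements as you suggest below, I'd guess the guard would look something like this. This is only an untested sketch on my end -- I haven't actually run it through rpmbuild, and I'm assuming rpm's %if string comparison against %{_name} behaves the way I expect:

%if "%{_name}" == "openmpi"
# Only emit the runtime dependency when the package is built under its
# default name; renamed builds (openmpi-gcc, etc.) simply skip it.
Requires: openmpi-runtime
%endif

Hard-coding "openmpi-runtime" inside the conditional sidesteps the dependency-token error entirely, at the cost of renamed builds losing the dependency.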
>
> This error occurred with rpmbuild v4.3.3 (the default on RHEL4U4), so I tried manually upgrading to v4.4.2.2 from rpm.org to see if this constraint had been relaxed, but I couldn't [easily] get it to build. I guess it wouldn't be attractive to use something that would only work with the newest version of RPM, anyway.
>
> We'll unfortunately have to do something different, then. :-( Obvious but icky solutions include:
>
> - remove the Requires statements
> - protect the Requires statements to only be used when %{_name} is "openmpi"
>
> Got any better ideas?
>
> > 3) Will the resulting -runtime .rpms (for the different compiler versions) coexist peacefully without any special environment munging on the compute nodes, or do I need modules, etc. on all the compute nodes as well?
>
> They can co-exist peacefully out on the nodes because you should choose different --prefix values for each installation (e.g., /opt/openmpi_gcc3.4.0/ or whatever naming convention you choose to use). That being said, you should ensure that whatever version of OMPI you use is consistent across an entire job. E.g., if job X was compiled with the openmpi-gcc installation, then it should use the openmpi-gcc installation on all the nodes on which it runs.

I currently have them all installed across the cluster in the same place, and had planned on requiring users to submit jobs with the option to include their current environment, thus ensuring the PATHs have only the path to the correct version of OpenMPI. This would simplify things; I'll include this in my next build.

> The easiest way to do that might be to use the --enable-mpirun-prefix-by-default option to configure. This will cause OMPI to use mpirun's --prefix option by default (even if you don't specify it on the mpirun command line), which will effectively tell the remote node where OMPI lives on the remote nodes (assuming your installation paths are the same on all nodes -- e.g., /opt/openmpi-gcc). Then you can use environment modules (or whatever) on your head node / the job's first node to select which OMPI installation you want, use mpicc/mpiCC/mpif77/mpif90 to compile your job, and then mpirun will do the Right thing to select the appropriate OMPI installation on remote nodes, meaning that it will set the PATH and LD_LIBRARY_PATH on the remote node for you.
>
> Make sense?
>
> See:
>
> http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path
> http://www.open-mpi.org/faq/?category=running#mpirun-prefix
>
> for a little more detail.

Makes sense -- I've put a rough sketch of the workflow I'm picturing after my sign-off below.

> > 4) I've never really used pgi or intel's compiler. I saw notes in the rpm about build flag problems and "use your normal optimizations and flags", etc. As I have no concept of "normal" for these compilers, are there any guides or examples I should/could use for this?
>
> You'll probably want to check the docs for those compilers. Generally, GCC-like -O options have similar definitions in these compilers (they try to be similar to GCC). YMMV.

For the time being, I just ran with defaults. For Intel, I was able to use the post-munge defaults; with PGI, I had to disable the optimization flags altogether (the --define disable_rpm_opts_flags or something like that). I'll talk with the principal users in the next day or two and get more details from them on how their apps recommend compiling.

Thanks for all the help and hard work on a .spec file!

--Jim
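P.S. For the archives, here's the sort of head-node workflow I'm picturing once the renamed RPMs are rolled out under /opt on all the nodes. The module name, hostfile, and program below are made up for illustration -- only the general flow (environment modules + wrapper compiler + mpirun) comes from your description:

# pick one OMPI build via environment modules (module name is hypothetical)
module load openmpi-gcc/1.2.4

# compile against that installation with its wrapper compiler
mpicc -o hello hello.c

# since the build was configured with --enable-mpirun-prefix-by-default,
# mpirun should point the remote nodes at the matching /opt/openmpi-gcc
# install and set PATH/LD_LIBRARY_PATH there for me
mpirun -np 16 -hostfile my_hosts ./hello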