On Apr 9, 2009, at 6:16 PM, Gus Correa wrote:

The configure scripts seem to have changed, and work differently
than before, particularly w.r.t. additional libraries like numa,
torque, and openib.
The new behavior can be a bit unexpected and puzzled me,
although eventually I could build 1.3.1.


Yes, we put in this new functionality due to user requests.  See below.

Here are my observations.

1) I used to configure OpenMPI 1.2.8 and 1.3.0 with:

--with-libnuma=/usr/lib64 \
--with-tm=/usr/lib64 \
--with-openib=/usr/lib64 \

This worked fine for me on the same computer I am using for 1.3.1.
However, with 1.3.1 the same options fail.
Configure now tries to find the corresponding include
files on /usr/lib64/include, a directory that doesn't even exist.
The include files are actually in /usr/include
(as the old configure knew well).


What happened in the 1.2.x configure was that OMPI was adding -L/usr/lib64/lib and trying to find the relevant libraries. But since /usr/lib64 was already in your linker's default search path, the relevant libraries were found without any additional flags from OMPI. Additionally, OMPI was also adding -I/usr/lib64/include to the compile path, but the relevant header files were found because they were in your compiler's default search path (likely /usr/include). So both the added -I and -L flags were meaningless -- albeit harmless.

2) Therefore, I tried to configure with:

--with-libnuma \
--with-tm \
--with-openib \

Note that no directory is being pointed to.
My hope was that configure would find the libraries and includes in
standard places (and hopefully the correct libs, 64-bit, not 32-bit).


This *should* be fine.

This way configure completes with no apparent error.
However, I get this funny error on the make phase:

/bin/sh ../../../../libtool --tag=CC   --mode=link gcc  -DNDEBUG
-march=amdfam10 -O3 -finline-functions -funroll-loops -mfpmath=sse
-fno-strict-aliasing -pthread -fvisibility=hidden -module -avoid-version
-Lyes/lib -export-dynamic   -o libmca_maffinity_libnuma.la
maffinity_libnuma_component.lo maffinity_libnuma_module.lo -lnuma -lnsl
-lutil  -lm
../../../../libtool: line 4998: cd: yes/lib: No such file or directory
libtool: link: cannot determine absolute directory name of `yes/lib'
make[2]: *** [libmca_maffinity_libnuma.la] Error 1


Huh.  That's odd.

Note the "yes/lib" path.

A little grep on config.log showed why the error:

%grep yes config.log

...

OMPI_WRAPPER_EXTRA_LDFLAGS='  -Lyes/lib  '
OPAL_WRAPPER_EXTRA_LDFLAGS='-Lyes/lib  '
ORTE_WRAPPER_EXTRA_LDFLAGS=' -Lyes/lib  '
WRAPPER_EXTRA_LDFLAGS='  -Lyes/lib  '
maffinity_libnuma_CPPFLAGS=' -Iyes/include'
maffinity_libnuma_LDFLAGS=' -Lyes/lib'
#define WRAPPER_EXTRA_LDFLAGS "  -Lyes/lib  "

Is this an internal "yes" answer to configure that
is being inadvertently caught/interpreted as a directory name?


Ah, crud.  Probably so, yes.

(/me double checks libnuma's m4 setup... crud; I can replicate the problem. I'll try to commit a fix this afternoon so that it can be included in 1.3.2)

Since configure seems to have found the libraries and include files,
and completed without error,
shouldn't it also have reported the correct paths to config.log
and written them correctly to the Makefiles?

3) Finally I tried this:

--with-libnuma=/usr \
--with-tm=/usr \
--with-openib=/usr \

This approach was suggested by Prentice Bisbal a few days ago,
when Francesco Pietra reported on this list
having a similar problem with libnuma.

This seems to work fine, and OpenMPI 1.3.1 builds.


Good. FWIW, you probably don't need to specify any of these. More below.

Generally, unless you specify --without-<foo>, OMPI will look for feature <foo> in the default paths. If the feature is found, then OMPI uses it. If the feature is not found, OMPI just skips it. Specifying --with-<foo> is supposed to indicate to OMPI's configure "yes, I definitely want this feature" (regardless of whether you specified a directory or not), meaning that if OMPI can't find that feature, configure will abort on the rationale that you specifically asked for something but we can't deliver it. So abort and let a human figure it out.

However, I have more questions:


Here's the general scheme that OMPI's configure uses:

- if --without-<foo> is specified, OMPI's configure doesn't look for feature <foo> and just skips it
- if neither --with-<foo> nor --without-<foo> are specified, OMPI looks for feature <foo>. If the feature is found, use it. If not, skip it.
- if --with-<foo> is specified (with or without a directory), OMPI looks for feature <foo>. If the feature is not found, abort configure on the rationale that you specifically asked for a feature that configure can't deliver, so abort and let a human figure it out.
- if --with-<foo> is specified (without a directory), OMPI should look for the feature in the default compiler/linker paths
- if --with-<foo>=directory is specified, OMPI should look for the feature in the specified compiler/linker paths, and abort if it can't find those paths
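The scheme above can be sketched as a small shell function. This is only an illustration, not OMPI's actual m4/configure code; the function name `decide_feature` and its two arguments are made up for the example.

```shell
#!/bin/sh
# Hypothetical sketch of the decision scheme; not OMPI's real configure logic.
# $1: value of the --with-<foo> option:
#       ""    option not given
#       "no"  --without-<foo>
#       "yes" --with-<foo>
#       DIR   --with-<foo>=DIR
# $2: "found" if the feature's headers/libraries were located, else "missing"
decide_feature() {
    with_val=$1
    probe=$2
    case "$with_val" in
        no)
            # --without-<foo>: do not even look
            echo "skip"
            ;;
        "")
            # Nothing requested: use the feature if present, else skip quietly
            if [ "$probe" = found ]; then echo "use"; else echo "skip"; fi
            ;;
        *)
            # Explicitly requested (with or without a directory):
            # not finding it is fatal
            if [ "$probe" = found ]; then echo "use"; else echo "abort"; fi
            ;;
    esac
}

decide_feature ""   missing
decide_feature yes  missing
decide_feature /usr found
```

The three calls at the end cover the "not requested", "requested but missing", and "requested and found" cases.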

The last part ("abort if it can't find those paths") was added in v1.3 because some users were specifying --with-<foo>=/some/nonexistent/path and still having configure succeed by accidentally using some system-default version of <foo> rather than a specific version of <foo> that was installed in a non-default location. This caused no end of confusion until they realized that they had a typo in the directory name specified to --with-<foo>=<dir>. Then OMPI got blamed. :-) So we added sanity checks to ensure that both the specified directories and the directories we derive from them actually exist.
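A sanity check of this kind might look roughly like the following. It is a sketch under assumptions, not the code that went into v1.3: the function name `check_with_dir` is hypothetical, and the exact set of derived directories it tests (DIR/include, plus DIR/lib or DIR/lib64) is a guess at what "derived from the specified directories" means.

```shell
#!/bin/sh
# Hypothetical sketch of the directory sanity check; not OMPI's real code.
# $1: directory given to --with-<foo>=DIR
# Returns 0 if DIR, DIR/include, and at least one of DIR/lib or DIR/lib64
# exist; otherwise prints a message to stderr and returns 1.
check_with_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "error: directory $dir not found" >&2
        return 1
    fi
    if [ ! -d "$dir/include" ]; then
        echo "error: directory $dir/include not found" >&2
        return 1
    fi
    if [ ! -d "$dir/lib" ] && [ ! -d "$dir/lib64" ]; then
        echo "error: neither $dir/lib nor $dir/lib64 found" >&2
        return 1
    fi
    return 0
}

# Typical usage: stop early instead of silently falling back to a system copy.
# check_with_dir /some/nonexistent/path || exit 1
```

With a check like this, --with-<foo>=/usr/lib64 fails because /usr/lib64/include does not exist, which matches the behavior reported in 1) above.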

Does that help?

A) Is the directory name mandatory or optional in the options above?
I.e. is "--with-libnuma" alone OK, or do I have to use
"--with-libnuma=/some/directory"?


It should be optional.

The results in 2) above suggest that configure finds the libraries and
includes correctly, but that it writes wrong Makefiles,
and doesn't report any error either.


There's likely a bug in our --with-libnuma handling that is taking the default value from configure ("yes") and treating it as a directory instead of just an indicator that you want libnuma support. I'll fix it.
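To illustrate the kind of guard that is missing, here is a minimal sketch. It is not the actual fix; the function name `flags_for` is hypothetical, and the real logic lives in OMPI's m4 macros rather than in a standalone shell function.

```shell
#!/bin/sh
# Hypothetical sketch, not the actual fix committed to OMPI.
# $1: value of --with-<foo>: "", "yes", "no", or a directory
# Prints the extra -I/-L flags to add; prints an empty line when no
# directory was given.
flags_for() {
    case "$1" in
        ""|yes|no)
            # "yes"/"no" are answers, not paths: add nothing and rely on
            # the compiler's and linker's default search paths
            echo ""
            ;;
        *)
            echo "-I$1/include -L$1/lib"
            ;;
    esac
}

flags_for yes
flags_for /usr
```

Without the first case branch, a bare --with-libnuma would produce "-Iyes/include -Lyes/lib", which is exactly what showed up in config.log.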

B) Is the syntax in 3) above the only correct possibility?


You should be able to leave off all those --with options, but then OMPI's configure will happily trundle through if it *doesn't* find those 3 features. So option 3) is definitely safest because OMPI's configure will abort if it doesn't find them, rather than leaving you with an unexpectedly feature-poor OMPI installation.

C) If it is, can I rest assured that configure and make
will find the right 64-bit libraries, not 32-bit libraries
of similar name?


OMPI will only successfully link against the Right libraries for whatever flavor you're building. If you have told your compiler to build 64 bit versions of OMPI (or your compiler simply defaults to 64 bit), then the linker will only allow OMPI to link successfully against the 64 bit libraries (in Linux; in other OS's, it may be different -- such as OS X).

I ask because I have /usr/lib/libnuma.so.1 (32-bit ELF),
and /usr/lib64/libnuma.so.1 (64-bit ELF), and both are in the
same /usr directory that I gave to configure (--with-libnuma=/usr).
(Well, maybe this is deferred to the compiler to decide,
whether it is a 64- or 32-bit compiler, as somehow it seemed to work.)



Yep, we try both <dir>/lib and <dir>/lib64 when directories are specified. Also, both /usr/lib and /usr/lib64 are likely in your linker's default search path. So even if you hadn't specified --with-libnuma=/usr, the default linker search path would have found /usr/lib64/libnuma.so.1.
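If you want to see for yourself which copies live under a given prefix, something like the following works. The helper name `find_libdirs` is made up for this example; it only mimics the "try both <dir>/lib and <dir>/lib64" idea and is not taken from OMPI's configure.

```shell
#!/bin/sh
# Hypothetical helper: list which of DIR/lib and DIR/lib64 contain a file.
# $1: prefix directory (e.g. /usr)
# $2: library file name (e.g. libnuma.so.1)
find_libdirs() {
    for sub in lib lib64; do
        if [ -e "$1/$sub/$2" ]; then
            echo "$1/$sub"
        fi
    done
}

# On a system like the one described, this would list both directories:
# find_libdirs /usr libnuma.so.1
# The file(1) utility then reports whether each copy is 32-bit or 64-bit ELF:
# file -L /usr/lib/libnuma.so.1 /usr/lib64/libnuma.so.1
```

Both copies can coexist under the same prefix; the linker rejects the one that doesn't match the flavor being built.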

--
Jeff Squyres
Cisco Systems
