Err... I'm a little confused. We've been emailing about this exact
issue for a week or two (off list); you just re-started the
conversation from the beginning, moved it to the users list, and
dropped all the CC's (which include several people who are not on this
list). Why did you do that?
Here's what I said in my last mail on that thread (just a few hours
ago); it was in response to a mail from Thomas:
I am totally confused by your explanation; you are throwing around
terms like VampirServer, vgnd, driver, ... I don't know what these
things are nor do I understand your explanation of how they relate to
each other. You seem to be using terms to define other terms that
then are used to define the original terms. This is where I get lost.
Can you send a simple example that doesn't work, preferably outside of
the whole Vampir system? Perhaps something that effectively mimics
Vampir's behavior?
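A stand-alone test along these lines might look like the sketch below.
It rests on a guess that has not been established in this thread:
that VampirServer loads its MPI driver as a plugin through dlopen(),
so that libmpi ends up loaded with local symbol scope. The helper
names (load_library, try_load, init_and_finalize) are hypothetical,
and the libmpi path is whatever your install provides.

```python
import ctypes


def load_library(path, make_global):
    """dlopen() `path` with global or local symbol visibility."""
    mode = ctypes.RTLD_GLOBAL if make_global else ctypes.RTLD_LOCAL
    return ctypes.CDLL(path, mode=mode)


def try_load(path, make_global):
    """Return None if `path` loads, otherwise the loader's error text."""
    try:
        load_library(path, make_global)
    except OSError as exc:
        return str(exc)
    return None


def init_and_finalize(libmpi_path, make_global):
    """Load libmpi, then call MPI_Init(NULL, NULL) and MPI_Finalize().

    Components are opened during MPI_Init, so that is where any
    "undefined symbol" messages would be expected to show up.
    """
    lib = load_library(libmpi_path, make_global)
    rc = lib.MPI_Init(None, None)
    if rc == 0:  # MPI_SUCCESS is 0 in Open MPI
        lib.MPI_Finalize()
    return rc
```

Calling init_and_finalize() once with make_global=False and once, in a
fresh process, with make_global=True would show whether symbol
visibility makes any difference. MPI_Init may only be called once per
process, hence the separate runs.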
On Feb 4, 2009, at 12:03 PM, Kiril Dichev wrote:
Hi guys,
sorry for the long e-mail.
I have been trying for some time now to run VampirServer with shared
libs for Open MPI 1.3.
First of all: The "--enable-static --disable-shared" version works.
Also, the 1.2 series worked fine with the shared libs.
But here is the story for the shared libraries with OMPI 1.3:
Compilation of OMPI went fine and also the VampirServer guys compiled
the MPI driver they need against OMPI. The driver just refers to the
shared libraries of Open MPI.
However, on launching the server I got errors of the type "undefined
symbol":
error: /home_nfs/parma/x86_64/UNITE/packages/openmpi/1.3-intel10.1-64bit-MT-shared/lib/openmpi/mca_paffinity_linux.so:
undefined symbol: mca_base_param_reg_int
I suspected that my LD_LIBRARY_PATH did not include
<MPI_INSTALL>/lib/openmpi, but I exported it, ran "mpirun -x
LD_LIBRARY_PATH ...", and nothing changed.
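To confirm what the launched processes actually see, a small check
like the following could be run under mpirun in place of the real
program. It is only a sketch: dir_on_library_path is a hypothetical
helper, and the directory to test is whatever
<MPI_INSTALL>/lib/openmpi is on your system.

```python
import os


def dir_on_library_path(directory, env=None):
    """Return True if `directory` is an entry of LD_LIBRARY_PATH."""
    env = os.environ if env is None else env
    wanted = os.path.normpath(directory)
    entries = env.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    # Skip empty entries and compare normalized paths, so a trailing
    # slash does not cause a false mismatch.
    return any(os.path.normpath(e) == wanted for e in entries if e)
```

Passing env explicitly makes the function easy to try against a
made-up environment; with env omitted it inspects the environment of
the running process.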
Then I started building each component that complained about an
"undefined symbol" statically with "--enable-mca-static"; for example,
the above message disappeared after I did --enable-mca-static
paffinity. I don't know why this worked, but it seemed to help.
However, each error was always replaced by a new error message from
another component.
After a few components, a different error appeared:
mca: base: component_find: unable to open /home_nfs/parma/x86_64/UNITE/packages/openmpi/1.3-intel10.1-64bit-MT-shared/lib/openmpi/mca_rml_oob:
file not found (ignored)
(full output attached)
Now, I was unsure what to do, but again, when I compiled the
complaining component statically, things went a step further. One
thing that struck me is that the directory does contain such a file,
but with an extra ".so" at the end. Maybe dlopen also accepts names
without the ".so"; I don't know.
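A quick way to see which file names actually exist for a component is
sketched below. For reference, plain dlopen() opens exactly the name
it is given and does not add ".so" by itself; whether a suffix gets
tried depends on the loader wrapper in use, which is not established
here. component_candidates is a hypothetical helper name.

```python
import os


def component_candidates(directory, name, suffixes=("", ".so")):
    """Return the paths directory/name+suffix that exist as files."""
    found = []
    for suffix in suffixes:
        path = os.path.join(directory, name + suffix)
        if os.path.isfile(path):
            found.append(path)
    return found
```

Called with the component directory and "mca_rml_oob", it lists which
of the bare name and the ".so" name are really present on disk.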
Anyway, I have now included about 20 components statically while still
building shared objects for the OMPI libs, and things seem to work.
Does anyone have any idea why these dozens of errors happen when
loading shared libs? Like I said, I never had this in the 1.2 series.
Thanks,
Kiril
<mpirun-vngd.out>
--
Jeff Squyres
Cisco Systems