Err... I'm a little confused. We've been emailing about this exact issue for a week or two (off list); you just restarted the conversation from the beginning, moved it to the users list, and dropped all the CCs (which include several people who are not on this list). Why did you do that?

Here's what I said in my last mail on that thread (just a few hours ago); it was in response to a mail from Thomas:

I am totally confused by your explanation; you are throwing around terms like VampirServer, vngd, driver, ... I don't know what these things are, nor do I understand your explanation of how they relate to each other. You seem to be using terms to define other terms that are then used to define the original terms. This is where I get lost.

Can you send a simple example that doesn't work, preferably outside of the whole Vampir system? Perhaps something that effectively mimics Vampir's behavior?
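As an illustration of the kind of stand-alone reproducer meant here, the sketch below mimics a host that loads an MPI "driver" shared object at run time with dlopen(). It is hypothetical: the driver path is a placeholder for whatever object wraps libmpi, and whether VampirServer opens its driver with RTLD_LOCAL or RTLD_GLOBAL is an assumption, not something stated in this thread. Running the host once with each mode is a cheap way to check whether symbol visibility is involved.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical stand-in for a plugin host that loads an MPI "driver"
 * shared object at run time. With RTLD_LOCAL, symbols of the driver and
 * of the libraries it pulls in (e.g. libmpi) are NOT made available to
 * objects dlopen()ed later (such as MCA components); RTLD_GLOBAL makes
 * them available. Passing path == NULL opens the main program itself. */
void *load_driver(const char *path, int global)
{
    int flags = RTLD_NOW | (global ? RTLD_GLOBAL : RTLD_LOCAL);
    void *handle = dlopen(path, flags);

    if (handle == NULL) {
        /* dlerror() describes the most recent dlopen() failure. */
        fprintf(stderr, "dlopen(%s) failed: %s\n",
                path ? path : "(main program)", dlerror());
    }
    return handle;
}
```

A host's main() could call load_driver(argv[1], 0), look up and call the driver's entry point with dlsym(), and then repeat the run with global set to 1 to see whether the undefined-symbol errors change.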



On Feb 4, 2009, at 12:03 PM, Kiril Dichev wrote:

Hi guys,

Sorry for the long e-mail.

I have been trying for some time now to run VampirServer with shared
libs for Open MPI 1.3.

First of all: The "--enable-static --disable-shared" version works.
Also, the 1.2 series worked fine with the shared libs.

But here is the story for the shared libraries with OMPI 1.3:
OMPI compiled fine, and the VampirServer guys also compiled the MPI
driver they need against OMPI. The driver just links against Open MPI's
shared libraries.

However, on launching the server I got errors of the type "undefined
symbol":

error: /home_nfs/parma/x86_64/UNITE/packages/openmpi/1.3-intel10.1-64bit-MT-shared/lib/openmpi/mca_paffinity_linux.so:
undefined symbol: mca_base_param_reg_int

My first guess was that my LD_LIBRARY_PATH did not include
<MPI_INSTALL>/lib/openmpi, but I exported it and ran "mpirun -x
LD_LIBRARY_PATH ..." and nothing changed.
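To confirm what each MPI process actually sees, rather than what was exported in the launching shell, a process can inspect its own environment. This is a hypothetical helper, not part of Open MPI: it reports whether a directory (for example <MPI_INSTALL>/lib/openmpi) is one of the colon-separated entries of LD_LIBRARY_PATH. It compares entries literally, so "/a/b" and "/a/b/" count as different.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `dir` is exactly one of the
 * colon-separated entries of LD_LIBRARY_PATH in this process's
 * environment, 0 otherwise (including when the variable is unset). */
int ld_path_contains(const char *dir)
{
    const char *path = getenv("LD_LIBRARY_PATH");
    size_t len = strlen(dir);

    if (path == NULL || len == 0)
        return 0;

    while (*path != '\0') {
        /* Length of the current entry, up to the next ':' or the end. */
        size_t entry = strcspn(path, ":");
        if (entry == len && strncmp(path, dir, len) == 0)
            return 1;
        path += entry;
        if (*path == ':')
            path++; /* skip the separator */
    }
    return 0;
}
```

Each rank could call this right after MPI_Init and print the result together with its rank, which shows directly whether "mpirun -x" propagated the variable.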

Then I started building every component that complained about an
"undefined symbol" with "--enable-mca-static"; for example, the above
message disappeared after I built paffinity with --enable-mca-static. I
don't know why this worked, but it seemed to help. However, each time
the message was replaced by another error from a different component.

After a few components, another error appeared:

mca: base: component_find: unable to
open /home_nfs/parma/x86_64/UNITE/packages/openmpi/1.3-intel10.1-64bit-MT-shared/lib/openmpi/mca_rml_oob: file not found (ignored)

(full output attached)

I was unsure what to do next, but again, compiling the complaining
component statically got things one step further. One thing that struck
me is that the directory does contain a file by that name with an extra
".so" at the end, but maybe dlopen also accepts names without the ".so";
I don't know.
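On the ".so" question: a plain dlopen() call does not append any suffix; it opens exactly the name it is given. Loaders that follow libltdl's lt_dlopenext() convention instead try the bare name first and then retry with extensions such as ".so" appended. If the component loader works that way (an assumption here), a message naming mca_rml_oob without the suffix may simply reflect the first attempt, while the real failure, possibly an unresolved symbol, happened when opening mca_rml_oob.so. The sketch below is a hypothetical illustration of that try-the-extension strategy using dlopen() directly; it is not Open MPI's component loader.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical extension-trying loader: first try the name exactly as
 * given, then retry with ".so" appended (plain dlopen() never adds a
 * suffix by itself). Returns NULL if both attempts fail; dlerror() then
 * describes only the LAST attempt, so an earlier failure reason is lost. */
void *open_component(const char *base)
{
    char name[4096];
    void *handle = dlopen(base, RTLD_NOW | RTLD_LOCAL);

    if (handle != NULL)
        return handle;

    /* Build "<base>.so"; give up if the name would be truncated. */
    int n = snprintf(name, sizeof name, "%s.so", base);
    if (n < 0 || (size_t)n >= sizeof name)
        return NULL;

    return dlopen(name, RTLD_NOW | RTLD_LOCAL);
}
```

Because only the last error survives, a loader built like this can report a misleading reason; printing dlerror() after each individual attempt is one way to see the real cause.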


Anyway, I have now built about 20 components statically, while still
building the OMPI libraries themselves as shared objects, and things
seem to work.

Does anyone have any idea why these dozens of errors happen when loading
shared libs? As I said, I never had this with the 1.2 series.


Thanks,
Kiril


<mpirun-vngd.out>


--
Jeff Squyres
Cisco Systems
