> That's the same for R. We don't touch the inner guts of module loading for
> this. What Hao realized after looking at the corresponding FAQ item was that
> right before calling MPI_Init, one can load libmpi explicitly, and -- and
> that's the important bit -- set the proper RTLD_GLOBAL argument.
>
> So you could adapt the patch we used:
>
>    a) add an include for dlfcn.h
>
>    b) explicitly call dlopen on libmpi.so with RTLD_GLOBAL
>
> That should be reasonably easy to test as you only need to rebuild mpi4py,

I don't like this solution one bit.  Here is why.  When someone needs
to use a shared library in a given piece of code there are 2 options:

1.  Link in the shared library at compile time.

2.  Load it using dlopen.

What you are telling me is that to use libmpi, I need to do both of
these!!  Am I not correct that this is an abuse of dlopen?
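
To make "both" concrete, here is a minimal sketch of what the proposed
workaround amounts to inside an extension module that is already linked
against libmpi. The helper name load_library_globally is made up for
illustration and is not part of Rmpi or mpi4py; the library name
"libmpi.so" is the one used in the quoted patch. The mode ORs RTLD_NOW
with RTLD_GLOBAL because POSIX expects one of RTLD_LAZY or RTLD_NOW to
be present.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper (not from Rmpi or mpi4py): load a shared library so
 * that its symbols join the process-wide (global) symbol scope.
 * Returns the dlopen handle, or NULL on failure. */
static void *load_library_globally(const char *soname)
{
    /* One of RTLD_LAZY / RTLD_NOW selects the binding mode;
     * RTLD_GLOBAL is OR-ed in on top of it. */
    void *handle = dlopen(soname, RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL) {
        const char *msg = dlerror();
        fprintf(stderr, "dlopen(%s) failed: %s\n",
                soname ? soname : "(main program)",
                msg ? msg : "unknown error");
    }
    return handle;
}

/* Intended call order in a module that is ALSO linked with -lmpi:
 *
 *     load_library_globally("libmpi.so");   // name taken from the patch
 *     MPI_Init(NULL, NULL);
 */
```

Passing NULL as the name returns a handle for the main program, which
allows a quick sanity check on a machine without MPI installed. With
glibc older than 2.34 this needs -ldl on the link line.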

Anyone should be able to link to libmpi at compile time and things
should "just work" - regardless of how my binary file is being used
(my binary file could be linked in at compile time or itself loaded
using dlopen).
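
A rough way to tell which of the two situations a process is in is to
ask whether a libmpi symbol is visible in the global scope, which is
what the quoted diagnosis further down says the Open MPI components
depend on. The sketch below is only a diagnostic under that assumption;
symbol_is_globally_visible is a hypothetical name, and the check relies
on the documented behaviour that a handle from dlopen(NULL, ...) searches
the main program, its startup dependencies, and anything loaded with
RTLD_GLOBAL.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical diagnostic: returns 1 if `name` resolves through the
 * global symbol scope of the running process, 0 otherwise. */
static int symbol_is_globally_visible(const char *name)
{
    /* A NULL filename yields a handle for the main program; lookups on
     * it do not see libraries that were dlopen'ed with local scope. */
    void *self = dlopen(NULL, RTLD_NOW);
    if (self == NULL)
        return 0;

    int found = dlsym(self, name) != NULL;
    dlclose(self);
    return found;
}
```

Calling symbol_is_globally_visible("MPI_Init") from inside the extension
module, right before MPI_Init, would be expected to return 0 when the
module was loaded with local scope and 1 when libmpi was linked into the
main program or loaded with RTLD_GLOBAL.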

While I agree that the hack would probably solve the problem for
mpi4py, I don't think this is a true solution to the problem.

Brian

>
> --- rmpi-0.5-4.orig/src/Rmpi.c
> +++ rmpi-0.5-4/src/Rmpi.c
> @@ -16,6 +16,7 @@
>   */
>
>  #include "Rmpi.h"
> +#include <dlfcn.h>
>
>  static MPI_Comm        *comm;
>  static MPI_Status *status;
> @@ -32,7 +33,9 @@
>  if (flag)
>                 return AsInt(1);
>         else {
> -               MPI_Init((void *)0,(void *)0);
> +               char *libm="libmpi.so";
> +               dlopen(libm,RTLD_GLOBAL);
> +               MPI_Init((void *)0,(void *)0);
>                 MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
>                 MPI_Errhandler_set(MPI_COMM_SELF, MPI_ERRORS_RETURN);
>                 comm=(MPI_Comm *)Calloc(COMM_MAXSIZE, MPI_Comm);
>
>
> | is responsible for loading _everything_ into Python, and I am pretty
> | sure that  there is no way that people would be willing to change it.
> | I am cc'ing this to Lisandro - maybe he has some ideas on this front.
>
> Actually, looked like you didn't CC him.
>
> Hth, Dirk
>
> |
> | Thanks
> |
> | Brian
> |
> | On 10/10/07, Brian Barrett <brbar...@open-mpi.org> wrote:
> | > On Oct 10, 2007, at 1:27 PM, Dirk Eddelbuettel wrote:
> | > > | Does this happen for all MPI programs (potentially only those that
> | > > | use the MPI-2 one-sided stuff), or just your R environment?
> | > >
> | > > This is the likely winner.
> | > >
> | > > It seems indeed due to R's Rmpi package. Running a simple mpitest.c
> | > > shows no
> | > > error message. We will look at the Rmpi initialization to see what
> | > > could
> | > > cause this.
> | >
> | > Does rmpi link in libmpi.so or dynamically load it at run-time?  The
> | > pt2pt one-sided component uses the MPI-1 point-to-point calls for
> | > communication (hence, the pt2pt name). If those symbols were
> | > unavailable (say, because libmpi.so was dynamically loaded) I could
> | > see how this would cause problems.
> | >
> | > The pt2pt component (rightly) does not have a -lmpi in its link
> | > line.  The other components that use symbols in libmpi.so (wrongly)
> | > do  have a -lmpi in their link line.  This can cause some problems on
> | > some platforms (Linux tends to do dynamic linking / dynamic loading
> | > better than most).  That's why only the pt2pt component fails.
> | >
> | > My guess is that Rmpi is dynamically loading libmpi.so, but not
> | > specifying the RTLD_GLOBAL flag.  This means that libmpi.so is not
> | > available to the components the way it should be, and all goes
> | > downhill from there.  It only mostly works because we do something
> | > silly with how we link most of our components, and Linux is just
> | > smart enough to cover our rears (thankfully).
> | >
> | > Solutions:
> | >
> | >    - Someone could make the pt2pt osc component link in libmpi.so
> | >      like the rest of the components and hope that no one ever
> | >      tries this on a non-friendly platform.
> | >    - Debian (and all Rmpi users) could configure Open MPI with the
> | >       --disable-dlopen flag and ignore the problem.
> | >    - Someone could fix Rmpi to dlopen libmpi.so with the RTLD_GLOBAL
> | >      flag and fix the problem properly.
> | >
> | > I think it's clear I'm in favor of Option 3.
> | >
> | > Brian
> | > _______________________________________________
> | > users mailing list
> | > us...@open-mpi.org
> | > http://www.open-mpi.org/mailman/listinfo.cgi/users
> | >
>
> --
> Three out of two people have difficulties with fractions.
>
