> That's the same for R. We don't touch the inner guts of module loading for
> this. What Hao realized after looking at the corresponding FAQ item was that
> right before calling MPI_Init, one can load libmpi explicitly, and -- and
> that's the important bit -- set the proper RTLD_GLOBAL argument.
>
> So you could adapt the patch we used:
>
> a) add an include for dlfcn.h
>
> b) explicitly call dlopen on libmpi.so with RTLD_GLOBAL
>
> That should be reasonably easy to test as you only need to rebuild mpi4py,

I don't like this solution one bit. Here is why. When someone needs to
use a shared library in a given piece of code, there are 2 options:

1. Link in the shared library at compile time.
2. Load it using dlopen.

What you are telling me is that to use libmpi, I need to do both of
these!! Am I not correct that this is an abuse of dlopen? Anyone should
be able to link to libmpi at compile time and things should "just work" -
regardless of how my binary file is being used (my binary file could be
linked in at compile time or itself loaded using dlopen).

While I agree that the hack would probably solve the problem for mpi4py,
I don't think this is a true solution to the problem.

Brian

> --- rmpi-0.5-4.orig/src/Rmpi.c
> +++ rmpi-0.5-4/src/Rmpi.c
> @@ -16,6 +16,7 @@
>   */
>
>  #include "Rmpi.h"
> +#include <dlfcn.h>
>
>  static MPI_Comm *comm;
>  static MPI_Status *status;
> @@ -32,7 +33,9 @@
>         if (flag)
>                 return AsInt(1);
>         else {
> -               MPI_Init((void *)0,(void *)0);
> +               char *libm="libmpi.so";
> +               dlopen(libm,RTLD_GLOBAL);
> +               MPI_Init((void *)0,(void *)0);
>                 MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
>                 MPI_Errhandler_set(MPI_COMM_SELF, MPI_ERRORS_RETURN);
>                 comm=(MPI_Comm *)Calloc(COMM_MAXSIZE, MPI_Comm);
>
>
> | is responsible for loading _everything_ into Python, and I am pretty
> | sure that there is no way that people would be willing to change it.
> | I am cc'ing this to Lisandro - maybe he has some ideas on this front.
>
> Actually, looked like you didn't CC him.
>
> Hth, Dirk
>
> |
> | Thanks
> |
> | Brian
> |
> | On 10/10/07, Brian Barrett <brbar...@open-mpi.org> wrote:
> | > On Oct 10, 2007, at 1:27 PM, Dirk Eddelbuettel wrote:
> | > > | Does this happen for all MPI programs (potentially only those that
> | > > | use the MPI-2 one-sided stuff), or just your R environment?
> | > >
> | > > This is the likely winner.
> | > >
> | > > It seems indeed due to R's Rmpi package. Running a simple mpitest.c
> | > > shows no error message.
> | > > We will look at the Rmpi initialization to see what could
> | > > cause this.
> | >
> | > Does rmpi link in libmpi.so or dynamically load it at run-time? The
> | > pt2pt one-sided component uses the MPI-1 point-to-point calls for
> | > communication (hence, the pt2pt name). If those symbols were
> | > unavailable (say, because libmpi.so was dynamically loaded) I could
> | > see how this would cause problems.
> | >
> | > The pt2pt component (rightly) does not have a -lmpi in its link
> | > line. The other components that use symbols in libmpi.so (wrongly)
> | > do have a -lmpi in their link line. This can cause some problems on
> | > some platforms (Linux tends to do dynamic linking / dynamic loading
> | > better than most). That's why only the pt2pt component fails.
> | >
> | > My guess is that Rmpi is dynamically loading libmpi.so, but not
> | > specifying the RTLD_GLOBAL flag. This means that libmpi.so is not
> | > available to the components the way it should be, and all goes
> | > downhill from there. It only mostly works because we do something
> | > silly with how we link most of our components, and Linux is just
> | > smart enough to cover our rears (thankfully).
> | >
> | > Solutions:
> | >
> | > - Someone could make the pt2pt osc component link in libmpi.so
> | >   like the rest of the components and hope that no one ever
> | >   tries this on a non-friendly platform.
> | > - Debian (and all Rmpi users) could configure Open MPI with the
> | >   --disable-dlopen flag and ignore the problem.
> | > - Someone could fix Rmpi to dlopen libmpi.so with the RTLD_GLOBAL
> | >   flag and fix the problem properly.
> | >
> | > I think it's clear I'm in favor of Option 3.
> | >
> | > Brian
> | > _______________________________________________
> | > users mailing list
> | > us...@open-mpi.org
> | > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> --
> Three out of two people have difficulties with fractions.
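
For anyone who wants to try the "dlopen libmpi.so with RTLD_GLOBAL" option discussed above, here is a minimal sketch in C. The helper name preload_global is hypothetical (it is not part of Rmpi, mpi4py, or Open MPI), and the library name you pass in is an assumption that depends on the platform and MPI installation. One thing worth noting: POSIX says the dlopen mode must include either RTLD_LAZY or RTLD_NOW, with RTLD_GLOBAL ORed in, whereas the quoted patch passes RTLD_GLOBAL alone. The sketch also checks the return value instead of ignoring it.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: load a shared library so that its symbols become
 * visible to any library loaded afterwards (e.g. plugin components).
 * Returns the dlopen handle, or NULL if the library could not be loaded. */
void *preload_global(const char *soname)
{
    /* RTLD_NOW resolves symbols immediately; RTLD_GLOBAL exports them
     * to subsequently loaded libraries. */
    void *handle = dlopen(soname, RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL) {
        const char *msg = dlerror();
        fprintf(stderr, "dlopen(%s) failed: %s\n",
                soname, msg ? msg : "unknown error");
    }
    return handle;
}
```

The intended use would be a call such as preload_global("libmpi.so") immediately before MPI_Init, continuing (or bailing out) depending on whether the handle is NULL. On older glibc systems this needs -ldl on the link line.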