Brian,

Man you're good!  :)

On 10 October 2007 at 13:49, Brian Barrett wrote:
| On Oct 10, 2007, at 1:27 PM, Dirk Eddelbuettel wrote:
| > | Does this happen for all MPI programs (potentially only those that
| > | use the MPI-2 one-sided stuff), or just your R environment?
| >
| > This is the likely winner.
| >
| > It seems indeed due to R's Rmpi package. Running a simple mpitest.c shows no
| > error message. We will look at the Rmpi initialization to see what could
| > cause this.
| 
| Does rmpi link in libmpi.so or dynamically load it at run-time?  The  

The extension mechanism of the GNU R environment loads compiled package code at
run-time; literally hundreds of packages on the CRAN mirrors rely on it.

| pt2pt one-sided component uses the MPI-1 point-to-point calls for  
| communication (hence, the pt2pt name). If those symbols were  
| unavailable (say, because libmpi.so was dynamically loaded) I could  
| see how this would cause problems.
| 
| The pt2pt component (rightly) does not have a -lmpi in its link  
| line.  The other components that use symbols in libmpi.so (wrongly)  
| do  have a -lmpi in their link line.  This can cause some problems on  
| some platforms (Linux tends to do dynamic linking / dynamic loading  
| better than most).  That's why only the pt2pt component fails.
| 
| My guess is that Rmpi is dynamically loading libmpi.so, but not  
| specifying the RTLD_GLOBAL flag.  This means that libmpi.so is not  

Spot on. Hao, Rmpi's author, pointed me to item 24 of the Open MPI FAQ and
suggested the following patch, which appears to have solved the issue:

--- rmpi-0.5-4.orig/src/Rmpi.c
+++ rmpi-0.5-4/src/Rmpi.c
@@ -16,6 +16,7 @@
  */

 #include "Rmpi.h"
+#include <dlfcn.h>

 static MPI_Comm        *comm;
 static MPI_Status *status;
@@ -32,7 +33,9 @@
 if (flag)
                return AsInt(1);
        else {  
-               MPI_Init((void *)0,(void *)0);
+               char *libm="libmpi.so";
+               dlopen(libm, RTLD_LAZY | RTLD_GLOBAL);
+               MPI_Init((void *)0,(void *)0);
                MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
                MPI_Errhandler_set(MPI_COMM_SELF, MPI_ERRORS_RETURN);
                comm=(MPI_Comm *)Calloc(COMM_MAXSIZE, MPI_Comm); 
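
A slightly more defensive take on the same idea might look like the sketch below. It is not the Rmpi code: `preload_global()` is a hypothetical helper, and candidate names such as `libmpi.so.0` and `libmpi.so` are assumptions that depend on how Open MPI was installed (the unversioned `libmpi.so` link is often only present when the development files are installed). It passes `RTLD_LAZY` alongside `RTLD_GLOBAL`, since POSIX requires one of `RTLD_LAZY` or `RTLD_NOW` in the mode, and it reports `dlerror()` instead of silently ignoring a failed `dlopen()`.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: open the first loadable library in `names` with
 * RTLD_GLOBAL so its symbols become visible to objects loaded later
 * (e.g. plugins that do not link the library themselves).
 * Returns the handle, or NULL if none of the candidates could be opened. */
void *preload_global(const char *const names[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* POSIX: one of RTLD_LAZY / RTLD_NOW must accompany RTLD_GLOBAL. */
        void *h = dlopen(names[i], RTLD_LAZY | RTLD_GLOBAL);
        if (h != NULL)
            return h;
        /* Report why this candidate failed, then try the next one. */
        fprintf(stderr, "dlopen(%s) failed: %s\n", names[i], dlerror());
    }
    return NULL;
}
```

In an Rmpi-like setting this would presumably be called just before `MPI_Init()`, e.g. with `{ "libmpi.so.0", "libmpi.so" }` as the candidate list, giving up with an error message if it returns NULL rather than letting the one-sided components fail later.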

| available to the components the way it should be, and all goes  
| downhill from there.  It only mostly works because we do something  
| silly with how we link most of our components, and Linux is just  
| smart enough to cover our rears (thankfully).
| 
| Solutions:
| 
|    - Someone could make the pt2pt osc component link in libmpi.so
|      like the rest of the components and hope that no one ever
|      tries this on a non-friendly platform.
|    - Debian (and all Rmpi users) could configure Open MPI with the
|      --disable-dlopen flag and ignore the problem.
|    - Someone could fix Rmpi to dlopen libmpi.so with the RTLD_GLOBAL
|      flag and fix the problem properly.
| 
| I think it's clear I'm in favor of Option 3.

And I think Rmpi's author agrees with you :) This also more or less answers
the question I lobbed at Hao a few minutes ago, when I was puzzled why Open
MPI needs this when so many other packages / libraries load cleanly into R.
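
The difference comes down to symbol scope: a library opened with RTLD_LOCAL (or pulled in as a dependency of such a library) keeps its symbols out of the global lookup scope, so a plugin loaded later that does not link it itself cannot resolve them. A rough way to see this is the hypothetical, Linux/glibc-oriented sketch below. `is_globally_visible()` and `open_and_report()` are made-up helpers, and `libm.so.6` with `cos` merely stands in for `libmpi.so` and an MPI symbol. It relies on the documented behaviour that `dlsym()` on the handle returned by `dlopen(NULL, ...)` searches the main program, its startup dependencies, and libraries opened with RTLD_GLOBAL.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if `sym` resolves through the global
 * scope, i.e. where a plugin without its own -lmpi would look. */
int is_globally_visible(const char *sym)
{
    /* dlopen(NULL) yields a handle for the main program's global scope. */
    void *global = dlopen(NULL, RTLD_LAZY);
    if (global == NULL)
        return 0;
    dlerror(); /* clear any stale error state */
    int found = dlsym(global, sym) != NULL;
    dlclose(global);
    return found;
}

/* Hypothetical helper: open `lib` with RTLD_LOCAL or RTLD_GLOBAL and
 * print whether `sym` is globally visible afterwards. */
void *open_and_report(const char *lib, int scope, const char *sym)
{
    /* One of RTLD_LAZY / RTLD_NOW is mandatory in the mode. */
    void *h = dlopen(lib, RTLD_LAZY | scope);
    if (h == NULL) {
        fprintf(stderr, "dlopen(%s): %s\n", lib, dlerror());
        return NULL;
    }
    printf("%s opened with %s: '%s' %s globally visible\n", lib,
           scope == RTLD_GLOBAL ? "RTLD_GLOBAL" : "RTLD_LOCAL", sym,
           is_globally_visible(sym) ? "is" : "is not");
    return h;
}
```

In a program that does not already link libm, calling `open_and_report("libm.so.6", RTLD_LOCAL, "cos")` would typically report "is not", and a second call with `RTLD_GLOBAL` would report "is": reopening an already-loaded library with RTLD_GLOBAL promotes it into the global scope, which is presumably why a late `dlopen()` of libmpi.so can help even after R has loaded Rmpi.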

Many, many thanks!

Dirk, much happier

-- 
Three out of two people have difficulties with fractions.
