Jeff,

Thanks for the reply.  I have gotten much closer, and it looks like all
wounds were self-inflicted.  More below.

On 9 October 2007 at 22:01, Jeff Squyres wrote:
| On Oct 9, 2007, at 3:50 PM, Dirk Eddelbuettel wrote:
| 
| > edd@ron:~$ orterun -n 2 --mca mca_component_show_load_errors 1 r -e  
| > 'library(Rmpi); print(mpi.comm.rank(0))'
| > [ron:18360] mca: base: component_find: unable to open osc pt2pt:  
| > file not found (ignored)
| > [ron:18361] mca: base: component_find: unable to open osc pt2pt:  
| > file not found (ignored)
| 
| Truly odd.  Looking in the code, this error message is displayed when  
| lt_dlopen() of the component fails for some reason (the Libtool  
| portable wrapper library around dlopen() and friends).  We print out  
| the error string that libltdl returns to us, and it's apparently  
| "file not found".  This *usually* refers to the fact that a  
| dependency of the DSO that we're trying to open wasn't found (not  
| that the DSO itself wasn't found).
| 
| Your list of ldd dependencies didn't show anything odd, so I can't  
| imagine why it would get a "file not found" kind of error.
| 
| An off the wall question: are you compiling / building Open MPI on  
| one system and running it on another, where perhaps the dependencies  
| are slightly different and therefore causing a failure?  This is a  
| pretty weak question to ask, because I assume that *many* OMPI  
| components would fail to open if this were the case, but I thought  
| I'd ask anyway...

It's a fair question, but the Debian dependencies are usually good enough.  [
The answer is 'yes and no': I build what gets onto Debian's mirrors inside a
standardised chroot, whereas I then run it on my normal system, so it is the
same-yet-different machine. There can be differences, but these are typically
caught by the package management layer. ]

| Another whacky question: does the error happen when you start your  
| test program manually (without mpirun)?

That made no difference.

| Does this happen for all MPI programs (potentially only those that  
| use the MPI-2 one-sided stuff), or just your R environment?

This is the likely winner. 

It does indeed seem to be due to R's Rmpi package: running a simple mpitest.c
shows no error message. We will look at the Rmpi initialization to see what
could cause this.

| At this point, all I can suggest is firing up a debugger and stepping  
| through the code in lt_dlopenext() to see why exactly it is failing.

Seems like I avoided that trip to the dentist. ;-)

Moreover, despite my attempts at checking and double-checking, my apparent
'works on Debian but not on Ubuntu' problem was due to a LAM / Open MPI mix on
my Ubuntu machine at work.  Sorry, that was another false alarm.

| Sorry I don't have a better suggestion than this...  :-\

You were spot-on and most helpful. Thanks a bunch.

Cheers, Dirk

-- 
Three out of two people have difficulties with fractions.
