Jeff,

Thanks for the reply. I have gotten much closer, and it looks like all wounds were self-inflicted. More below.

On 9 October 2007 at 22:01, Jeff Squyres wrote:
| On Oct 9, 2007, at 3:50 PM, Dirk Eddelbuettel wrote:
|
| > edd@ron:~$ orterun -n 2 --mca mca_component_show_load_errors 1 r -e
| > 'library(Rmpi); print(mpi.comm.rank(0))'
| > [ron:18360] mca: base: component_find: unable to open osc pt2pt:
| > file not found (ignored)
| > [ron:18361] mca: base: component_find: unable to open osc pt2pt:
| > file not found (ignored)
|
| Truly odd. Looking in the code, this error message is displayed when
| lt_dlopen() of the component fails for some reason (the Libtool
| portable wrapper library around dlopen() and friends). We print out
| the error string that libltdl returns to us, and it's apparently
| "file not found". This *usually* refers to the fact that a
| dependency of the DSO that we're trying to open wasn't found (not
| that the DSO itself wasn't found).
|
| Your list of ldd dependencies didn't show anything odd, so I can't
| imagine why it would get a "file not found" kind of error.
|
| An off the wall question: are you compiling / building Open MPI on
| one system and running it on another, where perhaps the dependencies
| are slightly different and therefore causing a failure? This is a
| pretty weak question to ask, because I assume that *many* OMPI
| components would fail to open if this were the case, but I thought
| I'd ask anyway...

It's a fair question, but the Debian dependencies are usually good enough.

[ The answer is 'yes and no' as I build what gets onto Debian's mirrors, but
using a standardised chroot whereas I then run it on my normal system. So the
same-yet-different machine. And there can be differences, but this is
typically caught by the package management layer. ]

| Another whacky question: does the error happen when you start your
| test program manually (without mpirun)?

That made no difference.

| Does this happen for all MPI programs (potentially only those that
| use the MPI-2 one-sided stuff), or just your R environment?

This is the likely winner. It seems indeed due to R's Rmpi package. Running a
simple mpitest.c shows no error message. We will look at the Rmpi
initialization to see what could cause this.

| At this point, all I can suggest is firing up a debugger and stepping
| through the code in ld_dlopenext() to see why exactly it is failing.

Seems like I avoided that trip to the dentist. ;-)

Moreover, despite my attempts at checking and double checking, my apparent
'works on Debian but not on Ubuntu' was due to a LAM / OpenMPI mix on my
Ubuntu machine at work. Sorry, that was another false alarm.

| Sorry I don't have a better suggestion than this... :-\

You were spot-on and most helpful. Thanks a bunch.

Cheers, Dirk

--
Three out of two people have difficulties with fractions.