Well, it turns out that the paths where OpenMPI looks for things seem to be at least partially hard-coded. I've got some "weird pathing" here on my Rocks cluster:
/opt is local; /share/apps is exported from the head node and available on all nodes. On the head node, /opt is symlinked to /share/apps.

I set my environment modules such that openmpi-1.2.6 is located in /share/apps/openmpi-pgi/1.2.6. However, when I ran it on a compute node, it ran into that error. When I installed the runtime directly on the compute node (placing it in /opt), but still left the module/pathing the same, it worked.

I am thinking about making /opt a symlink across the cluster, but I'm not sure about all the implications therein...

--Jim

On Fri, May 23, 2008 at 12:07 PM, Jeff Squyres <jsquy...@cisco.com> wrote:
> On May 22, 2008, at 12:52 PM, Jim Kusznir wrote:
>
>> I installed openmpi 1.2.6 on my system, but now my users are
>> complaining about even more errors. I'm getting this:
>>
>> [compute-0-23.local:26164] [NO-NAME] ORTE_ERROR_LOG: Not found in file
>> runtime/orte_init_stage1.c at line 182
>> --------------------------------------------------------------------------
>> Sorry! You were supposed to get help about:
>>     orte_init:startup:internal-failure
>> from the file:
>>     help-orte-runtime
>> But I couldn't find any file matching that name. Sorry!
>> --------------------------------------------------------------------------
>
> Everything below this message is a consequence of the first message
> (above).
>
> There's two problems here:
>
> 1. Where are the help files -- why can't OMPI find them? That's
> really weird; it suggests a broken Open MPI install. You have a few
> pending e-mails to me about RPM builds that I need to go read (I'm
> sorry; I'm way backed up :-( ); I wonder if this is somehow related...?
>
> 2. The specific error that is occurring is that the ORTE layer in OMPI
> is unable to initialize its out-of-band messaging system (we call it
> the "RML") which is *really* weird. The only reason that I can think
> that that would occur is a broken OMPI install.
>
> Is there any chance that there are some files missing from your OMPI
> installs? For example, do you see these two files under
> $prefix/lib/openmpi (or wherever $pkglibdir was set to):
>
> mca_rml_oob.la*
> mca_rml_oob.so*
>
> --
> Jeff Squyres
> Cisco Systems
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
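
For anyone wanting to run the file check Jeff suggests on each node, here is a rough Python sketch. It only tests whether the expected files are visible under a given install prefix. The function name missing_files and the EXPECTED list are made up for illustration, and the share/openmpi location for the help file is an assumption about the default layout, so adjust the paths if your build set $pkglibdir or the data directory differently.

```python
import glob
import os

# Paths relative to the install prefix. The lib/openmpi entries are the two
# files named in the quoted message; the share/openmpi help-file location is
# an assumption about the default layout, not something stated in the thread.
EXPECTED = [
    os.path.join("lib", "openmpi", "mca_rml_oob.la"),
    os.path.join("lib", "openmpi", "mca_rml_oob.so"),
    os.path.join("share", "openmpi", "help-orte-runtime*"),
]


def missing_files(prefix, patterns=EXPECTED):
    """Return the patterns that match no file under the given prefix."""
    return [p for p in patterns if not glob.glob(os.path.join(prefix, p))]
```

Calling missing_files("/share/apps/openmpi-pgi/1.2.6") on the head node and again on a compute node, and then repeating it with whatever prefix the build was actually configured with, should show whether the files are visible at the path the runtime expects on that node. An empty list means everything in EXPECTED was found.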