Well, it turns out that the paths where Open MPI looks for its files
seem to be at least partially hard-coded.  I've got some "weird
pathing" here on my Rocks cluster:

/opt is local;
/share/apps is exported from the headnode and available on all nodes.
On the head node, /opt is symlinked to /share/apps

I set my environment modules such that openmpi-1.2.6 is located in
/share/apps/openmpi-pgi/1.2.6.  However, when I ran it on a compute
node, it ran into that error.  When I installed the runtime directly
on the compute node (placing it in /opt), but still left the
module/pathing the same, it worked.  I am thinking about making /opt a
symlink across the cluster, but I'm not sure about all the
implications therein...
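
One way to compare what each node actually sees is a small script run
on both the head node and a compute node with the install prefix as an
argument.  This is only a sketch: the function name is made up for
illustration, the two component files are the ones Jeff asks about in
the quoted message below, and the help-file location
($prefix/share/openmpi) is an assumption about the default layout, not
something confirmed in this thread.

```python
import glob
import os
import sys

# Patterns relative to the install prefix.  The first two are the files
# named in the quoted message; the third is an ASSUMED default location
# for the help files and may differ on a given build.
EXPECTED = [
    "lib/openmpi/mca_rml_oob.la*",
    "lib/openmpi/mca_rml_oob.so*",
    "share/openmpi/help-orte-runtime*",
]


def check_ompi_prefix(prefix):
    """Map each expected relative pattern to the sorted list of matches."""
    return {
        rel: sorted(glob.glob(os.path.join(prefix, rel)))
        for rel in EXPECTED
    }


if __name__ == "__main__":
    for prefix in sys.argv[1:]:
        # realpath shows where a symlinked prefix really points on this node.
        print("%s -> %s" % (prefix, os.path.realpath(prefix)))
        for rel, found in check_ompi_prefix(prefix).items():
            status = "ok" if found else "MISSING"
            print("  %-8s %s" % (status, rel))
```

Running it with /share/apps/openmpi-pgi/1.2.6 on each node should show
whether the files are visible there; it says nothing about which prefix
was compiled into the binaries.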

--Jim

On Fri, May 23, 2008 at 12:07 PM, Jeff Squyres <jsquy...@cisco.com> wrote:
> On May 22, 2008, at 12:52 PM, Jim Kusznir wrote:
>
>> I installed openmpi 1.2.6 on my system, but now my users are
>> complaining about even more errors.  I'm getting this:
>>
>> [compute-0-23.local:26164] [NO-NAME] ORTE_ERROR_LOG: Not found in file
>> runtime/orte_init_stage1.c at line 182
>> --------------------------------------------------------------------------
>> Sorry!  You were supposed to get help about:
>>    orte_init:startup:internal-failure
>> from the file:
>>    help-orte-runtime
>> But I couldn't find any file matching that name.  Sorry!
>> --------------------------------------------------------------------------
>
> Everything below this message is a consequence of the first message
> (above).
>
> There are two problems here:
>
> 1. Where are the help files -- why can't OMPI find them?  That's
> really weird; it suggests a broken Open MPI install.  You have a few
> pending e-mails to me about RPM builds that I need to go read (I'm
> sorry; I'm way backed up :-( ); I wonder if this is somehow related...?
>
> 2. The specific error that is occurring is that the ORTE layer in OMPI
> is unable to initialize its out-of-band messaging system (we call it
> the "RML") which is *really* weird.  The only reason that I can think
> that that would occur is a broken OMPI install.
>
> Is there any chance that there are some files missing from your OMPI
> installs?  For example, do you see these two files under
> $prefix/lib/openmpi (or wherever $pkglibdir was set to):
>
> mca_rml_oob.la*
> mca_rml_oob.so*
>
> --
> Jeff Squyres
> Cisco Systems
