On Oct 11, 2008, at 6:48 AM, Aleksej Saushev wrote:

The actual message states:

[asau.local:25752] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init_stage1.c at line 182
--------------------------------------------------------------------------

Hmm. Even with all your output, I still don't see what could be causing this -- the oob rml plugin was compiled and installed just fine. Do you see an oob rml line in the output of ompi_info?

Is there a chance that there's some dependent library of oob_rml that is available on your head/build node, but not available on your back-end nodes? (that would be pretty odd, though)
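A rough way to check both questions from a shell is sketched below. The plugin path is an assumption (Open MPI normally installs dynamically loaded components as mca_<framework>_<component>.so under $prefix/lib/openmpi, but your prefix may differ), and unresolved_deps / check_plugin are hypothetical helper names, not part of Open MPI.

```shell
# Keep only the lines of ldd output that the dynamic linker could not resolve.
unresolved_deps() {
    grep 'not found' || true
}

# Report unresolved dependencies of one plugin file (path passed as $1).
check_plugin() {
    ldd "$1" | unresolved_deps
}

# Usage, on the head node and again on a back-end node
# (the plugin path below is an assumption; adjust it to your install prefix):
#   ompi_info | grep -i rml
#   check_plugin /usr/local/lib/openmpi/mca_rml_oob.so
```

If check_plugin prints nothing on the head node but lists a library on a back-end node, that library is the likely culprit.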

Bummer -- it looks like we have a bug in the debugging output for when rml plugins are selected -- so I can't just give you an mpirun command line that will output some additional diagnostic information. Do you mind getting your hands dirty in a little code? If not, edit this file: orte/mca/rml/base/rml_base_select.c and change all instances of

   opal_output_verbose(xxx, orte_rml_base.rml_output, ...)
to
   opal_output(orte_rml_base.rml_output, ...)
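If you would rather not edit every call by hand, the substitution can be sketched with sed. This assumes the verbosity level is a simple first argument with no commas in it and sits on the same line as the function name; strip_verbose_level is a hypothetical helper name. It writes to stdout, so review the result before replacing the file.

```shell
# Rewrite opal_output_verbose(LEVEL, ID, ...) into opal_output(ID, ...)
# by dropping the first argument. Reads stdin, writes stdout.
strip_verbose_level() {
    sed 's/opal_output_verbose([^,]*,[[:space:]]*/opal_output(/g'
}

# Usage (run from the top of the source tree):
#   f=orte/mca/rml/base/rml_base_select.c
#   strip_verbose_level < "$f" > "$f.new"
#   diff -u "$f" "$f.new"     # check the changes, then: mv "$f.new" "$f"
```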

And then compile/install that with (this is a shortcut; of course, you can do a top-level "make install" to install it, but it's a bit overkill for what we need for this bit):

   cd orte/mca/rml
   make
   cd ../..
   make install-am

Then run with:

   mpirun --mca rml_base_debug 100 ...

And see what the output tells you. When I do this with a successful run, my output looks like this:

----
[5:38] svbu-mpi:~/mpi % mpirun -np 1 --mca rml_base_debug 100 hello
[svbu-mpi.cisco.com:02087] orte_rml_base_select: initializing rml component oob
[svbu-mpi030:10587] orte_rml_base_select: initializing rml component oob
stdout: Hello, world!  I am 0 of 1 (svbu-mpi030)
stderr: Hello, world!  I am 0 of 1 (svbu-mpi030)
[5:39] svbu-mpi:~/mpi %
-----

(my "hello" program simply prints out the hello world message on both stdout/stderr)

Additional information.

The pkgsrc framework does work correctly here; it even catches or
overrides some incompatibilities. When building OpenMPI from the
same tarball without the pkgsrc framework, I get this:

libtool: compile: gcc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../orte/include -I../../../../ompi/include -I../../../.. -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -pthread -MT backtrace_none_component.lo -MD -MP -MF .deps/backtrace_none_component.Tpo -c backtrace_none_component.c -fPIC -DPIC -o .libs/backtrace_none_component.o
backtrace_none_component.c:41: error: expected expression before ',' token
backtrace_none_component.c:51: warning: braces around scalar initializer
backtrace_none_component.c:51: warning: (near initialization for 'mca_backtrace_none_component.backtracec_version.mca_component_release_version')

That's also odd. I don't see any problems in the source code in this particular area. What is the output of this area of the code when compiled with -E? It should show some obvious problem.
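One possible way to get at that output is sketched below: run the same compile with -E instead of -c and print the expansion from the first mention of the component struct onward. show_expanded is a hypothetical helper name; the include flags in the usage comment are copied from the failing compile line above, and the symbol argument is used as a plain sed pattern, so keep it to a simple identifier.

```shell
# Preprocess a source file and print the expansion starting at the first
# line that mentions a given symbol.
#   $1 = source file, $2 = symbol to look for, rest = extra compiler flags
show_expanded() {
    src=$1
    symbol=$2
    shift 2
    ${CC:-gcc} -E "$@" "$src" | sed -n "/$symbol/,\$p"
}

# Usage, from the directory where the compile failed:
#   show_expanded backtrace_none_component.c mca_backtrace_none_component \
#       -DHAVE_CONFIG_H -I. -I../../../../opal/include \
#       -I../../../../orte/include -I../../../../ompi/include -I../../../..
```

An initializer entry that expands to nothing (two commas in a row) would match the "expected expression before ',' token" error.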

--
Jeff Squyres
Cisco Systems
