On Oct 11, 2008, at 6:48 AM, Aleksej Saushev wrote:
The actual message states:
[asau.local:25752] [NO-NAME] ORTE_ERROR_LOG: Not found in file
runtime/orte_init_stage1.c at line 182
--------------------------------------------------------------------------
Hmm. Even with all your output, I still don't see what could be
causing this -- the oob rml plugin was compiled and installed just
fine. Do you see an oob rml line in the output of ompi_info?
Is there a chance that there's some dependent library of oob_rml that
is available on your head/build node, but not available on your
back-end nodes? (that would be pretty odd, though)
Bummer -- it looks like we have a bug in the debugging output for when
rml plugins are selected -- so I can't just give you an mpirun command
line that will output some additional diagnostic information. Do you
mind getting your hands dirty in a little code? If not, edit this
file: orte/mca/rml/base/rml_base_select.c and change all instances of
opal_output_verbose(xxx, orte_rml_base.rml_output, ...)
to
opal_output(orte_rml_base.rml_output, ...)
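If you would rather script that edit than do it by hand, something like the following sed substitution might work. This is only a sketch: it runs on a made-up sample file in a scratch directory (the file name and the two sample calls are hypothetical, not copied from the real source), and it assumes the verbosity level and its trailing comma sit on the same line as the function name. It writes to a new file and renames it instead of using "sed -i", since not every sed supports in-place editing.

```shell
# Work on a scratch copy, not on the real source tree.
tmpdir=$(mktemp -d)
sample="$tmpdir/rml_base_select_sample.c"   # hypothetical sample file

# Two made-up calls in the style of the ones to be changed.
cat > "$sample" <<'EOF'
opal_output_verbose(10, orte_rml_base.rml_output, "selecting component");
opal_output_verbose(100, orte_rml_base.rml_output, "querying %s", name);
EOF

# Rename the call and drop its first (verbosity level) argument.
# Assumption: the level and the comma after it are on the same line
# as "opal_output_verbose(".
sed 's/opal_output_verbose([^,]*,[[:space:]]*/opal_output(/g' \
    "$sample" > "$sample.new"
mv "$sample.new" "$sample"

# Show the result so it can be checked by eye before touching real code.
cat "$sample"
```

If the result looks right, the same sed expression could be pointed at orte/mca/rml/base/rml_base_select.c; calls whose first argument wraps onto the next line would still need to be fixed by hand.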
And then compile/install that with (this is a shortcut; of course, you
can do a top-level "make install" to install it, but it's a bit
overkill for what we need for this bit):
cd orte/mca/rml
make
cd ../..
make install-am
Then run with:
mpirun --mca rml_base_debug 100 ...
And see what the output tells you. When I do this with a successful
run, my output looks like this:
----
[5:38] svbu-mpi:~/mpi % mpirun -np 1 --mca rml_base_debug 100 hello
[svbu-mpi.cisco.com:02087] orte_rml_base_select: initializing rml
component oob
[svbu-mpi030:10587] orte_rml_base_select: initializing rml component oob
stdout: Hello, world! I am 0 of 1 (svbu-mpi030)
stderr: Hello, world! I am 0 of 1 (svbu-mpi030)
[5:39] svbu-mpi:~/mpi %
-----
(my "hello" program simply prints out the hello world message on both
stdout/stderr)
Additional information.
The pkgsrc framework does work correctly here; it even catches or
overrides some incompatibilities. When building OpenMPI from the
same tarball without the pkgsrc framework, I get this:
libtool: compile: gcc -DHAVE_CONFIG_H -I. -I../../../../opal/include -I../../../../orte/include -I../../../../ompi/include -I../../../.. -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -pthread -MT backtrace_none_component.lo -MD -MP -MF .deps/backtrace_none_component.Tpo -c backtrace_none_component.c -fPIC -DPIC -o .libs/backtrace_none_component.o
backtrace_none_component.c:41: error: expected expression before ',' token
backtrace_none_component.c:51: warning: braces around scalar initializer
backtrace_none_component.c:51: warning: (near initialization for 'mca_backtrace_none_component.backtracec_version.mca_component_release_version')
That's also odd. I don't see any problems in the source code in this
particular area. What is the output of this area of the code when
compiled with -E? It should show some obvious problem.
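To illustrate the kind of thing -E can reveal, here is a small stand-alone sketch. It is not taken from the Open MPI source: the file demo.c and the macro DEMO_RELEASE_VERSION are hypothetical, and an empty macro is only one possible cause of this class of error, not a diagnosis of your build. It assumes a C compiler is available as "cc".

```shell
# Build a tiny, made-up example in a scratch directory.
tmpdir=$(mktemp -d)

# DEMO_RELEASE_VERSION is defined but expands to nothing (hypothetical).
cat > "$tmpdir/demo.c" <<'EOF'
#define DEMO_RELEASE_VERSION
int demo_version[] = { 1, 2, DEMO_RELEASE_VERSION, 3 };
EOF

# Preprocess only (-E) and pick out the initializer line.
expanded=$(cc -E "$tmpdir/demo.c" | grep 'demo_version')

# The macro has vanished, leaving two commas with nothing between them.
echo "$expanded"
```

In the preprocessed text the empty slot between the commas is easy to spot, whereas in the original source the line looks perfectly reasonable. For the real file, the same idea applies: rerun the gcc command from the libtool line with -E in place of -c and the -o argument, and look at what line 41 turned into.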
--
Jeff Squyres
Cisco Systems