To make sure you don't pick up any leftovers from another Open MPI
install when upgrading, you should specify
--enable-mpirun-prefix-by-default when configuring the source tree for
compilation. mpirun will then always select the binaries and libraries
that belong to its own installation.
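As a rough sketch of what that looks like at build time: the option is spelled --enable-mpirun-prefix-by-default in the Open MPI configure script. The install prefix /opt/openmpi-1.3 below is a hypothetical path, and the commands assume you are at the top of the unpacked source tree.

```shell
# Hypothetical install prefix; substitute the location used on your cluster.
PREFIX=/opt/openmpi-1.3

# Run from the top of the unpacked Open MPI source tree.
# With this option, mpirun behaves as if "--prefix $PREFIX" had been given.
./configure --prefix="$PREFIX" --enable-mpirun-prefix-by-default
make all install
```

For an installation that is already built without the option, passing --prefix to mpirun on the command line should have the same effect for a single run.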
Aurelien
On Dec 22, 2008, at 18:17, Ralph Castain wrote:
Your backend nodes are mistakenly picking up the OMPI 1.2 orted
binary instead of the 1.3 orted. The two are not compatible.
Check your LD_LIBRARY_PATH and PATH on the backend nodes and ensure
they are pointing at the 1.3 installation. There are other ways as
well of pointing to the correct installation - check the OMPI FAQ
pages to find alternatives if this doesn't work for you.
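One way to carry out this check, sketched under assumptions: the helper below walks a PATH-style string and prints the first directory that holds an executable orted, which is the copy a launch with that environment would pick up. The name first_orted_dir is made up for illustration and is not an Open MPI tool.

```shell
# Print the first directory in a colon-separated search path that
# contains an executable "orted"; return 1 if there is none.
# (first_orted_dir is a hypothetical helper, not part of Open MPI.)
first_orted_dir() {
    rest=$1:
    while [ -n "$rest" ]; do
        dir=${rest%%:*}      # next path component
        rest=${rest#*:}      # remainder after the first colon
        if [ -n "$dir" ] && [ -f "$dir/orted" ] && [ -x "$dir/orted" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Example: report which orted the current environment would find.
first_orted_dir "$PATH" || echo "no orted found in PATH"
```

Running this from inside a small job submitted through the scheduler shows what the backend node sees, which may differ from an interactive login. If the directory printed belongs to the 1.2 installation, that matches the symptom described here.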
Ralph
On Dec 22, 2008, at 2:58 PM, Ray Muno wrote:
We have been happily running under OpenMPI 1.2 on our cluster
until recently. It has 2200 processors (8-way Opteron), QLogic IB
connected.
We have had issues starting larger jobs (600+ processors). There
seemed to be some indication that OpenMPI 1.3 might solve our problems.
It built with no problem and installed. Users can compile programs.
When they tried to run, they got the attached output. Are we
missing something obvious?
This is a Rocks cluster with jobs scheduled through SGE.
=====================================================
$ mpirun -np 1024 program
[compute-2-6.local:32580] Error: unknown option "--daemonize"
Usage: orted [OPTION]...
Start an Open RTE Daemon
--bootproxy <arg0>        Run as boot proxy for <job-id>
-d|--debug                Debug the OpenRTE
-d|--spin                 Have the orted spin until we can connect a debugger to it
--debug-daemons           Enable debugging of OpenRTE daemons
--debug-daemons-file      Enable debugging of OpenRTE daemons, storing output in files
--gprreplica <arg0>       Registry contact information.
-h|--help                 This help message
--mpi-call-yield <arg0>   Have MPI (or similar) applications call yield when idle
--name <arg0>             Set the orte process name
--no-daemonize            Don't daemonize into the background
--nodename <arg0>         Node name as specified by host/resource description.
--ns-nds <arg0>           set sds/nds component to use for daemon (normally not needed)
--nsreplica <arg0>        Name service contact information.
--num_procs <arg0>        Set the number of process in this job
--persistent              Remain alive after the application process completes
--report-uri <arg0>       Report this process' uri on indicated pipe
--scope <arg0>            Set restrictions on who can connect to this universe
--seed                    Host replicas for the core universe services
--set-sid                 Direct the orted to separate from the current session
--tmpdir <arg0>           Set the root for the session directory tree
--universe <arg0>         Set the universe name as username@hostname:universe_name for this application
--vpid_start <arg0>       Set the starting vpid for this job
--------------------------------------------------------------------------
A daemon (pid 4151) died unexpectedly with status 251 while attempting
to launch so we are aborting.
There may be more information reported by the environment (see above).
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
compute-5-15.local - daemon did not report back when launched
compute-5-35.local - daemon did not report back when launched
compute-4-8.local - daemon did not report back when launched
compute-7-2.local - daemon did not report back when launched
compute-2-6.local - daemon did not report back when launched
compute-6-28.local - daemon did not report back when launched
compute-6-35.local - daemon did not report back when launched
compute-6-25.local
compute-6-26.local
compute-2-19.local - daemon did not report back when launched
compute-6-37.local - daemon did not report back when launched
compute-6-12.local - daemon did not report back when launched
compute-2-36.local - daemon did not report back when launched
compute-7-5.local - daemon did not report back when launched
compute-7-23.local - daemon did not report back when launched
================================================
--
Ray Muno
University of Minnesota
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
--
* Dr. Aurélien Bouteiller
* Sr. Research Associate at Innovative Computing Laboratory
* University of Tennessee
* 1122 Volunteer Boulevard, suite 350
* Knoxville, TN 37996
* 865 974 6321