I have a small cluster which until last week was running just fine.
Unfortunately we were hit by a sudden power dip which brought the
cluster down and did significant damage to other servers (blew power
supplies and disk). Although the cluster machines and the Infiniband
link are up and running jobs, I am now getting these errors in user
applications which we've never had before.
The system messages file reports (for node2):
Jul 5 12:08:28 node1 genunix: [ID 408789 kern.notice] NOTICE: tavor0: fault cleared external to device; service available
Jul 5 12:08:28 node1 genunix: [ID 451854 kern.notice] NOTICE: tavor0: port 1 up
Jul 7 16:18:32 node1 genunix: [ID 408114 kern.info] /pci@1,0/pci1022,7450@2/pci15b3,5a46@1/pci15b3,5a44@0 (tavor0) online
Jul 7 16:18:32 node1 ib: [ID 842868 kern.info] IB device: daplt@0, daplt0
Jul 7 16:18:32 node1 genunix: [ID 936769 kern.info] daplt0 is /ib/daplt@0
Jul 7 16:18:32 node1 genunix: [ID 408114 kern.info] /ib/daplt@0 (daplt0) online
Jul 7 16:18:32 node1 genunix: [ID 834635 kern.info] /ib/daplt@0 (daplt0) multipath status: degraded, path /pci@1,0/pci1022,7450@2/pci15b3,5a46@1/pci15b3,5a44@0 (tavor0) to target address: daplt,0 is online Load balancing: round-robin
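For anyone who wants to scan their own messages file for the same kind of entries, here is a minimal sketch in Python. It only counts lines that look like the ones quoted above. The pattern names and regular expressions are my own guesses based on the wording of these few entries, not a documented list of tavor or daplt messages, and the sample text is three of the entries above with the wrapped lines rejoined.

```python
import re

# Three entries copied from the excerpt above, one per line (wrapped lines rejoined).
SAMPLE = """\
Jul 5 12:08:28 node1 genunix: [ID 408789 kern.notice] NOTICE: tavor0: fault cleared external to device; service available
Jul 5 12:08:28 node1 genunix: [ID 451854 kern.notice] NOTICE: tavor0: port 1 up
Jul 7 16:18:32 node1 genunix: [ID 834635 kern.info] /ib/daplt@0 (daplt0) multipath status: degraded, path /pci@1,0/pci1022,7450@2/pci15b3,5a46@1/pci15b3,5a44@0 (tavor0) to target address: daplt,0 is online Load balancing: round-robin
"""

# Hypothetical patterns, derived only from the wording of the entries above.
PATTERNS = {
    "fault": re.compile(r"tavor\d+: fault"),
    "port": re.compile(r"tavor\d+: port \d+ (up|down)"),
    "multipath": re.compile(r"multipath status: (\w+)"),
}


def classify(line):
    """Return the names of all patterns that match one log line."""
    return [name for name, pat in PATTERNS.items() if pat.search(line)]


def summarize(text):
    """Count how many lines match each pattern."""
    counts = {name: 0 for name in PATTERNS}
    for line in text.splitlines():
        for name in classify(line):
            counts[name] += 1
    return counts


def multipath_states(text):
    """Collect the word that follows 'multipath status:' on each matching line."""
    return PATTERNS["multipath"].findall(text)


if __name__ == "__main__":
    print(summarize(SAMPLE))
    print(multipath_states(SAMPLE))
```

To run it against a real log, the text of the messages file could be passed to `summarize` instead of `SAMPLE`. The counts alone do not say whether the hardware is at fault; they only make it easier to compare nodes.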
I wonder if these messages are indicative of a hardware problem,
possibly on the Infiniband switch or the host adapters on the cluster
machines. The cluster software has not been altered, but there have
been small changes to the application codes. Still, I want to rule out
hardware issues caused by the power dip first.
Has anyone seen these messages before, and do you know whether I should
investigate hardware first? I did check the archives but it didn't help.
More info is provided below.
Any help appreciated, thanks.
Glenn
--
Details:
The cluster uses a mix of Sun X4100/X4200 machines linked with
Sun-supplied Infiniband and host adapters. All machines are running
Solaris 10_x86 (11/06) with the latest kernel patches.
Software is Sun Clustertools 7.
Node2 $ ifconfig ibd1
ibd1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 2044 index 3
inet 192.168.50.202 netmask ffffff00 broadcast 192.168.50.255
Node1 $ ifconfig ibd1
ibd1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 2044 index 3
inet 192.168.50.201 netmask ffffff00 broadcast 192.168.50.255
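As a sanity check on the two ibd1 configurations above, here is a small Python sketch that confirms both addresses fall in the same subnet and that the broadcast address matches. It uses only the values printed by ifconfig. The helper name `network_of` is made up for illustration, and the check says nothing about the state of the link itself.

```python
import ipaddress


def network_of(addr, hex_netmask):
    """Build the IPv4 network from an address and a Solaris-style hex netmask."""
    mask = ipaddress.IPv4Address(int(hex_netmask, 16))
    # strict=False lets us pass a host address rather than the network address.
    return ipaddress.ip_network(f"{addr}/{mask}", strict=False)


# Values taken from the ifconfig output above.
node1 = network_of("192.168.50.201", "ffffff00")
node2 = network_of("192.168.50.202", "ffffff00")
broadcast = ipaddress.IPv4Address("192.168.50.255")

same_subnet = node1 == node2
broadcast_ok = node1.broadcast_address == broadcast

if __name__ == "__main__":
    print("network:", node1)
    print("same subnet:", same_subnet)
    print("broadcast matches:", broadcast_ok)
```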
ompi_info -a
Open MPI: 1.2.1r14096-ct7b030r1838
Open MPI SVN revision: 0
Open RTE: 1.2.1r14096-ct7b030r1838
Open RTE SVN revision: 0
OPAL: 1.2.1r14096-ct7b030r1838
OPAL SVN revision: 0
MCA backtrace: printstack (MCA v1.0, API v1.0,
Component v1.2.1)
MCA paffinity: solaris (MCA v1.0, API v1.0, Component
v1.2.1)
MCA maffinity: first_use (MCA v1.0, API v1.0,
Component v1.2.1)
MCA timer: solaris (MCA v1.0, API v1.0, Component
v1.2.1)
MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
MCA allocator: bucket (MCA v1.0, API v1.0, Component
v1.0)
MCA coll: basic (MCA v1.0, API v1.0, Component
v1.2.1)
MCA coll: self (MCA v1.0, API v1.0, Component
v1.2.1)
MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.1)
MCA coll: tuned (MCA v1.0, API v1.0, Component
v1.2.1)
MCA io: romio (MCA v1.0, API v1.0, Component
v1.2.1)
MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.1)
MCA mpool: udapl (MCA v1.0, API v1.0, Component
v1.2.1)
MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.1)
MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.1)
MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.1)
MCA rcache: rb (MCA v1.0, API v1.0, Component v1.2.1)
MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.1)
MCA btl: self (MCA v1.0, API v1.0.1, Component
v1.2.1)
MCA btl: sm (MCA v1.0, API v1.0.1, Component
v1.2.1)
MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
MCA btl: udapl (MCA v1.0, API v1.0, Component
v1.2.1)
MCA topo: unity (MCA v1.0, API v1.0, Component
v1.2.1)
MCA osc: pt2pt (MCA v1.0, API v1.0, Component
v1.2.1)
MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.1)
MCA errmgr: orted (MCA v1.0, API v1.3, Component
v1.2.1)
MCA errmgr: proxy (MCA v1.0, API v1.3, Component
v1.2.1)
MCA gpr: null (MCA v1.0, API v1.0, Component
v1.2.1)
MCA gpr: proxy (MCA v1.0, API v1.0, Component
v1.2.1)
MCA gpr: replica (MCA v1.0, API v1.0, Component
v1.2.1)
MCA iof: proxy (MCA v1.0, API v1.0, Component
v1.2.1)
MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.1)
MCA ns: proxy (MCA v1.0, API v2.0, Component
v1.2.1)
MCA ns: replica (MCA v1.0, API v2.0, Component
v1.2.1)
MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
MCA ras: dash_host (MCA v1.0, API v1.3,
Component v1.2.1)
MCA ras: gridengine (MCA v1.0, API v1.3,
Component v1.2.1)
MCA ras: localhost (MCA v1.0, API v1.3,
Component v1.2.1)
MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.1)
MCA rds: hostfile (MCA v1.0, API v1.3, Component
v1.2.1)
MCA rds: proxy (MCA v1.0, API v1.3, Component
v1.2.1)
MCA rds: resfile (MCA v1.0, API v1.3, Component
v1.2.1)
MCA rmaps: round_robin (MCA v1.0, API v1.3,
Component v1.2.1)
MCA rmgr: proxy (MCA v1.0, API v2.0, Component
v1.2.1)
MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.1)
MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.1)
MCA pls: gridengine (MCA v1.0, API v1.3,
Component v1.2.1)
MCA pls: proxy (MCA v1.0, API v1.3, Component
v1.2.1)
MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.1)
MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.1)
MCA sds: env (MCA v1.0, API v1.0, Component v1.2.1)
MCA sds: pipe (MCA v1.0, API v1.0, Component
v1.2.1)
MCA sds: seed (MCA v1.0, API v1.0, Component
v1.2.1)
MCA sds: singleton (MCA v1.0, API v1.0,
Component v1.2.1)
Prefix: /opt/SUNWhpc/HPC7.0
Bindir: /opt/SUNWhpc/HPC7.0/bin
Libdir: /opt/SUNWhpc/HPC7.0/lib
Incdir: /opt/SUNWhpc/HPC7.0/include
Pkglibdir: /opt/SUNWhpc/HPC7.0/lib/openmpi
Sysconfdir: /opt/SUNWhpc/HPC7.0/etc
Configured architecture: i386-pc-solaris2.10
Configured by: root
Configured on: Fri Mar 30 13:40:12 EDT 2007
Configure host: burpen-csx10-0
Built by: root
Built on: Fri Mar 30 13:57:25 EDT 2007
Built host: burpen-csx10-0
C bindings: yes
C++ bindings: yes
Fortran77 bindings: yes (all)
Fortran90 bindings: yes
Fortran90 bindings size: trivial
C compiler: cc
C compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/cc
C char size: 1
C bool size: 1
C short size: 2
C int size: 4
C long size: 4
C float size: 4
C double size: 8
C pointer size: 4
C char align: 1
C bool align: 1
C int align: 4
C float align: 4
C double align: 4
C++ compiler: CC
C++ compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/CC
Fortran77 compiler: f77
Fortran77 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f77
Fortran90 compiler: f95
Fortran90 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f95
Fort integer size: 4
Fort logical size: 4
Fort logical value true: 1
Fort have integer1: yes
Fort have integer2: yes
Fort have integer4: yes
Fort have integer8: yes
Fort have integer16: no
Fort have real4: yes
Fort have real8: yes
Fort have real16: no
Fort have complex8: yes
Fort have complex16: yes
Fort have complex32: no
Fort integer1 size: 1
Fort integer2 size: 2
Fort integer4 size: 4
Fort integer8 size: 8
Fort integer16 size: -1
Fort real size: 4
Fort real4 size: 4
Fort real8 size: 8
Fort real16 size: -1
Fort dbl prec size: 4
Fort cplx size: 4
Fort dbl cplx size: 4
Fort cplx8 size: 8
Fort cplx16 size: 16
Fort cplx32 size: -1
Fort integer align: 4
Fort integer1 align: 1
Fort integer2 align: 2
Fort integer4 align: 4
Fort integer8 align: 4
Fort integer16 align: -1
Fort real align: 4
Fort real4 align: 4
Fort real8 align: 4
Fort real16 align: -1
Fort dbl prec align: 4
Fort cplx align: 4
Fort dbl cplx align: 4
Fort cplx8 align: 4
Fort cplx16 align: 4
Fort cplx32 align: -1
C profiling: yes
C++ profiling: yes
Fortran77 profiling: yes
Fortran90 profiling: yes
C++ exceptions: yes
Thread support: no
Build CFLAGS: -DNDEBUG -xtarget=opteron -xarch=sse2 -xprefetch
-xprefetch_level=2 -xvector=simd
-xdepend=yes -xbuiltin=%all
-xO5
Build CXXFLAGS: -DNDEBUG -xtarget=opteron -xarch=sse2 -xprefetch
-xprefetch_level=2 -xvector=simd
-xdepend=yes -xbuiltin=%all
-xO5
Build FFLAGS: -xtarget=opteron -xarch=sse2 -xprefetch
-xprefetch_level=2
-xvector=simd -stackvar -xO5
Build FCFLAGS: -xtarget=opteron -xarch=sse2 -xprefetch
-xprefetch_level=2
-xvector=simd -stackvar -xO5
Build LDFLAGS: -export-dynamic -R/opt/mx/lib
-R/opt/SUNWhpc/HPC7.0/lib
-R/opt/mx/lib/amd64 -R/opt/SUNWhpc/HPC7.0/lib/amd64
-R/opt/mx/lib -R/opt/SUNWhpc/HPC7.0/lib
-R/opt/mx/lib/amd64
-R/opt/SUNWhpc/HPC7.0/lib/amd64 -R/opt/mx/lib
-R/opt/SUNWhpc/HPC7.0/lib -R/opt/mx/lib/amd64
-R/opt/SUNWhpc/HPC7.0/lib/amd64
Build LIBS: -lsocket -lnsl -lrt -lm
Wrapper extra CFLAGS:
Wrapper extra CXXFLAGS:
Wrapper extra FFLAGS:
Wrapper extra FCFLAGS:
Wrapper extra LDFLAGS: -R/opt/mx/lib -R/opt/SUNWhpc/HPC7.0/lib
-R/opt/mx/lib/amd64
-R/opt/SUNWhpc/HPC7.0/lib/amd64
Wrapper extra LIBS: -lsocket -lnsl -lrt -lm -ldl
Internal debug support: no
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
libltdl support: yes
Heterogeneous support: yes
mpirun default --prefix: yes
MCA mca: parameter "mca_param_files" (current value:
"/home/tomcat/.openmpi/mca-params.conf:/opt/SUNWhpc/HPC7.0/etc/openmpi-mca-params.conf")
Path for MCA configuration files containing default parameter values
MCA mca: parameter "mca_component_path" (current
value:
"/opt/SUNWhpc/HPC7.0/lib/openmpi:/home/tomcat/.openmpi/components")
Path where to look for Open MPI and
ORTE components
MCA mca: parameter "mca_verbose" (current value:
<none>)
Top-level verbosity parameter
MCA mca: parameter "mca_component_show_load_errors"
(current value: "0")
Whether to show errors for components that
failed to load or
not
MCA mca: parameter "mca_component_disable_dlopen"
(current value: "0")
Whether to attempt to disable opening
dynamic components or not
MCA mpi: parameter "mpi_param_check" (current
value: "1")
Whether you want MPI API parameters
checked
at run-time or not.
Possible values are 0 (no checking) and 1
(perform checking at
run-time)
MCA mpi: parameter
"mpi_yield_when_idle" (current value:
"0")
Yield the processor when waiting for MPI
communication (for MPI
processes, will default to 1 when
oversubscribing nodes)
MCA mpi: parameter
"mpi_event_tick_rate" (current value:
"-1")
How often to progress TCP
communications (0
= never, otherwise
specified in microseconds)
MCA mpi: parameter "mpi_show_handle_leaks" (current
value: "0")
Whether MPI_FINALIZE shows all MPI handles
that were not freed
or not
MCA mpi: parameter
"mpi_no_free_handles" (current value:
"0")
Whether to actually free MPI objects when
their handles are
freed
MCA mpi: parameter
"mpi_show_mca_params" (current value:
"0")
Whether to show all MCA parameter value
during MPI_INIT or not
(good for reproducability of MPI jobs)
MCA mpi: parameter "mpi_show_mca_params_file"
(current value: <none>)
If mpi_show_mca_params is true, setting
this string to a valid
filename tells Open MPI to dump all the
MCA
parameter values
into a file suitable for reading via the
mca_param_files
parameter (good for reproducability of
MPI jobs)
MCA mpi: parameter
"mpi_paffinity_alone" (current value:
"0")
If nonzero, assume that this job is the
only (set
of)
process(es) running on each node and bind
processes to
processors, starting with processor ID 0
MCA mpi: parameter "mpi_keep_peer_hostnames"
(current value: "1")
If nonzero, save the string hostnames of
all MPI peer processes
(mostly for error / debugging output
messages). This can add
quite a bit of memory usage to each MPI
process.
MCA mpi: parameter "mpi_abort_delay" (current
value: "0")
If nonzero, print out an identifying
message when MPI_ABORT is
invoked (hostname, PID of the process that
called MPI_ABORT) and
delay for that many seconds before exiting
(a negative delay
value means to never abort). This allows
attaching of a
debugger before quitting the job.
MCA mpi: information
"mpi_abort_print_stack" (value: "0")
If nonzero, print out a stack trace when
MPI_ABORT is invoked
MCA mpi: parameter "mpi_preconnect_all" (current
value: "0")
Whether to force MPI processes to create
connections / warmup
with *all* peers during MPI_INIT (vs.
making connections lazily
-- upon the first MPI traffic between each
process peer pair)
MCA mpi: parameter "mpi_preconnect_oob" (current
value: "0")
Whether to force MPI processes to fully
wire-up the OOB system
between MPI processes.
MCA mpi: parameter "mpi_leave_pinned" (current
value: "0")
Whether to use the "leave pinned" protocol
or not. Enabling
this setting can help bandwidth
performance
when repeatedly
sending and receiving large messages with
the same buffers over
RDMA-based networks.
MCA mpi: parameter "mpi_leave_pinned_pipeline"
(current value: "0")
Whether to use the "leave pinned pipeline"
protocol or not.
MCA orte: parameter "orte_debug" (current value:
"0")
Top-level ORTE debug switch
MCA orte: parameter "orte_no_daemonize" (current
value: "0")
Whether to properly daemonize the ORTE
daemons or
not
MCA orte: parameter "orte_base_user_debugger"
(current value: "totalview
@mpirun@ -a @mpirun_args@ : fxp
@mpirun@ -a
@mpirun_args@")
Sequence of user-level debuggers to search
for in orterun
MCA orte: parameter "orte_abort_timeout" (current
value:
"10")
Time to wait [in seconds] before giving up
on aborting an ORTE
operation
MCA orte: parameter "orte_timing" (current value:
"0")
Request that critical timing loops be
measured
MCA opal: parameter "opal_signal" (current value:
"6,10,8,11")
If a signal is received, display the stack
trace frame
MCA backtrace: parameter "backtrace" (current value:
<none>)
Default selection set of components for
the
backtrace framework
(<none> means "use all components that
can be
found")
MCA backtrace: parameter
"backtrace_base_verbose" (current
value: "0")
Verbosity level for the backtrace
framework
(0 = no verbosity)
MCA backtrace: parameter "backtrace_printstack_priority"
(current value: "0")
MCA memory: parameter "memory" (current value: <none>)
Default selection set of components for
the
memory framework
(<none> means "use all components that
can be
found")
MCA memory: parameter
"memory_base_verbose" (current value:
"0")
Verbosity level for the memory
framework (0
= no verbosity)
MCA paffinity: parameter "paffinity" (current value:
<none>)
Default selection set of components for
the
paffinity framework
(<none> means "use all components that
can be
found")
MCA paffinity: parameter "paffinity_solaris_priority"
(current value: "10")
Priority of the solaris paffinity
component
MCA maffinity: parameter "maffinity" (current value:
<none>)
Default selection set of components for
the
maffinity framework
(<none> means "use all components that
can be
found")
MCA maffinity: parameter "maffinity_first_use_priority"
(current value: "10")
Priority of the first_use maffinity
component
MCA timer: parameter "timer" (current value: <none>)
Default selection set of components for
the
timer framework
(<none> means "use all components that
can be
found")
MCA timer: parameter "timer_base_verbose" (current
value: "0")
Verbosity level for the timer framework (0
= no verbosity)
MCA timer: parameter
"timer_solaris_priority" (current
value: "0")
MCA allocator: parameter "allocator" (current value:
<none>)
Default selection set of components for
the
allocator framework
(<none> means "use all components that
can be
found")
MCA allocator: parameter
"allocator_base_verbose" (current
value: "0")
Verbosity level for the allocator
framework
(0 = no verbosity)
MCA allocator: parameter "allocator_basic_priority"
(current value: "0")
MCA allocator: parameter "allocator_bucket_num_buckets"
(current value: "30")
MCA allocator: parameter "allocator_bucket_priority"
(current value: "0")
MCA coll: parameter "coll" (current value: <none>)
Default selection set of components for
the
coll framework
(<none> means "use all components that
can be
found")
MCA coll: parameter "coll_base_verbose" (current
value: "0")
Verbosity level for the coll framework
(0 =
no verbosity)
MCA coll: parameter
"coll_basic_priority" (current value:
"10")
Priority of the basic coll component
MCA coll: parameter
"coll_basic_crossover" (current value:
"4")
Minimum number of processes in a
communicator before using the
logarithmic algorithms
MCA coll: parameter "coll_self_priority" (current
value:
"75")
MCA coll: parameter "coll_sm_priority" (current
value: "0")
Priority of the sm coll component
MCA coll: parameter "coll_sm_control_size" (current
value: "4096")
Length of the control data -- should
usually be either the
length of a cache line on most SMPs, or
the
size of a page on
machines that support direct memory
affinity page placement (in
bytes)
MCA coll: parameter "coll_sm_bootstrap_filename"
(current value:
"shared_mem_sm_bootstrap")
Filename (in the Open MPI session
directory) of the coll sm
component bootstrap rendezvous mmap file
MCA coll: parameter "coll_sm_bootstrap_num_segments"
(current value: "8")
Number of segments in the bootstrap file
MCA coll: parameter "coll_sm_fragment_size" (current
value: "8192")
Fragment size (in bytes) used for passing
data through shared
memory (will be rounded up to the nearest
control_size size)
MCA coll: parameter "coll_sm_mpool" (current
value: "sm")
Name of the mpool component to use
MCA coll: parameter "coll_sm_comm_in_use_flags"
(current value: "2")
Number of "in use" flags, used to mark a
message passing area
segment as currently being used or not
(must be >= 2 and <=
comm_num_segments)
MCA coll: parameter "coll_sm_comm_num_segments"
(current value: "8")
Number of segments in each communicator's
shared memory message
passing area (must be >= 2, and must be
a multiple
of
comm_in_use_flags)
MCA coll: parameter
"coll_sm_tree_degree" (current value:
"4")
Degree of the tree for tree-based
operations (must be => 1 and
<= min(control_size, 255))
MCA coll: information
"coll_sm_shared_mem_used_bootstrap" (value: "160")
Amount of shared memory used in the shared
memory bootstrap area
(in bytes)
MCA coll: parameter
"coll_sm_info_num_procs" (current
value: "4")
Number of processes to use for the
calculation of
the
shared_mem_size MCA information parameter
(must be => 2)
MCA coll: information "coll_sm_shared_mem_used_data"
(value: "548864")
Amount of shared memory used in the shared
memory data area for
info_num_procs processes (in bytes)
MCA coll: parameter
"coll_tuned_priority" (current value:
"30")
Priority of the tuned coll component
MCA coll: parameter
"coll_tuned_pre_allocate_memory_comm_size_limit"
(current value: "32768")
Size of communicator were we stop
pre-allocating memory for the
fixed internal buffer used for message
requests etc that is hung
off the communicator data segment. I.e. if
you have a 100'000
nodes you might not want to pre-allocate
200'000 request handle
slots per communicator instance!
MCA coll: parameter "coll_tuned_init_tree_fanout"
(current value: "4")
Inital fanout used in the tree topologies
for each communicator.
This is only an initial guess, if a tuned
collective needs a
different fanout for an operation, it
build
it dynamically. This
parameter is only for the first guess and
might save a little
time
MCA coll: parameter "coll_tuned_init_chain_fanout"
(current value: "4")
Inital fanout used in the chain (fanout
followed by pipeline)
topologies for each communicator. This is
only an initial guess,
if a tuned collective needs a different
fanout for an operation,
it build it dynamically. This parameter is
only for the first
guess and might save a little time
MCA coll: parameter "coll_tuned_use_dynamic_rules"
(current value: "0")
Switch used to decide if we use static
(compiled/if statements)
or dynamic (built at runtime) decision
function
rules
MCA io: parameter "io_base_freelist_initial_size"
(current value: "16")
Initial MPI-2 IO request freelist size
MCA io: parameter "io_base_freelist_max_size"
(current value: "64")
Max size of the MPI-2 IO request freelist
MCA io: parameter "io_base_freelist_increment"
(current value: "16")
Increment size of the MPI-2 IO request
freelist
MCA io: parameter "io" (current value: <none>)
Default selection set of components for
the
io framework (<none>
means "use all components that can be
found")
MCA io: parameter "io_base_verbose" (current
value: "0")
Verbosity level for the io framework (0 =
no verbosity)
MCA io: parameter "io_romio_priority" (current
value: "10")
Priority of the io romio component
MCA io: parameter "io_romio_delete_priority"
(current value: "10")
Delete priority of the io romio component
MCA io: parameter
"io_romio_enable_parallel_optimizations" (current
value: "0")
Enable set of Open MPI-added options to
improve collective file
i/o performance
MCA mpool: parameter "mpool" (current value: <none>)
Default selection set of components for
the
mpool framework
(<none> means "use all components that
can be
found")
MCA mpool: parameter "mpool_base_verbose" (current
value: "0")
Verbosity level for the mpool framework (0
= no verbosity)
MCA mpool: parameter "mpool_sm_allocator" (current
value: "bucket")
Name of allocator component to use with
sm mpool
MCA mpool: parameter "mpool_sm_max_size" (current
value: "536870912")
Maximum size of the sm mpool shared
memory file
MCA mpool: parameter "mpool_sm_min_size" (current
value: "134217728")
Minimum size of the sm mpool shared
memory file
MCA mpool: parameter
"mpool_sm_per_peer_size" (current
value: "33554432")
Size (in bytes) to allocate per local peer
in the sm mpool
shared memory file, bounded by min_size
and
max_size
MCA mpool: parameter "mpool_sm_priority" (current
value: "0")
MCA mpool: parameter
"mpool_udapl_priority" (current value:
"0")
MCA mpool: parameter "mpool_base_use_mem_hooks"
(current value: "0")
use memory hooks for deregistering
freed memory
MCA mpool: parameter
"mpool_use_mem_hooks" (current value:
"0")
(deprecated, use mpool_base_use_mem_hooks)
MCA pml: parameter "pml" (current value: <none>)
Default selection set of components for
the
pml framework
(<none> means "use all components that
can be
found")
MCA pml: parameter "pml_base_verbose" (current
value: "0")
Verbosity level for the pml framework (0 =
no verbosity)
MCA pml: parameter
"pml_cm_free_list_num" (current value:
"4")
Initial size of request free lists
MCA pml: parameter "pml_cm_free_list_max" (current
value: "-1")
Maximum size of request free lists
MCA pml: parameter "pml_cm_free_list_inc" (current
value: "64")
Number of elements to add when growing
request free lists
MCA pml: parameter "pml_cm_priority" (current
value: "30")
CM PML selection priority
MCA pml: parameter "pml_ob1_free_list_num" (current
value: "4")
MCA pml: parameter "pml_ob1_free_list_max" (current
value: "-1")
MCA pml: parameter "pml_ob1_free_list_inc" (current
value: "64")
MCA pml: parameter "pml_ob1_priority" (current
value: "20")
MCA pml: parameter "pml_ob1_eager_limit" (current
value: "131072")
MCA pml: parameter "pml_ob1_send_pipeline_depth"
(current value: "3")
MCA pml: parameter "pml_ob1_recv_pipeline_depth"
(current value: "4")
MCA bml: parameter "bml" (current value: <none>)
Default selection set of components for
the
bml framework
(<none> means "use all components that
can be
found")
MCA bml: parameter "bml_base_verbose" (current
value: "0")
Verbosity level for the bml framework (0 =
no verbosity)
MCA bml: parameter "bml_r2_show_unreach_errors"
(current value: "1")
Show error message when procs are
unreachable
MCA bml: parameter "bml_r2_priority" (current
value: "0")
MCA rcache: parameter "rcache" (current value: <none>)
Default selection set of components for
the
rcache framework
(<none> means "use all components that
can be
found")
MCA rcache: parameter
"rcache_base_verbose" (current value:
"0")
Verbosity level for the rcache
framework (0
= no verbosity)
MCA rcache: parameter "rcache_rb_priority" (current
value: "0")
MCA rcache: parameter "rcache_vma_mru_len" (current
value:
"256")
The maximum size IN ENTRIES of the MRU
(most recently used)
rcache list
MCA rcache: parameter "rcache_vma_mru_size" (current
value: "1073741824")
The maximum size IN BYTES of the MRU (most
recently used) rcache
list
MCA rcache: parameter
"rcache_vma_priority" (current value:
"0")
MCA btl: parameter "btl_base_debug" (current
value: "0")
If btl_base_debug is 1 standard debug is
output, if > 1 verbose
debug is output
MCA btl: parameter "btl" (current value: <none>)
Default selection set of components for
the
btl framework
(<none> means "use all components that
can be
found")
MCA btl: parameter "btl_base_verbose" (current
value: "0")
Verbosity level for the btl framework (0 =
no verbosity)
MCA btl: parameter
"btl_self_free_list_num" (current
value: "0")
Number of fragments by default
MCA btl: parameter
"btl_self_free_list_max" (current
value: "-1")
Maximum number of fragments
MCA btl: parameter
"btl_self_free_list_inc" (current
value: "32")
Increment by this number of fragments
MCA btl: parameter "btl_self_eager_limit" (current
value: "131072")
Eager size fragmeng (before the rendez-
vous
ptotocol)
MCA btl: parameter
"btl_self_min_send_size" (current
value: "262144")
Minimum fragment size after the rendez-
vous
MCA btl: parameter
"btl_self_max_send_size" (current
value: "262144")
Maximum fragment size after the rendez-
vous
MCA btl: parameter
"btl_self_min_rdma_size" (current value:
"2147483647")
Maximum fragment size for the RDMA
transfer
MCA btl: parameter
"btl_self_max_rdma_size" (current value:
"2147483647")
Maximum fragment size for the RDMA
transfer
MCA btl: parameter "btl_self_exclusivity" (current
value: "65536")
Device exclusivity
MCA btl: parameter "btl_self_flags" (current
value: "10")
Active behavior flags
MCA btl: parameter "btl_self_priority" (current
value: "0")
MCA btl: parameter
"btl_sm_free_list_num" (current value:
"8")
MCA btl: parameter "btl_sm_free_list_max" (current
value: "-1")
MCA btl: parameter "btl_sm_free_list_inc" (current
value: "64")
MCA btl: parameter "btl_sm_exclusivity" (current
value: "65535")
MCA btl: parameter "btl_sm_latency" (current
value: "100")
MCA btl: parameter "btl_sm_max_procs" (current
value: "-1")
MCA btl: parameter "btl_sm_sm_extra_procs" (current
value: "2")
MCA btl: parameter "btl_sm_mpool" (current
value: "sm")
MCA btl: parameter "btl_sm_eager_limit" (current
value: "4096")
MCA btl: parameter "btl_sm_max_frag_size" (current
value: "32768")
MCA btl: parameter "btl_sm_size_of_cb_queue"
(current value: "128")
MCA btl: parameter "btl_sm_cb_lazy_free_freq"
(current value: "120")
MCA btl: parameter "btl_sm_priority" (current
value: "0")
MCA btl: parameter "btl_tcp_if_include" (current
value: <none>)
MCA btl: parameter "btl_tcp_if_exclude" (current
value:
"lo")
MCA btl: parameter "btl_tcp_free_list_num" (current
value: "8")
MCA btl: parameter "btl_tcp_free_list_max" (current
value: "-1")
MCA btl: parameter "btl_tcp_free_list_inc" (current
value: "32")
MCA btl: parameter "btl_tcp_sndbuf" (current value:
"131072")
MCA btl: parameter "btl_tcp_rcvbuf" (current value:
"131072")
MCA btl: parameter
"btl_tcp_endpoint_cache" (current
value: "30720")
MCA btl: parameter
"btl_tcp_exclusivity" (current value:
"0")
MCA btl: parameter "btl_tcp_eager_limit" (current
value: "65536")
MCA btl: parameter "btl_tcp_min_send_size" (current
value: "65536")
MCA btl: parameter "btl_tcp_max_send_size" (current
value: "131072")
MCA btl: parameter "btl_tcp_min_rdma_size" (current
value: "131072")
MCA btl: parameter "btl_tcp_max_rdma_size" (current
value: "2147483647")
MCA btl: parameter "btl_tcp_flags" (current
value: "122")
MCA btl: parameter "btl_tcp_priority" (current
value: "0")
MCA btl: parameter "btl_udapl_free_list_num"
(current value: "8")
Initial size of free lists (must be >= 1).
MCA btl: parameter "btl_udapl_free_list_max"
(current value: "-1")
Maximum size of free lists (-1 = infinite,
otherwise must be >=
1).
MCA btl: parameter "btl_udapl_free_list_inc"
(current value: "8")
Increment size of free lists (must be
>= 1).
MCA btl: parameter "btl_udapl_mpool" (current
value:
"udapl")
Name of the memory pool to be used.
MCA btl: parameter "btl_udapl_max_modules" (current
value: "8")
Maximum number of supported HCAs.
MCA btl: parameter
"btl_udapl_num_recvs" (current value:
"8")
Total number of receive buffers to keep
posted per endpoint
(must be >= 1).
MCA btl: parameter
"btl_udapl_num_sends" (current value:
"7")
Maximum number of sends to post on an
endpoint (must be >= 1).
MCA btl: parameter "btl_udapl_sr_win" (current
value: "4")
Window size at which point an explicit
credit message will be
generated (must be >= 1).
MCA btl: parameter "btl_udapl_eager_rdma_num"
(current value: "32")
Number of RDMA buffers to allocate for
small messages (must be
>= 1).
MCA btl: parameter "btl_udapl_max_eager_rdma_peers"
(current value:
"16")
Maximum number of peers allowed to use
RDMA
for short messages
(independently RDMA will still be used for
large messages, (must
be >= 0; if zero then RDMA will not be
used for
short
messages).
MCA btl: parameter "btl_udapl_eager_rdma_win"
(current value: "28")
Window size at which point an explicit
credit message will be
generated (must be >= 1).
MCA btl: parameter "btl_udapl_timeout" (current
value: "10000000")
Connection timeout, in microseconds.
MCA btl: parameter "btl_udapl_conn_priv_data"
(current value: "1")
Use connect private data to establish
connections (not supported
by all uDAPL implementations).
MCA btl: parameter
"btl_udapl_async_events" (current
value: "100000000")
The asynchronous event queue will only be
checked after entering
progress this number of times.
MCA btl: parameter "btl_udapl_buffer_alignment"
(current value: "256")
Preferred communication buffer alignment,
in bytes (must be >=
1).
MCA btl: parameter "btl_udapl_async_evd_qlen"
(current value: "256")
The asynchronous event dispatcher queue
length.
MCA btl: parameter "btl_udapl_conn_evd_qlen"
(current value: "256")
The connection event dispatcher queue
length is a function of
the number of connections expected.
MCA btl: parameter
"btl_udapl_dto_evd_qlen" (current
value: "256")
The data transfer operation event
dispatcher queue length is a
function of the number of connections as
well as the maximum
number of outstanding data transfer
operations.
MCA btl: parameter "btl_udapl_max_request_dtos"
(current value: "76")
Maximum number of outstanding submitted
sends and rdma
operations per endpoint, (see Section
6.6.6
of uDAPL Spec.).
MCA btl: parameter "btl_udapl_max_recv_dtos"
(current value: "8")
Maximum number of outstanding submitted
receive operations per
endpoint, (see Section 6.6.6 of uDAPL
Spec.).
MCA btl: parameter "btl_udapl_exclusivity" (current
value: "1014")
uDAPL BTL exclusivity (must be >= 0).
MCA btl: parameter "btl_udapl_eager_limit" (current
value: "8192")
Eager send limit, in bytes (must be >= 1).
MCA btl: parameter "btl_udapl_min_send_size"
(current value: "16384")
Minimum send size, in bytes (must be >=
1).
MCA btl: parameter "btl_udapl_max_send_size"
(current value: "65536")
Maximum send size, in bytes (must be >=
1).
MCA btl: parameter "btl_udapl_min_rdma_size"
(current value: "524288")
Minimum RDMA size, in bytes (must be >=
1).
MCA btl: parameter "btl_udapl_max_rdma_size"
(current value: "131072")
Maximum RDMA size, in bytes (must be >=
1).
MCA btl: parameter "btl_udapl_flags" (current
value: "2")
BTL flags, added together: PUT=2
(cannot be 0).
MCA btl: parameter "btl_udapl_bandwidth" (current
value: "225")
Approximate maximum bandwidth of network
(must be >= 1).
MCA btl: parameter "btl_udapl_priority" (current
value: "0")
MCA btl: parameter "btl_base_include" (current
value:
<none>)
MCA btl: parameter "btl_base_exclude" (current
value:
<none>)
MCA btl: parameter "btl_base_warn_component_unused"
(current value: "0")
This parameter is used to turn on warning
messages when certain
NICs are not used
MCA mtl: parameter "mtl" (current value: <none>)
Default selection set of components for
the
mtl framework
(<none> means "use all components that
can be
found")
MCA mtl: parameter "mtl_base_verbose" (current
value: "0")
Verbosity level for the mtl framework (0 =
no verbosity)
MCA topo: parameter "topo" (current value: <none>)
Default selection set of components for
the
topo framework
(<none> means "use all components that
can be
found")
MCA topo: parameter "topo_base_verbose" (current
value: "0")
Verbosity level for the topo framework
(0 =
no verbosity)
MCA osc: parameter "osc" (current value: <none>)
Default selection set of components for
the
osc framework
(<none> means "use all components that
can be
found")
MCA osc: parameter "osc_base_verbose" (current
value: "0")
Verbosity level for the osc framework (0 =
no verbosity)
MCA osc: parameter "osc_pt2pt_no_locks" (current
value: "0")
Enable optimizations available only if
MPI_LOCK is not used.
MCA osc: parameter "osc_pt2pt_eager_limit" (current
value: "16384")
Max size of eagerly sent data
MCA osc: parameter "osc_pt2pt_priority" (current
value: "0")
MCA errmgr: parameter "errmgr" (current value: <none>)
Default selection set of components for
the
errmgr framework
(<none> means "use all components that
can be
found")
MCA errmgr: parameter "errmgr_hnp_debug" (current
value: "0")
MCA errmgr: parameter
"errmgr_hnp_priority" (current value:
"0")
MCA errmgr: parameter "errmgr_orted_debug" (current
value: "0")
MCA errmgr: parameter "errmgr_orted_priority" (current
value: "0")
MCA errmgr: parameter "errmgr_proxy_debug" (current
value: "0")
MCA errmgr: parameter "errmgr_proxy_priority" (current
value: "0")
MCA gpr: parameter "gpr_base_maxsize" (current
value: "2147483647")
MCA gpr: parameter "gpr_base_blocksize" (current
value:
"512")
MCA gpr: parameter "gpr" (current value: <none>)
Default selection set of components for
the
gpr framework
(<none> means "use all components that
can be
found")
MCA gpr: parameter "gpr_null_priority" (current
value: "0")
MCA gpr: parameter "gpr_proxy_debug" (current
value: "0")
MCA gpr: parameter "gpr_proxy_priority" (current
value: "0")
MCA gpr: parameter "gpr_replica_debug" (current
value: "0")
MCA gpr: parameter
"gpr_replica_isolate" (current value:
"0")
MCA gpr: parameter
"gpr_replica_priority" (current value:
"0")
MCA iof: parameter "iof_base_window_size" (current
value: "4096")
MCA iof: parameter "iof_base_service" (current
value:
"0.0.0")
MCA iof: parameter "iof" (current value: <none>)
Default selection set of components for
the
iof framework
(<none> means "use all components that
can be
found")
MCA iof: parameter "iof_proxy_debug" (current
value: "1")
MCA iof: parameter "iof_proxy_priority" (current
value: "0")
MCA iof: parameter "iof_svc_debug" (current
value: "1")
MCA iof: parameter "iof_svc_priority" (current
value: "0")
MCA ns: parameter "ns" (current value: <none>)
Default selection set of components for
the
ns framework (<none>
means "use all components that can be
found")
MCA ns: parameter "ns_proxy_debug" (current
value: "0")
MCA ns: parameter "ns_proxy_maxsize" (current
value: "2147483647")
MCA ns: parameter "ns_proxy_blocksize" (current
value:
"512")
MCA ns: parameter "ns_proxy_priority" (current
value: "0")
MCA ns: parameter "ns_replica_debug" (current
value: "0")
MCA ns: parameter "ns_replica_isolate" (current
value: "0")
MCA ns: parameter "ns_replica_maxsize" (current
value: "2147483647")
MCA ns: parameter "ns_replica_blocksize" (current
value: "512")
MCA ns: parameter
"ns_replica_priority" (current value:
"0")
MCA oob: parameter "oob" (current value: <none>)
Default selection set of components for
the
oob framework
(<none> means "use all components that
can be
found")
MCA oob: parameter "oob_base_verbose" (current
value: "0")
Verbosity level for the oob framework (0 =
no verbosity)
MCA oob: parameter "oob_tcp_peer_limit" (current
value:
"-1")
MCA oob: parameter "oob_tcp_peer_retries" (current
value: "60")
MCA oob: parameter "oob_tcp_debug" (current
value: "0")
MCA oob: parameter "oob_tcp_include" (current
value: <none>)
MCA oob: parameter "oob_tcp_exclude" (current
value: <none>)
MCA oob: parameter "oob_tcp_sndbuf" (current value:
"131072")
MCA oob: parameter "oob_tcp_rcvbuf" (current value:
"131072")
MCA oob: parameter "oob_tcp_connect_timeout"
(current value: "600")
connect() timeout in seconds, before
trying
next interface
MCA oob: parameter "oob_tcp_connect_sleep" (current
value: "1")
Enable (1) /Disable (0) random sleep for
connection wireup
MCA oob: parameter "oob_tcp_listen_mode" (current
value: "event")
Mode for HNP to accept incoming
connections: event,
listen_thread
MCA oob: parameter "oob_tcp_listen_thread_max_queue"
(current value: "10")
High water mark for queued accepted socket list size
MCA oob: parameter "oob_tcp_listen_thread_max_time"
(current value: "10")
Maximum amount of time (in milliseconds) to wait between
processing accepted socket list
MCA oob: parameter "oob_tcp_accept_spin_count"
(current value: "10")
Number of times to let accept return
EWOULDBLOCK before updating
accepted socket list
MCA oob: parameter "oob_tcp_priority" (current
value: "0")
MCA ras: parameter "ras" (current value: <none>)
MCA ras: parameter
"ras_dash_host_priority" (current
value: "5")
Selection priority for the dash_host
RAS component
MCA ras: parameter
"ras_gridengine_debug" (current value:
"0")
Enable debugging output for the gridengine
ras component
MCA ras: parameter "ras_gridengine_priority"
(current value: "100")
Priority of the gridengine ras component
MCA ras: parameter
"ras_gridengine_verbose" (current
value: "0")
Enable verbose output for the gridengine
ras component
MCA ras: parameter "ras_gridengine_show_jobid"
(current value: "0")
Show the JOB_ID of the Grid Engine job
MCA ras: parameter
"ras_localhost_priority" (current
value: "0")
Selection priority for the localhost
RAS component
MCA ras: parameter "ras_tm_priority" (current
value: "100")
Priority of the tm ras component
MCA rds: parameter "rds" (current value: <none>)
MCA rds: parameter "rds_hostfile_debug" (current
value: "0")
Toggle debug output for hostfile RDS
component
MCA rds: parameter "rds_hostfile_path" (current
value: "/opt/SUNWhpc/HPC7.0/etc/openmpi-default-hostfile")
ORTE Host filename
MCA rds: parameter "rds_hostfile_priority" (current
value: "0")
MCA rds: parameter "rds_proxy_priority" (current
value: "0")
MCA rds: parameter "rds_resfile_debug" (current
value: "0")
Toggle debug output for resfile RDS
component
MCA rds: parameter "rds_resfile_name" (current
value:
<none>)
ORTE Resource filename
MCA rds: parameter
"rds_resfile_priority" (current value:
"0")
MCA rmaps: parameter "rmaps_base_verbose" (current
value: "0")
Verbosity level for the rmaps framework
MCA rmaps: parameter "rmaps_base_schedule_policy"
(current value:
"unspec")
Scheduling Policy for RMAPS. [slot | node]
MCA rmaps: parameter "rmaps_base_pernode" (current
value: "0")
Launch one ppn as directed
MCA rmaps: parameter "rmaps_base_n_pernode" (current
value: "-1")
Launch n procs/node
MCA rmaps: parameter "rmaps_base_schedule_local"
(current value: "1")
If nonzero, allow scheduling MPI
applications on the same node
as mpirun (default). If zero, do not
schedule any MPI
applications on the same node as mpirun
MCA rmaps: parameter "rmaps_base_no_oversubscribe"
(current value: "0")
If nonzero, then do not allow oversubscription of nodes -
mpirun will return an error if there aren't enough nodes to
launch all processes without oversubscribing
MCA rmaps: parameter "rmaps" (current value: <none>)
Default selection set of components for
the
rmaps framework
(<none> means "use all components that
can be
found")
MCA rmaps: parameter "rmaps_round_robin_debug"
(current value: "1")
Toggle debug output for Round Robin
RMAPS component
MCA rmaps: parameter "rmaps_round_robin_priority"
(current value: "1")
Selection priority for Round Robin
RMAPS component
MCA rmgr: parameter "rmgr" (current value: <none>)
Default selection set of components for
the
rmgr framework
(<none> means "use all components that
can be
found")
MCA rmgr: parameter
"rmgr_proxy_priority" (current value:
"0")
MCA rmgr: parameter "rmgr_urm_priority" (current
value: "0")
MCA rml: parameter "rml" (current value: <none>)
Default selection set of components for
the
rml framework
(<none> means "use all components that
can be
found")
MCA rml: parameter "rml_base_verbose" (current
value: "0")
Verbosity level for the rml framework (0 =
no verbosity)
MCA rml: parameter "rml_oob_priority" (current
value: "0")
MCA pls: parameter
"pls_base_reuse_daemons" (current
value: "0")
If nonzero, reuse daemons to launch
dynamically spawned
processes. If zero, do not reuse
daemons (default)
MCA pls: parameter "pls" (current value: <none>)
Default selection set of components for
the
pls framework
(<none> means "use all components that
can be
found")
MCA pls: parameter "pls_base_verbose" (current
value: "0")
Verbosity level for the pls framework (0 =
no verbosity)
MCA pls: parameter
"pls_gridengine_debug" (current value:
"0")
Enable debugging of gridengine pls
component
MCA pls: parameter
"pls_gridengine_verbose" (current
value: "0")
Enable verbose output of the gridengine
qrsh -inherit command
MCA pls: parameter "pls_gridengine_priority"
(current value: "100")
Priority of the gridengine pls component
MCA pls: parameter "pls_gridengine_orted" (current
value: "orted")
The command name that the gridengine pls
component will invoke
for the ORTE daemon
MCA pls: parameter "pls_proxy_priority" (current
value: "0")
MCA pls: parameter "pls_rsh_debug" (current
value: "0")
Whether or not to enable debugging output
for the rsh pls
component (0 or 1)
MCA pls: parameter
"pls_rsh_num_concurrent" (current
value: "128")
How many pls_rsh_agent instances to invoke
concurrently (must be > 0)
MCA pls: parameter "pls_rsh_force_rsh" (current
value: "0")
Force the launcher to always use rsh, even
for local daemons
MCA pls: parameter "pls_rsh_orted" (current
value: "orted")
The command name that the rsh pls
component
will invoke for the
ORTE daemon
MCA pls: parameter "pls_rsh_priority" (current
value: "10")
Priority of the rsh pls component
MCA pls: parameter "pls_rsh_delay" (current
value: "1")
Delay (in seconds) between invocations of the remote agent,
but only used when the "debug" MCA parameter is true, or the
top-level MCA debugging is enabled (otherwise this value is
ignored)
MCA pls: parameter "pls_rsh_reap" (current
value: "1")
If set to 1, wait for all the processes to complete before
exiting. Otherwise, quit immediately -- without waiting for
confirmation that all other processes in the job have
completed.
MCA pls: parameter "pls_rsh_assume_same_shell"
(current value: "1")
If set to 1, assume that the shell on the
remote node is the
same as the shell on the local node.
Otherwise, probe for what
the remote shell.
MCA pls: parameter "pls_rsh_agent" (current value:
"ssh : rsh")
The command used to launch executables on
remote nodes
(typically either "ssh" or "rsh")
MCA pls: parameter "pls_tm_debug" (current
value: "0")
Enable debugging of the TM pls
MCA pls: parameter "pls_tm_verbose" (current
value: "0")
Enable verbose output of the TM pls
MCA pls: parameter "pls_tm_priority" (current
value: "75")
Default selection priority
MCA pls: parameter "pls_tm_orted" (current
value: "orted")
Command to use to start proxy orted
MCA pls: parameter
"pls_tm_want_path_check" (current
value: "1")
Whether the launching process should check
for the pls_tm_orted
executable in the PATH before launching
(the TM API does not
give an idication of failure; this is a
somewhat-lame
workaround; non-zero values enable this
check)
MCA sds: parameter "sds" (current value: <none>)
Default selection set of components for
the
sds framework
(<none> means "use all components that
can be
found")
MCA sds: parameter "sds_base_verbose" (current
value: "0")
Verbosity level for the sds framework (0 =
no verbosity)
MCA sds: parameter "sds_env_priority" (current
value: "0")
MCA sds: parameter "sds_pipe_priority" (current
value: "0")
MCA sds: parameter "sds_seed_priority" (current
value: "0")
MCA sds: parameter
"sds_singleton_priority" (current
value: "0")
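
Because the mail client wrapped the ompi_info output above, the values are awkward to compare between nodes. The following is only a minimal sketch, assuming the wrapped text has been saved into a string; `parse_mca_params` is a hypothetical helper name and not part of Open MPI. It rejoins the wrapped lines and extracts each parameter name with its current value.

```python
import re

# Matches: parameter "name" (current value: "x")  or  (current value: <none>)
PARAM_RE = re.compile(
    r'parameter\s+"([^"]+)"\s+\(current\s+value:\s*(?:"([^"]*)"|(<none>))\)'
)


def parse_mca_params(text):
    """Return {parameter name: current value} from wrapped ompi_info text.

    A value shown as <none> is stored as None. Values that were wrapped
    in the middle (for example a long path) keep a space at the break,
    so treat those with care.
    """
    # Collapse every run of whitespace, including mail line wraps, to one space.
    flat = " ".join(text.split())
    params = {}
    for name, quoted, none in PARAM_RE.findall(flat):
        params[name] = None if none else quoted
    return params


sample = '''MCA btl: parameter "btl_udapl_max_recv_dtos"
(current value: "8")
MCA btl: parameter "btl_base_include" (current
value:
<none>)
'''

result = parse_mca_params(sample)
for name, value in sorted(result.items()):
    print(name, "=", value)
```

Running the same function over the dump from two nodes and comparing the resulting dictionaries would show any parameter that differs between them.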