On Apr 27, 2009, at 10:22 PM, jan wrote:

Thank you, Jeff Squyres.

I have checked the web page
http://www.open-mpi.org/community/lists/announce/2009/03/0029.php and then the
page https://svn.open-mpi.org/trac/ompi/ticket/1853 , but the web site
svn.open-mpi.org seems to have crashed.


Try that ticket again; sometimes Trac does weird things. :-( A reload of the page usually fixes the problem.

Then I tried Open MPI v1.3.2 again with many different configurations, but
found that the problem still occurs periodically, i.e. twice it succeeds, then
twice it fails, twice it succeeds, then twice it fails, and so on. Do you have
any suggestions for this issue?


Can you send us a small example that reproduces the problem?
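Something tiny is fine -- e.g., a toy program along these lines (just a
sketch, not necessarily your exact cpi test; compile with mpicc and run it the
same way you run cpi) that prints its rank and pushes a little traffic through
the openib BTL:

  /* repro.c -- minimal sketch of a reproducer; illustrative only */
  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char *argv[])
  {
      int rank, size, len, token = 0;
      char name[MPI_MAX_PROCESSOR_NAME];

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      MPI_Get_processor_name(name, &len);
      printf("Process %d on %s\n", rank, name);

      /* Pass a token around a ring so that every neighbor pair
         actually moves a message over the interconnect. */
      if (size > 1) {
          if (0 == rank) {
              token = 42;
              MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
              MPI_Recv(&token, 1, MPI_INT, size - 1, 0, MPI_COMM_WORLD,
                       MPI_STATUS_IGNORE);
          } else {
              MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
                       MPI_STATUS_IGNORE);
              MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0,
                       MPI_COMM_WORLD);
          }
      }

      MPI_Finalize();
      return 0;
  }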


Thank you again.

Best Regards,

Gloria Jan
Wavelink Technology Inc.


>
> Per http://www.open-mpi.org/community/lists/announce/2009/03/0029.php ,
> can you try upgrading to Open MPI v1.3.2?
>
>
> On Apr 24, 2009, at 5:21 AM, jan wrote:
>
>> Dear Sir,
>>
>> I'm running a cluster with Open MPI.
>>
>> $mpirun --mca mpi_show_mpi_alloc_mem_leaks 8 --mca
>> mpi_show_handle_leaks 1 $HOME/test/cpi
>>
>> When the job failed, I got this error message:
>>
>> Process 15 on node2
>> Process 6 on node1
>> Process 14 on node2
>> ...
>> Process 0 on node1
>> Process 10 on node2
>> [node2][[9340,1],13][btl_openib_component.c:3002:poll_device] error polling HP CQ with -2 errno says Success
>> [node2][[9340,1],9][btl_openib_component.c:3002:poll_device] error polling HP CQ with -2 errno says Success
>> [node2][[9340,1],10][btl_openib_component.c:3002:poll_device] error polling HP CQ with -2 errno says Success
>> [node2][[9340,1],11][btl_openib_component.c:3002:poll_device] error polling HP CQ with -2 errno says Success
>> [node2][[9340,1],8][btl_openib_component.c:3002:poll_device] error polling HP CQ with -2 errno says Success
>> [node2][[9340,1],15][btl_openib_component.c:3002:poll_device] error polling HP CQ with -2 errno says Success
>> [node2][[9340,1],12][btl_openib_component.c:3002:poll_device] error polling HP CQ with -2 errno says Success
>> [node2][[9340,1],14][btl_openib_component.c:3002:poll_device] error polling HP CQ with -2 errno says Success
>> mpirun: killing job...
>>
>> --------------------------------------------------------------------------
>> mpirun noticed that process rank 0 with PID 28438 on node node1
>> exited on signal 0 (Unknown signal 0).
>> --------------------------------------------------------------------------
>> mpirun: clean termination accomplished
>>
>> and this message when the job succeeded:
>>
>> Process 1 on node1
>> Process 2 on node1
>> ...
>> Process 13 on node2
>> Process 14 on node2
>> --------------------------------------------------------------------------
>> The following memory locations were allocated via MPI_ALLOC_MEM but
>> not freed via MPI_FREE_MEM before invoking MPI_FINALIZE:
>>
>> Process ID: [[13692,1],12]
>> Hostname:   node2
>> PID:        30183
>>
>> (null)
>> --------------------------------------------------------------------------
>> [node1:32276] 15 more processes have sent help message help-mpool-base.txt / all mem leaks
>> [node1:32276] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
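
(Side note on the warning above: it means that something called MPI_ALLOC_MEM
and never released that memory with MPI_FREE_MEM before MPI_FINALIZE.  For
reference, a correct pairing looks something like this minimal sketch:

  void *buf;

  /* allocate through MPI so the library can register the memory */
  MPI_Alloc_mem(1048576, MPI_INFO_NULL, &buf);
  /* ... use buf as a send/receive buffer ... */
  MPI_Free_mem(buf);   /* must happen before MPI_Finalize() */

The "(null)" in your output is presumably where identifying information about
the leaked allocation would normally be printed.)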
>>
>>
>> It occurs periodically, i.e. twice it succeeds, then twice it fails, twice
>> it succeeds, then twice it fails, and so on.  I downloaded OFED-1.4.1-rc3
>> from the OpenFabrics Alliance and installed it on a Dell PowerEdge M600
>> blade server.  The InfiniBand mezzanine cards are Mellanox ConnectX QDR &
>> DDR, and the InfiniBand switch module is a Mellanox M2401G.  The OS is
>> CentOS 5.3, kernel 2.6.18-128.1.6.el5, with the PGI 7.2-5 compiler.  It's
>> running the OpenSM subnet manager.
>>
>> Best Regards,
>>
>> Gloria Jan
>>
>> Wavelink Technology Inc.
>>
>> The output of the "ompi_info --all" command is:
>>
>>                  Package: Open MPI root@vortex Distribution
>>                 Open MPI: 1.3.1
>>    Open MPI SVN revision: r20826
>>    Open MPI release date: Mar 18, 2009
>>                 Open RTE: 1.3.1
>>    Open RTE SVN revision: r20826
>>    Open RTE release date: Mar 18, 2009
>>                     OPAL: 1.3.1
>>        OPAL SVN revision: r20826
>>        OPAL release date: Mar 18, 2009
>>             Ident string: 1.3.1
>>            MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.3.1)
>>               MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.3.1)
>>            MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.3.1)
>>                MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.3.1)
>>                MCA carto: file (MCA v2.0, API v2.0, Component v1.3.1)
>>            MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.3.1)
>>                MCA timer: linux (MCA v2.0, API v2.0, Component v1.3.1)
>>          MCA installdirs: env (MCA v2.0, API v2.0, Component v1.3.1)
>>          MCA installdirs: config (MCA v2.0, API v2.0, Component v1.3.1)
>>                  MCA dpm: orte (MCA v2.0, API v2.0, Component v1.3.1)
>>               MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.3.1)
>>            MCA allocator: basic (MCA v2.0, API v2.0, Component v1.3.1)
>>            MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.3.1)
>>                 MCA coll: basic (MCA v2.0, API v2.0, Component v1.3.1)
>>                 MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.3.1)
>>                 MCA coll: inter (MCA v2.0, API v2.0, Component v1.3.1)
>>                 MCA coll: self (MCA v2.0, API v2.0, Component v1.3.1)
>>                 MCA coll: sm (MCA v2.0, API v2.0, Component v1.3.1)
>>                 MCA coll: sync (MCA v2.0, API v2.0, Component v1.3.1)
>>                 MCA coll: tuned (MCA v2.0, API v2.0, Component v1.3.1)
>>                   MCA io: romio (MCA v2.0, API v2.0, Component v1.3.1)
>>                MCA mpool: fake (MCA v2.0, API v2.0, Component v1.3.1)
>>                MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.3.1)
>>                MCA mpool: sm (MCA v2.0, API v2.0, Component v1.3.1)
>>                  MCA pml: cm (MCA v2.0, API v2.0, Component v1.3.1)
>>                  MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.3.1)
>>                  MCA pml: v (MCA v2.0, API v2.0, Component v1.3.1)
>>                  MCA bml: r2 (MCA v2.0, API v2.0, Component v1.3.1)
>>               MCA rcache: vma (MCA v2.0, API v2.0, Component v1.3.1)
>>                  MCA btl: ofud (MCA v2.0, API v2.0, Component v1.3.1)
>>                  MCA btl: openib (MCA v2.0, API v2.0, Component v1.3.1)
>>                  MCA btl: self (MCA v2.0, API v2.0, Component v1.3.1)
>>                  MCA btl: sm (MCA v2.0, API v2.0, Component v1.3.1)
>>                  MCA btl: tcp (MCA v2.0, API v2.0, Component v1.3.1)
>>                 MCA topo: unity (MCA v2.0, API v2.0, Component v1.3.1)
>>                  MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.3.1)
>>                  MCA osc: rdma (MCA v2.0, API v2.0, Component v1.3.1)
>>                  MCA iof: hnp (MCA v2.0, API v2.0, Component v1.3.1)
>>                  MCA iof: orted (MCA v2.0, API v2.0, Component v1.3.1)
>>                  MCA iof: tool (MCA v2.0, API v2.0, Component v1.3.1)
>>                  MCA oob: tcp (MCA v2.0, API v2.0, Component v1.3.1)
>>                 MCA odls: default (MCA v2.0, API v2.0, Component v1.3.1)
>>                  MCA ras: slurm (MCA v2.0, API v2.0, Component v1.3.1)
>>                  MCA ras: tm (MCA v2.0, API v2.0, Component v1.3.1)
>>                MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.3.1)
>>                MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.3.1)
>>                MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.3.1)
>>                  MCA rml: oob (MCA v2.0, API v2.0, Component v1.3.1)
>>               MCA routed: binomial (MCA v2.0, API v2.0, Component v1.3.1)
>>               MCA routed: direct (MCA v2.0, API v2.0, Component v1.3.1)
>>               MCA routed: linear (MCA v2.0, API v2.0, Component v1.3.1)
>>                  MCA plm: rsh (MCA v2.0, API v2.0, Component v1.3.1)
>>                  MCA plm: slurm (MCA v2.0, API v2.0, Component v1.3.1)
>>                  MCA plm: tm (MCA v2.0, API v2.0, Component v1.3.1)
>>                MCA filem: rsh (MCA v2.0, API v2.0, Component v1.3.1)
>>               MCA errmgr: default (MCA v2.0, API v2.0, Component v1.3.1)
>>                  MCA ess: env (MCA v2.0, API v2.0, Component v1.3.1)
>>                  MCA ess: hnp (MCA v2.0, API v2.0, Component v1.3.1)
>>                  MCA ess: singleton (MCA v2.0, API v2.0, Component v1.3.1)
>>                  MCA ess: slurm (MCA v2.0, API v2.0, Component v1.3.1)
>>                  MCA ess: tool (MCA v2.0, API v2.0, Component v1.3.1)
>>              MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.3.1)
>>              MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.3.1)
>>                   Prefix: /usr/mpi/pgi/openmpi-1.3.1
>>              Exec_prefix: /usr/mpi/pgi/openmpi-1.3.1
>>                   Bindir: /usr/mpi/pgi/openmpi-1.3.1/bin
>>                  Sbindir: /usr/mpi/pgi/openmpi-1.3.1/sbin
>>                   Libdir: /usr/mpi/pgi/openmpi-1.3.1/lib64
>>                   Incdir: /usr/mpi/pgi/openmpi-1.3.1/include
>>                   Mandir: /usr/mpi/pgi/openmpi-1.3.1/share/man
>>                Pkglibdir: /usr/mpi/pgi/openmpi-1.3.1/lib64/openmpi
>>               Libexecdir: /usr/mpi/pgi/openmpi-1.3.1/libexec
>>              Datarootdir: /usr/mpi/pgi/openmpi-1.3.1/share
>>                  Datadir: /usr/mpi/pgi/openmpi-1.3.1/share
>>               Sysconfdir: /usr/mpi/pgi/openmpi-1.3.1/etc
>>           Sharedstatedir: /usr/mpi/pgi/openmpi-1.3.1/com
>>            Localstatedir: /var
>>                  Infodir: /usr/share/info
>>               Pkgdatadir: /usr/mpi/pgi/openmpi-1.3.1/share/openmpi
>>                Pkglibdir: /usr/mpi/pgi/openmpi-1.3.1/lib64/openmpi
>>            Pkgincludedir: /usr/mpi/pgi/openmpi-1.3.1/include/openmpi
>>  Configured architecture: x86_64-redhat-linux-gnu
>>           Configure host: vortex
>>            Configured by: root
>>            Configured on: Sun Apr 12 23:23:14 CST 2009
>>           Configure host: vortex
>>                 Built by: root
>>                 Built on: Sun Apr 12 23:28:52 CST 2009
>>               Built host: vortex
>>               C bindings: yes
>>             C++ bindings: yes
>>       Fortran77 bindings: yes (all)
>>       Fortran90 bindings: yes
>>  Fortran90 bindings size: small
>>               C compiler: pgcc
>>      C compiler absolute: /opt/pgi/linux86-64/7.2-5/bin/pgcc
>>              C char size: 1
>>              C bool size: 1
>>             C short size: 2
>>               C int size: 4
>>              C long size: 8
>>             C float size: 4
>>            C double size: 8
>>           C pointer size: 8
>>             C char align: 1
>>             C bool align: 1
>>              C int align: 4
>>            C float align: 4
>>           C double align: 8
>>             C++ compiler: pgCC
>>    C++ compiler absolute: /opt/pgi/linux86-64/7.2-5/bin/pgCC
>>       Fortran77 compiler: pgf77
>>   Fortran77 compiler abs: /opt/pgi/linux86-64/7.2-5/bin/pgf77
>>       Fortran90 compiler: pgf90
>>   Fortran90 compiler abs: /opt/pgi/linux86-64/7.2-5/bin/pgf90
>>        Fort integer size: 4
>>        Fort logical size: 4
>>  Fort logical value true: -1
>>       Fort have integer1: yes
>>       Fort have integer2: yes
>>       Fort have integer4: yes
>>       Fort have integer8: yes
>>      Fort have integer16: no
>>          Fort have real4: yes
>>          Fort have real8: yes
>>         Fort have real16: no
>>       Fort have complex8: yes
>>      Fort have complex16: yes
>>      Fort have complex32: no
>>       Fort integer1 size: 1
>>       Fort integer2 size: 2
>>       Fort integer4 size: 4
>>       Fort integer8 size: 8
>>      Fort integer16 size: -1
>>           Fort real size: 4
>>          Fort real4 size: 4
>>          Fort real8 size: 8
>>         Fort real16 size: -1
>>       Fort dbl prec size: 4
>>           Fort cplx size: 4
>>       Fort dbl cplx size: 4
>>          Fort cplx8 size: 8
>>         Fort cplx16 size: 16
>>         Fort cplx32 size: -1
>>       Fort integer align: 4
>>      Fort integer1 align: 1
>>      Fort integer2 align: 2
>>      Fort integer4 align: 4
>>      Fort integer8 align: 8
>>     Fort integer16 align: -1
>>          Fort real align: 4
>>         Fort real4 align: 4
>>         Fort real8 align: 8
>>        Fort real16 align: -1
>>      Fort dbl prec align: 4
>>          Fort cplx align: 4
>>      Fort dbl cplx align: 4
>>         Fort cplx8 align: 4
>>        Fort cplx16 align: 8
>>        Fort cplx32 align: -1
>>              C profiling: yes
>>            C++ profiling: yes
>>      Fortran77 profiling: yes
>>      Fortran90 profiling: yes
>>           C++ exceptions: no
>>           Thread support: posix (mpi: no, progress: no)
>>            Sparse Groups: no
>>             Build CFLAGS: -O -DNDEBUG
>>           Build CXXFLAGS: -O -DNDEBUG
>>             Build FFLAGS:
>>            Build FCFLAGS: -O2
>>            Build LDFLAGS: -export-dynamic
>>               Build LIBS: -lnsl -lutil  -lpthread
>>     Wrapper extra CFLAGS:
>>   Wrapper extra CXXFLAGS:   -fpic
>>     Wrapper extra FFLAGS:
>>    Wrapper extra FCFLAGS:
>>    Wrapper extra LDFLAGS:
>>       Wrapper extra LIBS: -ldl -Wl,--export-dynamic -lnsl -lutil -lpthread -ldl
>>   Internal debug support: no
>>      MPI parameter check: runtime
>> Memory profiling support: no
>> Memory debugging support: no
>>          libltdl support: yes
>>    Heterogeneous support: no
>>  mpirun default --prefix: yes
>>          MPI I/O support: yes
>>        MPI_WTIME support: gettimeofday
>> Symbol visibility support: no
>>    FT Checkpoint support: no  (checkpoint thread: no)
>>                  MCA mca: parameter "mca_param_files" (current value: "/home/alpha/.openmpi/mca-params.conf:/usr/mpi/pgi/openmpi-1.3.1/etc/openmpi-mca-params.conf", data source: default value)
>>                           Path for MCA configuration files containing default parameter values
>>                  MCA mca: parameter "mca_base_param_file_prefix" (current value: <none>, data source: default value)
>>                           Aggregate MCA parameter file sets
>>                  MCA mca: parameter "mca_base_param_file_path" (current value: "/usr/mpi/pgi/openmpi-1.3.1/share/openmpi/amca-param-sets:/home/alpha", data source: default value)
>>                           Aggregate MCA parameter Search path
>>                  MCA mca: parameter "mca_base_param_file_path_force" (current value: <none>, data source: default value)
>>                           Forced Aggregate MCA parameter Search path
>>                  MCA mca: parameter "mca_component_path" (current value: "/usr/mpi/pgi/openmpi-1.3.1/lib64/openmpi:/home/alpha/.openmpi/components", data source: default value)
>>                           Path where to look for Open MPI and ORTE components
>>                  MCA mca: parameter "mca_verbose" (current value: <none>, data source: default value)
>>                           Top-level verbosity parameter
>>                  MCA mca: parameter "mca_component_show_load_errors" (current value: "1", data source: default value)
>>                           Whether to show errors for components that failed to load or not
>>                  MCA mca: parameter "mca_component_disable_dlopen" (current value: "0", data source: default value)
>>                           Whether to attempt to disable opening dynamic components or not
>>                  MCA mpi: parameter "mpi_param_check" (current value: "1", data source: default value)
>>                           Whether you want MPI API parameters checked at run-time or not.  Possible values are 0 (no checking) and 1 (perform checking at run-time)
>>                  MCA mpi: parameter "mpi_yield_when_idle" (current value: "-1", data source: default value)
>>                           Yield the processor when waiting for MPI communication (for MPI processes, will default to 1 when oversubscribing nodes)
>>                  MCA mpi: parameter "mpi_event_tick_rate" (current value: "-1", data source: default value)
>>                           How often to progress TCP communications (0 = never, otherwise specified in microseconds)
>>                  MCA mpi: parameter "mpi_show_handle_leaks" (current value: "1", data source: environment)
>>                           Whether MPI_FINALIZE shows all MPI handles that were not freed or not
>>                  MCA mpi: parameter "mpi_no_free_handles" (current value: "0", data source: environment)
>>                           Whether to actually free MPI objects when their handles are freed
>>                  MCA mpi: parameter "mpi_show_mpi_alloc_mem_leaks" (current value: "8", data source: environment)
>>                           If >0, MPI_FINALIZE will show up to this many instances of memory allocated by MPI_ALLOC_MEM that was not freed by MPI_FREE_MEM
>>                  MCA mpi: parameter "mpi_show_mca_params" (current value: <none>, data source: default value)
>>                           Whether to show all MCA parameter values during MPI_INIT or not (good for reproducability of MPI jobs for debug purposes). Accepted values are all, default, file, api, and enviro - or a comma delimited combination of them
>>                  MCA mpi: parameter "mpi_show_mca_params_file" (current value: <none>, data source: default value)
>>                           If mpi_show_mca_params is true, setting this string to a valid filename tells Open MPI to dump all the MCA parameter values into a file suitable for reading via the mca_param_files parameter (good for reproducability of MPI jobs)
>>                  MCA mpi: parameter "mpi_keep_peer_hostnames" (current value: "1", data source: default value)
>>                           If nonzero, save the string hostnames of all MPI peer processes (mostly for error / debugging output messages).  This can add quite a bit of memory usage to each MPI process.
>>                  MCA mpi: parameter "mpi_abort_delay" (current value: "0", data source: default value)
>>                           If nonzero, print out an identifying message when MPI_ABORT is invoked (hostname, PID of the process that called MPI_ABORT) and delay for that many seconds before exiting (a negative delay value means to never abort).  This allows attaching of a debugger before quitting the job.
>>                  MCA mpi: parameter "mpi_abort_print_stack" (current value: "0", data source: default value)
>>                           If nonzero, print out a stack trace when MPI_ABORT is invoked
>>                  MCA mpi: parameter "mpi_preconnect_mpi" (current value: "0", data source: default value, synonyms: mpi_preconnect_all)
>>                           Whether to force MPI processes to fully wire-up the MPI connections between MPI processes during MPI_INIT (vs. making connections lazily -- upon the first MPI traffic between each process peer pair)
>>                  MCA mpi: parameter "mpi_preconnect_all" (current value: "0", data source: default value, deprecated, synonym of: mpi_preconnect_mpi)
>>                           Whether to force MPI processes to fully wire-up the MPI connections between MPI processes during MPI_INIT (vs. making connections lazily -- upon the first MPI traffic between each process peer pair)
>>                  MCA mpi: parameter "mpi_leave_pinned" (current value: "0", data source: environment)
>>                           Whether to use the "leave pinned" protocol or not.  Enabling this setting can help bandwidth performance when repeatedly sending and receiving large messages with the same buffers over RDMA-based networks (0 = do not use "leave pinned" protocol, 1 = use "leave pinned" protocol, -1 = allow network to choose at runtime).
>>                  MCA mpi: parameter "mpi_leave_pinned_pipeline" (current value: "0", data source: default value)
>>                           Whether to use the "leave pinned pipeline" protocol or not.
>>                  MCA mpi: parameter "mpi_paffinity_alone" (current value: "0", data source: default value)
>>                           If nonzero, assume that this job is the only (set of) process(es) running on each node and bind processes to processors, starting with processor ID 0
>>                  MCA mpi: parameter "mpi_warn_on_fork" (current value: "1", data source: default value)
>>                           If nonzero, issue a warning if program forks under conditions that could cause system errors
>>                  MCA mpi: information "mpi_have_sparse_group_storage" (value: "0", data source: default value)
>>                           Whether this Open MPI installation supports storing of data in MPI groups in "sparse" formats (good for extremely large process count MPI jobs that create many communicators/groups)
>>                  MCA mpi: parameter "mpi_use_sparse_group_storage" (current value: "0", data source: default value)
>>                           Whether to use "sparse" storage formats for MPI groups (only relevant if mpi_have_sparse_group_storage is 1)
>>                 MCA orte: parameter "orte_base_help_aggregate" (current value: "1", data source: default value)
>>                           If orte_base_help_aggregate is true, duplicate help messages will be aggregated rather than displayed individually.  This can be helpful for parallel jobs that experience multiple identical failures; rather than print out the same help/failure message N times, display it once with a count of how many processes sent the same message.
>>                 MCA orte: parameter "orte_tmpdir_base" (current value: <none>, data source: default value)
>>                           Base of the session directory tree
>>                 MCA orte: parameter "orte_no_session_dirs" (current value: <none>, data source: default value)
>>                           Prohibited locations for session directories (multiple locations separated by ',', default=NULL)
>>                 MCA orte: parameter "orte_debug" (current value: "0", data source: default value)
>>                           Top-level ORTE debug switch (default verbosity: 1)
>>                 MCA orte: parameter "orte_debug_verbose" (current value: "-1", data source: default value)
>>                           Verbosity level for ORTE debug messages (default: 1)
>>                 MCA orte: parameter "orte_debug_daemons" (current value: "0", data source: default value)
>>                           Whether to debug the ORTE daemons or not
>>                 MCA orte: parameter "orte_debug_daemons_file" (current value: "0", data source: default value)
>>                           Whether want stdout/stderr of daemons to go to a file or not
>>                 MCA orte: parameter "orte_leave_session_attached" (current value: "0", data source: default value)
>>                           Whether applications and/or daemons should leave their sessions attached so that any output can be received - this allows X forwarding without all the attendant debugging output
>>                 MCA orte: parameter "orte_do_not_launch" (current value: "0", data source: default value)
>>                           Perform all necessary operations to prepare to launch the application, but do not actually launch it
>>                 MCA orte: parameter "orte_daemon_spin" (current value: "0", data source: default value)
>>                           Have any orteds spin until we can connect a debugger to them
>>                 MCA orte: parameter "orte_daemon_fail" (current value: "-1", data source: default value)
>>                           Have the specified orted fail after init for debugging purposes
>>                 MCA orte: parameter "orte_daemon_fail_delay" (current value: "0", data source: default value)
>>                           Have the specified orted fail after specified number of seconds (default: 0 => no delay)
>>                 MCA orte: parameter "orte_heartbeat_rate" (current value: "0", data source: default value)
>>                           Seconds between checks for daemon state-of-health (default: 0 => do not check)
>>                 MCA orte: parameter "orte_startup_timeout" (current value: "0", data source: default value)
>>                           Milliseconds/daemon to wait for startup before declaring failed_to_start (default: 0 => do not check)
>>                 MCA orte: parameter "orte_timing" (current value: "0", data source: default value)
>>                           Request that critical timing loops be measured
>>                 MCA orte: parameter "orte_base_user_debugger" (current value: "totalview @mpirun@ -a @mpirun_args@ : ddt -n @np@ -start @executable@ @executable_argv@ @single_app@ : fxp @mpirun@ -a @mpirun_args@", data source: default value)
>>                           Sequence of user-level debuggers to search for in orterun
>>                 MCA orte: parameter "orte_abort_timeout" (current value: "1", data source: default value)
>>                           Max time to wait [in secs] before aborting an ORTE operation (default: 1sec)
>>                 MCA orte: parameter "orte_timeout_step" (current value: "1000", data source: default value)
>>                           Time to wait [in usecs/proc] before aborting an ORTE operation (default: 1000 usec/proc)
>>                 MCA orte: parameter "orte_default_hostfile" (current value: <none>, data source: default value)
>>                           Name of the default hostfile (relative or absolute path)
>>                 MCA orte: parameter "orte_keep_fqdn_hostnames" (current value: "0", data source: default value)
>>                           Whether or not to keep FQDN hostnames [default: no]
>>                 MCA orte: parameter "orte_contiguous_nodes" (current value: "2147483647", data source: default value)
>>                           Number of nodes after which contiguous nodename encoding will automatically be used [default: INT_MAX]
>>                 MCA orte: parameter "orte_tag_output" (current value: "0", data source: default value)
>>                           Tag all output with [job,rank] (default: false)
>>                 MCA orte: parameter "orte_xml_output" (current value: "0", data source: default value)
>>                           Display all output in XML format (default: false)
>>                 MCA orte: parameter "orte_timestamp_output" (current value: "0", data source: default value)
>>                           Timestamp all application process output (default: false)
>>                 MCA orte: parameter "orte_output_filename" (current value: <none>, data source: default value)
>>                           Redirect output from application processes into filename.rank [default: NULL]
>>                 MCA orte: parameter "orte_show_resolved_nodenames" (current value: "0", data source: default value)
>>                           Display any node names that are resolved to a different name (default: false)
>>                 MCA orte: parameter "orte_hetero_apps" (current value: "0", data source: default value)
>>                           Indicates that multiple app_contexts are being provided that are a mix of 32/64 bit binaries (default: false)
>>                 MCA orte: parameter "orte_launch_agent" (current value: "orted", data source: default value)
>>                           Command used to start processes on remote nodes (default: orted)
>>                 MCA orte: parameter "orte_allocation_required" (current value: "0", data source: default value)
>>                           Whether or not an allocation by a resource manager is required [default: no]
>>                 MCA orte: parameter "orte_xterm" (current value: <none>, data source: default value)
>>                           Create a new xterm window and display output from the specified ranks there [default: none]
>>                 MCA orte: parameter "orte_forward_job_control" (current value: "0", data source: default value)
>>                           Forward SIGTSTP (after converting to SIGSTOP) and SIGCONT signals to the application procs [default: no]
>>                 MCA opal: parameter "opal_signal" (current value: "6,7,8,11", data source: default value)
>>                           If a signal is received, display the stack trace frame
>>                 MCA opal: parameter "opal_set_max_sys_limits" (current value: "0", data source: default value)
>>                           Set to non-zero to automatically set any system-imposed limits to the maximum allowed
>>                 MCA opal: parameter "opal_event_include" (current value: "poll", data source: default value)
>> ... ... ...
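
(One small convenience: rather than passing --mca mpi_show_handle_leaks and
--mca mpi_show_mpi_alloc_mem_leaks on every mpirun, you can set them in one of
the files named by the mca_param_files value shown above -- a sketch, assuming
the paths from your ompi_info output:

  # $HOME/.openmpi/mca-params.conf
  mpi_show_handle_leaks = 1
  mpi_show_mpi_alloc_mem_leaks = 8

A plain "mpirun $HOME/test/cpi" will then pick these up automatically.)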
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> Cisco Systems
>
>
>
>
> ------------------------------
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> End of users Digest, Vol 1212, Issue 3
> **************************************
>



--
Jeff Squyres
Cisco Systems
