Hi,

Graham E Fagg wrote:
>  I am not sure which alltoall you're using in 1.1, so can you please run
> the ompi_info utility, which is normally built and put into the same
> directory as mpirun?
>
> i.e. host% ompi_info
>
> This provides lots of really useful info on everything before we dig
> deeper into your issue.
>
>
> and then more specifically run
> host%  ompi_info --param coll all

Find attached ~/notes from

 $ ( ompi_info; echo '====================='; ompi_info --param coll all ) >~/notes
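The coll parameter listing is long. To pick out the values that matter for the alltoall question, the "ompi_info --param coll all" output can be reduced to NAME=VALUE pairs. This is a minimal sketch, assuming each parameter's name and quoted current value sit on one line of ompi_info's own output (in the copy attached here the mailer has wrapped a few of them). "coll_params" is a hypothetical helper name, not an Open MPI tool:

```shell
#!/bin/sh
# coll_params: hypothetical helper, not shipped with Open MPI.
# Reads "ompi_info --param coll all" output on stdin and prints one
# NAME=VALUE line per MCA parameter.
# Assumption: the parameter name and its quoted current value appear on
# the same line. Unquoted values such as <none> and "information"
# entries do not match the pattern and are skipped.
coll_params() {
    sed -n 's/.*parameter "\([^"]*\)" (current value: "\([^"]*\)").*/\1=\2/p'
}
```

For example, "ompi_info --param coll all | coll_params | grep -e _priority -e alltoall" would list the component priorities (coll_tuned_priority=30 against coll_basic_priority=10 in the listing attached here) together with the coll_tuned_alltoall_* settings.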

Thanks in advance and kind regards,
-- 
Frank Gruellich
HPC Technician

Tel.:   +49 3722 528 42
Fax:    +49 3722 528 15
E-Mail: frank.gruell...@megware.com

MEGWARE Computer GmbH
Sales and Service
Nordstrasse 19
09247 Chemnitz/Roehrsdorf
Germany
http://www.megware.com/
                Open MPI: 1.1b1
   Open MPI SVN revision: r10217
                Open RTE: 1.1b1
   Open RTE SVN revision: r10217
                    OPAL: 1.1b1
       OPAL SVN revision: r10217
                  Prefix: /usr/ofed/mpi/intel/openmpi-1.1b1-1
 Configured architecture: x86_64-suse-linux-gnu
           Configured by: root
           Configured on: Wed Jul 19 20:51:46 CEST 2006
          Configure host: frontend
                Built by: root
                Built on: Wed Jul 19 21:04:47 CEST 2006
              Built host: frontend
              C bindings: yes
            C++ bindings: yes
      Fortran77 bindings: yes (all)
      Fortran90 bindings: yes
 Fortran90 bindings size: small
              C compiler: icc
     C compiler absolute: /software/intel/cce/9.1.038/bin/icc
            C++ compiler: icpc
   C++ compiler absolute: /software/intel/cce/9.1.038/bin/icpc
      Fortran77 compiler: ifort
  Fortran77 compiler abs: /software/intel/fce/9.1.032/bin/ifort
      Fortran90 compiler: gfortran
  Fortran90 compiler abs: /usr/bin/gfortran
             C profiling: yes
           C++ profiling: yes
     Fortran77 profiling: yes
     Fortran90 profiling: yes
          C++ exceptions: no
          Thread support: posix (mpi: no, progress: no)
  Internal debug support: no
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
         libltdl support: yes
              MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.1)
           MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1)
           MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1)
           MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.1)
               MCA timer: linux (MCA v1.0, API v1.0, Component v1.1)
           MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
           MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
                MCA coll: basic (MCA v1.0, API v1.0, Component v1.1)
                MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1)
                MCA coll: self (MCA v1.0, API v1.0, Component v1.1)
                MCA coll: sm (MCA v1.0, API v1.0, Component v1.1)
                MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1)
                  MCA io: romio (MCA v1.0, API v1.0, Component v1.1)
               MCA mpool: openib (MCA v1.0, API v1.0, Component v1.1)
               MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1)
                 MCA pml: dr (MCA v1.0, API v1.0, Component v1.1)
                 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1)
                 MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1)
              MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1)
                 MCA btl: openib (MCA v1.0, API v1.0, Component v1.1)
                 MCA btl: self (MCA v1.0, API v1.0, Component v1.1)
                 MCA btl: sm (MCA v1.0, API v1.0, Component v1.1)
                 MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
                MCA topo: unity (MCA v1.0, API v1.0, Component v1.1)
                 MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)
                 MCA gpr: null (MCA v1.0, API v1.0, Component v1.1)
                 MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.1)
                 MCA gpr: replica (MCA v1.0, API v1.0, Component v1.1)
                 MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1)
                 MCA iof: svc (MCA v1.0, API v1.0, Component v1.1)
                  MCA ns: proxy (MCA v1.0, API v1.0, Component v1.1)
                  MCA ns: replica (MCA v1.0, API v1.0, Component v1.1)
                 MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
                 MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.1)
                 MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.1)
                 MCA ras: localhost (MCA v1.0, API v1.0, Component v1.1)
                 MCA ras: slurm (MCA v1.0, API v1.0, Component v1.1)
                 MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.1)
                 MCA rds: resfile (MCA v1.0, API v1.0, Component v1.1)
               MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.1)
                MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.1)
                MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.1)
                 MCA rml: oob (MCA v1.0, API v1.0, Component v1.1)
                 MCA pls: fork (MCA v1.0, API v1.0, Component v1.1)
                 MCA pls: rsh (MCA v1.0, API v1.0, Component v1.1)
                 MCA pls: slurm (MCA v1.0, API v1.0, Component v1.1)
                 MCA sds: env (MCA v1.0, API v1.0, Component v1.1)
                 MCA sds: pipe (MCA v1.0, API v1.0, Component v1.1)
                 MCA sds: seed (MCA v1.0, API v1.0, Component v1.1)
                 MCA sds: singleton (MCA v1.0, API v1.0, Component v1.1)
                 MCA sds: slurm (MCA v1.0, API v1.0, Component v1.1)
=====================
                MCA coll: parameter "coll" (current value: <none>)
                          Default selection set of components for the coll framework (<none> means "use all components that can be found")
                MCA coll: parameter "coll_base_verbose" (current value: "0")
                          Verbosity level for the coll framework (0 = no verbosity)
                MCA coll: parameter "coll_basic_priority" (current value: "10")
                          Priority of the basic coll component
                MCA coll: parameter "coll_basic_crossover" (current value: "4")
                          Minimum number of processes in a communicator before using the logarithmic algorithms
                MCA coll: parameter "coll_hierarch_priority" (current value: "0")
                          Priority of the hierarchical coll component
                MCA coll: parameter "coll_hierarch_verbose" (current value: "0")
                          Turn verbose message of the hierarchical coll component on/off
                MCA coll: parameter "coll_hierarch_use_rdma" (current value: "0")
                          Switch from the send btl list used to detect hierarchies to the rdma btl list
                MCA coll: parameter "coll_hierarch_ignore_sm" (current value: "0")
                          Ignore sm protocol when detecting hierarchies. Required to enable the usage of protocol specific collective operations
                MCA coll: parameter "coll_hierarch_symmetric" (current value: "0")
                          Assume symmetric configuration
                MCA coll: parameter "coll_self_priority" (current value: "75")
                MCA coll: parameter "coll_sm_priority" (current value: "0")
                          Priority of the sm coll component
                MCA coll: parameter "coll_sm_control_size" (current value: "4096")
                          Length of the control data -- should usually be either the length of a cache line on most SMPs, or the size of a page on machines that support direct memory affinity page placement (in bytes)
                MCA coll: parameter "coll_sm_bootstrap_filename" (current value: "coll-sm-bootstrap")
                          Filename (in the Open MPI session directory) of the coll sm component bootstrap rendezvous mmap file
                MCA coll: parameter "coll_sm_bootstrap_num_segments" (current value: "8")
                          Number of segments in the bootstrap file
                MCA coll: parameter "coll_sm_fragment_size" (current value: "8192")
                          Fragment size (in bytes) used for passing data through shared memory (will be rounded up to the nearest control_size size)
                MCA coll: parameter "coll_sm_mpool" (current value: "sm")
                          Name of the mpool component to use
                MCA coll: parameter "coll_sm_comm_in_use_flags" (current value: "2")
                          Number of "in use" flags, used to mark a message passing area segment as currently being used or not (must be >= 2 and <= comm_num_segments)
                MCA coll: parameter "coll_sm_comm_num_segments" (current value: "8")
                          Number of segments in each communicator's shared memory message passing area (must be >= 2, and must be a multiple of comm_in_use_flags)
                MCA coll: parameter "coll_sm_tree_degree" (current value: "4")
                          Degree of the tree for tree-based operations (must be => 1 and <= min(control_size, 255))
                MCA coll: information "coll_sm_shared_mem_used_bootstrap" (value: "216")
                          Amount of shared memory used in the shared memory bootstrap area (in bytes)
                MCA coll: parameter "coll_sm_info_num_procs" (current value: "4")
                          Number of processes to use for the calculation of the shared_mem_size MCA information parameter (must be => 2)
                MCA coll: information "coll_sm_shared_mem_used_data" (value: "548864")
                          Amount of shared memory used in the shared memory data area for info_num_procs processes (in bytes)
                MCA coll: parameter "coll_tuned_priority" (current value: "30")
                          Priority of the tuned coll component
                MCA coll: parameter "coll_tuned_pre_allocate_memory_comm_size_limit" (current value: "32768")
                          Size of communicator were we stop pre-allocating memory for the fixed internal buffer used for message requests etc that is hung off the communicator data segment. I.e. if you have a 100'000 nodes you might not want to pre-allocate 200'000 request handle slots per communicator instance!
                MCA coll: parameter "coll_tuned_use_dynamic_rules" (current value: "0")
                          Switch used to decide if we use static (if statements) or dynamic (built at runtime) decision function rules
                MCA coll: parameter "coll_tuned_init_tree_fanout" (current value: "4")
                          Inital fanout used in the tree topologies for each communicator. This is only an initial guess, if a tuned collective needs a different fanout for an operation, it build it dynamically. This parameter is only for the first guess and might save a little time
                MCA coll: parameter "coll_tuned_init_chain_fanout" (current value: "4")
                          Inital fanout used in the chain (fanout followed by pipeline) topologies for each communicator. This is only an initial guess, if a tuned collective needs a different fanout for an operation, it build it dynamically. This parameter is only for the first guess and might save a little time
                MCA coll: parameter "coll_tuned_allreduce_algorithm" (current value: "0")
                          Which allreduce algorithm is used. Can be locked down to choice of: 0 ignore, 1 basic linear, 2 nonoverlapping (tuned reduce + tuned bcast)
                MCA coll: parameter "coll_tuned_allreduce_algorithm_segmentsize" (current value: "0")
                          Segment size in bytes used by default for allreduce algorithms. Only has meaning if algorithm is forced and supports segmenting. 0 bytes means no segmentation.
                MCA coll: parameter "coll_tuned_allreduce_algorithm_tree_fanout" (current value: "4")
                          Fanout for n-tree used for allreduce algorithms. Only has meaning if algorithm is forced and supports n-tree topo based operation.
                MCA coll: parameter "coll_tuned_allreduce_algorithm_chain_fanout" (current value: "4")
                          Fanout for chains used for allreduce algorithms. Only has meaning if algorithm is forced and supports chain topo based operation.
                MCA coll: parameter "coll_tuned_alltoall_algorithm" (current value: "0")
                          Which alltoall algorithm is used. Can be locked down to choice of: 0 ignore, 1 basic linear, 2 pairwise, 3: modified bruck, 4: two proc only.
                MCA coll: parameter "coll_tuned_alltoall_algorithm_segmentsize" (current value: "0")
                          Segment size in bytes used by default for alltoall algorithms. Only has meaning if algorithm is forced and supports segmenting. 0 bytes means no segmentation.
                MCA coll: parameter "coll_tuned_alltoall_algorithm_tree_fanout" (current value: "4")
                          Fanout for n-tree used for alltoall algorithms. Only has meaning if algorithm is forced and supports n-tree topo based operation.
                MCA coll: parameter "coll_tuned_alltoall_algorithm_chain_fanout" (current value: "4")
                          Fanout for chains used for alltoall algorithms. Only has meaning if algorithm is forced and supports chain topo based operation.
                MCA coll: parameter "coll_tuned_barrier_algorithm" (current value: "0")
                          Which barrier algorithm is used. Can be locked down to choice of: 0 ignore, 1 linear, 2 double ring, 3: recursive doubling 4: bruck, 5: two proc only, 6: step based bmtree
                MCA coll: parameter "coll_tuned_bcast_algorithm" (current value: "0")
                          Which bcast algorithm is used. Can be locked down to choice of: 0 ignore, 1 basic linear, 2 chain, 3: pipeline, 4: split binary tree, 5: binary tree, 6: BM tree.
                MCA coll: parameter "coll_tuned_bcast_algorithm_segmentsize" (current value: "0")
                          Segment size in bytes used by default for bcast algorithms. Only has meaning if algorithm is forced and supports segmenting. 0 bytes means no segmentation.
                MCA coll: parameter "coll_tuned_bcast_algorithm_tree_fanout" (current value: "4")
                          Fanout for n-tree used for bcast algorithms. Only has meaning if algorithm is forced and supports n-tree topo based operation.
                MCA coll: parameter "coll_tuned_bcast_algorithm_chain_fanout" (current value: "4")
                          Fanout for chains used for bcast algorithms. Only has meaning if algorithm is forced and supports chain topo based operation.
                MCA coll: parameter "coll_tuned_reduce_algorithm" (current value: "0")
                          Which reduce algorithm is used. Can be locked down to choice of: 0 ignore, 1 linear, 2 chain, 3 pipeline
                MCA coll: parameter "coll_tuned_reduce_algorithm_segmentsize" (current value: "0")
                          Segment size in bytes used by default for reduce algorithms. Only has meaning if algorithm is forced and supports segmenting. 0 bytes means no segmentation.
                MCA coll: parameter "coll_tuned_reduce_algorithm_tree_fanout" (current value: "4")
                          Fanout for n-tree used for reduce algorithms. Only has meaning if algorithm is forced and supports n-tree topo based operation.
                MCA coll: parameter "coll_tuned_reduce_algorithm_chain_fanout" (current value: "4")
                          Fanout for chains used for reduce algorithms. Only has meaning if algorithm is forced and supports chain topo based operation.
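The coll_tuned_alltoall_algorithm entry lists the algorithms the tuned component can be locked to: 0 ignore, 1 basic linear, 2 pairwise, 3 modified bruck, 4 two proc only. To check whether a problem follows one particular algorithm, a run can be repeated with each choice forced through mpirun's --mca option. The following minimal sketch builds those arguments. "alltoall_mca_args" and its labels are hypothetical. It also assumes that a forced value only takes effect when coll_tuned_use_dynamic_rules is set to 1. That is how later Open MPI releases behave, but the 1.1b1 help text in this listing does not state it:

```shell
#!/bin/sh
# alltoall_mca_args: hypothetical helper, not part of Open MPI.
# Maps an algorithm label to the number listed for
# coll_tuned_alltoall_algorithm in the ompi_info output (1.1b1) and prints
# the mpirun --mca arguments that would force it.
# Assumption: the forced value is honoured only when
# coll_tuned_use_dynamic_rules=1, so that switch is always included.
alltoall_mca_args() {
    case "$1" in
        ignore)         alg=0 ;;
        basic_linear)   alg=1 ;;
        pairwise)       alg=2 ;;
        modified_bruck) alg=3 ;;
        two_proc)       alg=4 ;;
        *)
            echo "alltoall_mca_args: unknown algorithm '$1'" >&2
            return 1
            ;;
    esac
    printf '%s\n' "--mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_alltoall_algorithm $alg"
}
```

For example: mpirun $(alltoall_mca_args pairwise) -np 4 ./alltoall_test, where ./alltoall_test stands in for the failing program. The command substitution is left unquoted on purpose so the shell splits it into separate mpirun arguments.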
