Hi,

Graham E Fagg wrote:
> I am not sure which alltoall you're using in 1.1, so can you please run
> the ompi_info utility, which is normally built and put into the same
> directory as mpirun?
>
> i.e. host% ompi_info
>
> This provides lots of really useful info on everything before we dig
> deeper into your issue
>
>
> and then more specifically run
> host% ompi_info --param coll all

Find attached ~/notes from

$ ( ompi_info; echo '====================='; ompi_info --param coll all ) >~/notes

Thanks in advance and kind regards,
-- 
Frank Gruellich
HPC-Techniker

Tel.:   +49 3722 528 42
Fax:    +49 3722 528 15
E-Mail: frank.gruell...@megware.com

MEGWARE Computer GmbH
Vertrieb und Service
Nordstrasse 19
09247 Chemnitz/Roehrsdorf
Germany
http://www.megware.com/

                Open MPI: 1.1b1
   Open MPI SVN revision: r10217
                Open RTE: 1.1b1
   Open RTE SVN revision: r10217
                    OPAL: 1.1b1
       OPAL SVN revision: r10217
                  Prefix: /usr/ofed/mpi/intel/openmpi-1.1b1-1
 Configured architecture: x86_64-suse-linux-gnu
           Configured by: root
           Configured on: Wed Jul 19 20:51:46 CEST 2006
          Configure host: frontend
                Built by: root
                Built on: Wed Jul 19 21:04:47 CEST 2006
              Built host: frontend
              C bindings: yes
            C++ bindings: yes
      Fortran77 bindings: yes (all)
      Fortran90 bindings: yes
 Fortran90 bindings size: small
              C compiler: icc
     C compiler absolute: /software/intel/cce/9.1.038/bin/icc
            C++ compiler: icpc
   C++ compiler absolute: /software/intel/cce/9.1.038/bin/icpc
      Fortran77 compiler: ifort
  Fortran77 compiler abs: /software/intel/fce/9.1.032/bin/ifort
      Fortran90 compiler: gfortran
  Fortran90 compiler abs: /usr/bin/gfortran
             C profiling: yes
           C++ profiling: yes
     Fortran77 profiling: yes
     Fortran90 profiling: yes
          C++ exceptions: no
          Thread support: posix (mpi: no, progress: no)
  Internal debug support: no
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
         libltdl support: yes
              MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.1)
           MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1)
           MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1)
           MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.1)
               MCA timer: linux (MCA v1.0, API v1.0, Component v1.1)
           MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
           MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
                MCA coll: basic (MCA v1.0, API v1.0, Component v1.1)
                MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1)
                MCA coll: self (MCA v1.0, API v1.0, Component v1.1)
                MCA coll: sm (MCA v1.0, API v1.0, Component v1.1)
                MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1)
                  MCA io: romio (MCA v1.0, API v1.0, Component v1.1)
               MCA mpool: openib (MCA v1.0, API v1.0, Component v1.1)
               MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1)
                 MCA pml: dr (MCA v1.0, API v1.0, Component v1.1)
                 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1)
                 MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1)
              MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1)
                 MCA btl: openib (MCA v1.0, API v1.0, Component v1.1)
                 MCA btl: self (MCA v1.0, API v1.0, Component v1.1)
                 MCA btl: sm (MCA v1.0, API v1.0, Component v1.1)
                 MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
                MCA topo: unity (MCA v1.0, API v1.0, Component v1.1)
                 MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)
                 MCA gpr: null (MCA v1.0, API v1.0, Component v1.1)
                 MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.1)
                 MCA gpr: replica (MCA v1.0, API v1.0, Component v1.1)
                 MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1)
                 MCA iof: svc (MCA v1.0, API v1.0, Component v1.1)
                  MCA ns: proxy (MCA v1.0, API v1.0, Component v1.1)
                  MCA ns: replica (MCA v1.0, API v1.0, Component v1.1)
                 MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
                 MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.1)
                 MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.1)
                 MCA ras: localhost (MCA v1.0, API v1.0, Component v1.1)
                 MCA ras: slurm (MCA v1.0, API v1.0, Component v1.1)
                 MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.1)
                 MCA rds: resfile (MCA v1.0, API v1.0, Component v1.1)
               MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.1)
                MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.1)
                MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.1)
                 MCA rml: oob (MCA v1.0, API v1.0, Component v1.1)
                 MCA pls: fork (MCA v1.0, API v1.0, Component v1.1)
                 MCA pls: rsh (MCA v1.0, API v1.0, Component v1.1)
                 MCA pls: slurm (MCA v1.0, API v1.0, Component v1.1)
                 MCA sds: env (MCA v1.0, API v1.0, Component v1.1)
                 MCA sds: pipe (MCA v1.0, API v1.0, Component v1.1)
                 MCA sds: seed (MCA v1.0, API v1.0, Component v1.1)
                 MCA sds: singleton (MCA v1.0, API v1.0, Component v1.1)
                 MCA sds: slurm (MCA v1.0, API v1.0, Component v1.1)
=====================
                MCA coll: parameter "coll" (current value: <none>)
                          Default selection set of components for the coll framework (<none> means "use all components that can be found")
                MCA coll: parameter "coll_base_verbose" (current value: "0")
                          Verbosity level for the coll framework (0 = no verbosity)
                MCA coll: parameter "coll_basic_priority" (current value: "10")
                          Priority of the basic coll component
                MCA coll: parameter "coll_basic_crossover" (current value: "4")
                          Minimum number of processes in a communicator before using the logarithmic algorithms
                MCA coll: parameter "coll_hierarch_priority" (current value: "0")
                          Priority of the hierarchical coll component
                MCA coll: parameter "coll_hierarch_verbose" (current value: "0")
                          Turn verbose message of the hierarchical coll component on/off
                MCA coll: parameter "coll_hierarch_use_rdma" (current value: "0")
                          Switch from the send btl list used to detect hierarchies to the rdma btl list
                MCA coll: parameter "coll_hierarch_ignore_sm" (current value: "0")
                          Ignore sm protocol when detecting hierarchies. Required to enable the usage of protocol specific collective operations
                MCA coll: parameter "coll_hierarch_symmetric" (current value: "0")
                          Assume symmetric configuration
                MCA coll: parameter "coll_self_priority" (current value: "75")
                MCA coll: parameter "coll_sm_priority" (current value: "0")
                          Priority of the sm coll component
                MCA coll: parameter "coll_sm_control_size" (current value: "4096")
                          Length of the control data -- should usually be either the length of a cache line on most SMPs, or the size of a page on machines that support direct memory affinity page placement (in bytes)
                MCA coll: parameter "coll_sm_bootstrap_filename" (current value: "coll-sm-bootstrap")
                          Filename (in the Open MPI session directory) of the coll sm component bootstrap rendezvous mmap file
                MCA coll: parameter "coll_sm_bootstrap_num_segments" (current value: "8")
                          Number of segments in the bootstrap file
                MCA coll: parameter "coll_sm_fragment_size" (current value: "8192")
                          Fragment size (in bytes) used for passing data through shared memory (will be rounded up to the nearest control_size size)
                MCA coll: parameter "coll_sm_mpool" (current value: "sm")
                          Name of the mpool component to use
                MCA coll: parameter "coll_sm_comm_in_use_flags" (current value: "2")
                          Number of "in use" flags, used to mark a message passing area segment as currently being used or not (must be >= 2 and <= comm_num_segments)
                MCA coll: parameter "coll_sm_comm_num_segments" (current value: "8")
                          Number of segments in each communicator's shared memory message passing area (must be >= 2, and must be a multiple of comm_in_use_flags)
                MCA coll: parameter "coll_sm_tree_degree" (current value: "4")
                          Degree of the tree for tree-based operations (must be => 1 and <= min(control_size, 255))
                MCA coll: information "coll_sm_shared_mem_used_bootstrap" (value: "216")
                          Amount of shared memory used in the shared memory bootstrap area (in bytes)
                MCA coll: parameter "coll_sm_info_num_procs" (current value: "4")
                          Number of processes to use for the calculation of the shared_mem_size MCA information parameter (must be => 2)
                MCA coll: information "coll_sm_shared_mem_used_data" (value: "548864")
                          Amount of shared memory used in the shared memory data area for info_num_procs processes (in bytes)
                MCA coll: parameter "coll_tuned_priority" (current value: "30")
                          Priority of the tuned coll component
                MCA coll: parameter "coll_tuned_pre_allocate_memory_comm_size_limit" (current value: "32768")
                          Size of communicator were we stop pre-allocating memory for the fixed internal buffer used for message requests etc that is hung off the communicator data segment. I.e. if you have a 100'000 nodes you might not want to pre-allocate 200'000 request handle slots per communicator instance!
                MCA coll: parameter "coll_tuned_use_dynamic_rules" (current value: "0")
                          Switch used to decide if we use static (if statements) or dynamic (built at runtime) decision function rules
                MCA coll: parameter "coll_tuned_init_tree_fanout" (current value: "4")
                          Inital fanout used in the tree topologies for each communicator. This is only an initial guess, if a tuned collective needs a different fanout for an operation, it build it dynamically. This parameter is only for the first guess and might save a little time
                MCA coll: parameter "coll_tuned_init_chain_fanout" (current value: "4")
                          Inital fanout used in the chain (fanout followed by pipeline) topologies for each communicator. This is only an initial guess, if a tuned collective needs a different fanout for an operation, it build it dynamically. This parameter is only for the first guess and might save a little time
                MCA coll: parameter "coll_tuned_allreduce_algorithm" (current value: "0")
                          Which allreduce algorithm is used. Can be locked down to choice of: 0 ignore, 1 basic linear, 2 nonoverlapping (tuned reduce + tuned bcast)
                MCA coll: parameter "coll_tuned_allreduce_algorithm_segmentsize" (current value: "0")
                          Segment size in bytes used by default for allreduce algorithms. Only has meaning if algorithm is forced and supports segmenting. 0 bytes means no segmentation.
                MCA coll: parameter "coll_tuned_allreduce_algorithm_tree_fanout" (current value: "4")
                          Fanout for n-tree used for allreduce algorithms. Only has meaning if algorithm is forced and supports n-tree topo based operation.
                MCA coll: parameter "coll_tuned_allreduce_algorithm_chain_fanout" (current value: "4")
                          Fanout for chains used for allreduce algorithms. Only has meaning if algorithm is forced and supports chain topo based operation.
                MCA coll: parameter "coll_tuned_alltoall_algorithm" (current value: "0")
                          Which alltoall algorithm is used. Can be locked down to choice of: 0 ignore, 1 basic linear, 2 pairwise, 3: modified bruck, 4: two proc only.
                MCA coll: parameter "coll_tuned_alltoall_algorithm_segmentsize" (current value: "0")
                          Segment size in bytes used by default for alltoall algorithms. Only has meaning if algorithm is forced and supports segmenting. 0 bytes means no segmentation.
                MCA coll: parameter "coll_tuned_alltoall_algorithm_tree_fanout" (current value: "4")
                          Fanout for n-tree used for alltoall algorithms. Only has meaning if algorithm is forced and supports n-tree topo based operation.
                MCA coll: parameter "coll_tuned_alltoall_algorithm_chain_fanout" (current value: "4")
                          Fanout for chains used for alltoall algorithms. Only has meaning if algorithm is forced and supports chain topo based operation.
                MCA coll: parameter "coll_tuned_barrier_algorithm" (current value: "0")
                          Which barrier algorithm is used. Can be locked down to choice of: 0 ignore, 1 linear, 2 double ring, 3: recursive doubling 4: bruck, 5: two proc only, 6: step based bmtree
                MCA coll: parameter "coll_tuned_bcast_algorithm" (current value: "0")
                          Which bcast algorithm is used. Can be locked down to choice of: 0 ignore, 1 basic linear, 2 chain, 3: pipeline, 4: split binary tree, 5: binary tree, 6: BM tree.
                MCA coll: parameter "coll_tuned_bcast_algorithm_segmentsize" (current value: "0")
                          Segment size in bytes used by default for bcast algorithms. Only has meaning if algorithm is forced and supports segmenting. 0 bytes means no segmentation.
                MCA coll: parameter "coll_tuned_bcast_algorithm_tree_fanout" (current value: "4")
                          Fanout for n-tree used for bcast algorithms. Only has meaning if algorithm is forced and supports n-tree topo based operation.
                MCA coll: parameter "coll_tuned_bcast_algorithm_chain_fanout" (current value: "4")
                          Fanout for chains used for bcast algorithms. Only has meaning if algorithm is forced and supports chain topo based operation.
                MCA coll: parameter "coll_tuned_reduce_algorithm" (current value: "0")
                          Which reduce algorithm is used. Can be locked down to choice of: 0 ignore, 1 linear, 2 chain, 3 pipeline
                MCA coll: parameter "coll_tuned_reduce_algorithm_segmentsize" (current value: "0")
                          Segment size in bytes used by default for reduce algorithms. Only has meaning if algorithm is forced and supports segmenting. 0 bytes means no segmentation.
                MCA coll: parameter "coll_tuned_reduce_algorithm_tree_fanout" (current value: "4")
                          Fanout for n-tree used for reduce algorithms. Only has meaning if algorithm is forced and supports n-tree topo based operation.
                MCA coll: parameter "coll_tuned_reduce_algorithm_chain_fanout" (current value: "4")
                          Fanout for chains used for reduce algorithms. Only has meaning if algorithm is forced and supports chain topo based operation.
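
To compare the parameter values in a notes file like this one across hosts, the "parameter" entries can be pulled into a dictionary. The following is a minimal Python sketch and not part of Open MPI: the function name parse_mca_params and the regular expression are assumptions based only on the line format visible in the output above. It skips the read-only "information" entries and maps a value of <none> to None.

```python
import re

# Assumed line format, taken from the output above:
#   MCA coll: parameter "coll_tuned_priority" (current value: "30")
#   MCA coll: parameter "coll" (current value: <none>)
# \s+ is used between tokens so that entries broken across lines still match.
PARAM_RE = re.compile(
    r'MCA\s+(?P<framework>\w+):\s+parameter\s+"(?P<name>[^"]+)"\s+'
    r'\(current value:\s+(?:"(?P<value>[^"]*)"|<none>)\)'
)


def parse_mca_params(text):
    """Return {parameter name: current value} for every 'parameter' entry.

    A value printed as <none> is stored as None. Entries marked
    'information' are not parameters and are ignored.
    """
    params = {}
    for match in PARAM_RE.finditer(text):
        params[match.group("name")] = match.group("value")
    return params


# A few entries copied from the output above, used as sample input.
sample = '''
MCA coll: parameter "coll" (current value: <none>)
MCA coll: parameter "coll_tuned_priority" (current value: "30")
MCA coll: information "coll_sm_shared_mem_used_bootstrap" (value: "216")
MCA coll: parameter "coll_tuned_use_dynamic_rules" (current value: "0")
MCA coll: parameter "coll_tuned_alltoall_algorithm" (current value: "0")
'''

params = parse_mca_params(sample)
for name, value in sorted(params.items()):
    print(name, "=", value)
```

In practice the text would come from the file itself, for example open(path).read() on the notes file, rather than from the inline sample.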