On Sep 12, 2011, at 10:16 AM, Blosch, Edwin L wrote:

> Samuel,
>  
> This worked. 

Great!

> Did this magic line disable the use of per-peer queue pairs? 

Yes, it sure did.

> I have seen a previous post by Jeff that explains what this line does 
> generally, but I didn’t study the post in detail, so if you could provide a 
> little explanation I would appreciate it.


The btl_openib_receive_queues MCA parameter takes a comma and colon delimited 
list of parameters.  The most general form is:

Queue pair type identifier,Buffer size (Required),Number of buffers 
(Required):Queue pair type identifier,Buffer size (Required),Number of buffers 
(Required):...:Queue pair type identifier,Buffer size (Required),Number of 
buffers (Required)

NOTE: Optional QP parameters omitted - see the OMPI FAQ regarding OpenFabrics 
tuning for more details.

The 'S' queue pair type identifier corresponds to "Shared queues."
The 'P' queue pair type identifier corresponds to "Per-peer queues."

Hope that helps,

Sam

>  
> Ed
>  
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On 
> Behalf Of Samuel K. Gutierrez
> Sent: Monday, September 12, 2011 10:49 AM
> To: Open MPI Users
> Subject: EXTERNAL: Re: [OMPI users] qp memory allocation problem
>  
> Hi,
>  
> This problem can be  caused by a variety of things, but I suspect our default 
> queue pair parameters (QP) aren't helping the situation :-).
>  
> What happens when you add the following to your mpirun command?
>  
> -mca btl_openib_receive_queues S,4096,128:S,12288,128:S,65536,12
>  
> OMPI Developers:
>  
> Maybe we should consider disabling the use of per-peer queue pairs by 
> default.  Do they buy us anything?  For what it is worth, we have stopped 
> using them on all of our large systems here at LANL.
>  
> Thanks,
>  
> Samuel K. Gutierrez
> Los Alamos National Laboratory
>  
> On Sep 12, 2011, at 9:23 AM, Blosch, Edwin L wrote:
> 
> 
> I am getting this error message below and I don’t know what it means or how 
> to fix it.   It only happens when I run on a large number of processes, e.g. 
> 960.  Things work fine on 480, and I don’t think the application has a bug.  
> Any help is appreciated…
>  
> [c1n01][[30697,1],3][connect/btl_openib_connect_oob.c:464:qp_create_one] 
> error creating qp errno says Cannot allocate memory
> [c1n01][[30697,1],4][connect/btl_openib_connect_oob.c:464:qp_create_one] 
> error creating qp errno says Cannot allocate memory
>  
> Here’s the mpirun command I used:
> mpirun --prefix /usr/mpi/intel/openmpi-1.4.3 --machinefile <host file> -np 
> 960 --mca btl ^tcp --mca mpool_base_use_mem_hooks 1 --mca mpi_leave_pinned 1 
> -x LD_LIBRARY_PATH -x MPI_ENVIRONMENT=1 <application name + arguments>
>  
> Here’s the applicable hardware from the 
> /usr/mpi/intel/openmpi-1.4.3/share/openmpi/mca-btl-openib-device-params.ini 
> file:
> # A.k.a. ConnectX
> [Mellanox Hermon]
> vendor_id = 0x2c9,0x5ad,0x66a,0x8f1,0x1708,0x03ba,0x15b3,0x119f
> vendor_part_id = 
> 25408,25418,25428,26418,26428,25448,26438,26448,26468,26478,26488
> use_eager_rdma = 1
> mtu = 2048
> max_inline_data = 128
>  
> And this is the output of ompi_info –param btl openib:
>                  MCA btl: parameter "btl_base_verbose" (current value: "0", 
> data source: default value)
>                           Verbosity level of the BTL framework
>                  MCA btl: parameter "btl" (current value: <none>, data 
> source: default value)
>                           Default selection set of components for the btl 
> framework (<none> means use all components that can be found)
>                  MCA btl: parameter "btl_openib_verbose" (current value: "0", 
> data source: default value)
>                           Output some verbose OpenIB BTL information (0 = no 
> output, nonzero = output)
>                  MCA btl: parameter "btl_openib_warn_no_device_params_found" 
> (current value: "1", data source: default value, synonyms:
>                           btl_openib_warn_no_hca_params_found)
>                           Warn when no device-specific parameters are found 
> in the INI file specified by the btl_openib_device_param_files MCA
>                           parameter (0 = do not warn; any other value = warn)
>                  MCA btl: parameter "btl_openib_warn_no_hca_params_found" 
> (current value: "1", data source: default value, deprecated, synonym
>                           of: btl_openib_warn_no_device_params_found)
>                           Warn when no device-specific parameters are found 
> in the INI file specified by the btl_openib_device_param_files MCA
>                           parameter (0 = do not warn; any other value = warn)
>                  MCA btl: parameter "btl_openib_warn_default_gid_prefix" 
> (current value: "1", data source: default value)
>                           Warn when there is more than one active ports and 
> at least one of them connected to the network with only default GID
>                           prefix configured (0 = do not warn; any other value 
> = warn)
>                  MCA btl: parameter "btl_openib_warn_nonexistent_if" (current 
> value: "1", data source: default value)
>                           Warn if non-existent devices and/or ports are 
> specified in the btl_openib_if_[in|ex]clude MCA parameters (0 = do not
>                           warn; any other value = warn)
>                  MCA btl: parameter "btl_openib_want_fork_support" (current 
> value: "-1", data source: default value)
>                           Whether fork support is desired or not (negative = 
> try to enable fork support, but continue even if it is not
>                           available, 0 = do not enable fork support, positive 
> = try to enable fork support and fail if it is not available)
>                  MCA btl: parameter "btl_openib_device_param_files" (current 
> value:
>                           
> "/usr/mpi/intel/openmpi-1.4.3/share/openmpi/mca-btl-openib-device-params.ini",
>  data source: default value, synonyms:
>                           btl_openib_hca_param_files)
>                           Colon-delimited list of INI-style files that 
> contain device vendor/part-specific parameters
>                  MCA btl: parameter "btl_openib_hca_param_files" (current 
> value:
>                           
> "/usr/mpi/intel/openmpi-1.4.3/share/openmpi/mca-btl-openib-device-params.ini",
>  data source: default value, deprecated,
>                           synonym of: btl_openib_device_param_files)
>                           Colon-delimited list of INI-style files that 
> contain device vendor/part-specific parameters
>                  MCA btl: parameter "btl_openib_device_type" (current value: 
> "all", data source: default value)
>                           Specify to only use IB or iWARP network adapters 
> (infiniband = only use InfiniBand HCAs; iwarp = only use iWARP NICs;
>                           all = use any available adapters)
>                  MCA btl: parameter "btl_openib_max_btls" (current value: 
> "-1", data source: default value)
>                           Maximum number of device ports to use (-1 = use all 
> available, otherwise must be >= 1)
>                  MCA btl: parameter "btl_openib_free_list_num" (current 
> value: "8", data source: default value)
>                           Intial size of free lists (must be >= 1)
>                  MCA btl: parameter "btl_openib_free_list_max" (current 
> value: "-1", data source: default value)
>                           Maximum size of free lists (-1 = infinite, 
> otherwise must be >= 0)
>                  MCA btl: parameter "btl_openib_free_list_inc" (current 
> value: "32", data source: default value)
>                           Increment size of free lists (must be >= 1)
>                  MCA btl: parameter "btl_openib_mpool" (current value: 
> "rdma", data source: default value)
>                           Name of the memory pool to be used (it is unlikely 
> that you will ever want to change this
>                  MCA btl: parameter "btl_openib_reg_mru_len" (current value: 
> "16", data source: default value)
>                           Length of the registration cache most recently used 
> list (must be >= 1)
>                  MCA btl: parameter "btl_openib_cq_size" (current value: 
> "1000", data source: default value, synonyms: btl_openib_ib_cq_size)
>                           Size of the OpenFabrics completion queue (will 
> automatically be set to a minimum of (2 * number_of_peers *
>                           btl_openib_rd_num))
>                  MCA btl: parameter "btl_openib_ib_cq_size" (current value: 
> "1000", data source: default value, deprecated, synonym of:
>                           btl_openib_cq_size)
>                           Size of the OpenFabrics completion queue (will 
> automatically be set to a minimum of (2 * number_of_peers *
>                           btl_openib_rd_num))
>                  MCA btl: parameter "btl_openib_max_inline_data" (current 
> value: "-1", data source: default value, synonyms:
>                           btl_openib_ib_max_inline_data)
>                           Maximum size of inline data segment (-1 = run-time 
> probe to discover max value, otherwise must be >= 0). If not
>                           explicitly set, use max_inline_data from the INI 
> file containing device-specific parameters
>                  MCA btl: parameter "btl_openib_ib_max_inline_data" (current 
> value: "-1", data source: default value, deprecated, synonym of:
>                           btl_openib_max_inline_data)
>                           Maximum size of inline data segment (-1 = run-time 
> probe to discover max value, otherwise must be >= 0). If not
>                           explicitly set, use max_inline_data from the INI 
> file containing device-specific parameters
>                  MCA btl: parameter "btl_openib_pkey" (current value: "0", 
> data source: default value, synonyms: btl_openib_ib_pkey_val)
>                           OpenFabrics partition key (pkey) value. Unsigned 
> integer decimal or hex values are allowed (e.g., "3" or "0x3f") and
>                           will be masked against the maximum allowable IB 
> paritition key value (0x7fff)
>                  MCA btl: parameter "btl_openib_ib_pkey_val" (current value: 
> "0", data source: default value, deprecated, synonym of:
>                           btl_openib_pkey)
>                           OpenFabrics partition key (pkey) value. Unsigned 
> integer decimal or hex values are allowed (e.g., "3" or "0x3f") and
>                           will be masked against the maximum allowable IB 
> paritition key value (0x7fff)
>                  MCA btl: parameter "btl_openib_psn" (current value: "0", 
> data source: default value, synonyms: btl_openib_ib_psn)
>                           OpenFabrics packet sequence starting number (must 
> be >= 0)
>                  MCA btl: parameter "btl_openib_ib_psn" (current value: "0", 
> data source: default value, deprecated, synonym of:
>                           btl_openib_psn)
>                           OpenFabrics packet sequence starting number (must 
> be >= 0)
>                  MCA btl: parameter "btl_openib_ib_qp_ous_rd_atom" (current 
> value: "4", data source: default value)
>                           InfiniBand outstanding atomic reads (must be >= 0)
>                  MCA btl: parameter "btl_openib_mtu" (current value: "3", 
> data source: default value, synonyms: btl_openib_ib_mtu)
>                           OpenFabrics MTU, in bytes (if not specified in INI 
> files).  Valid values are: 1=256 bytes, 2=512 bytes, 3=1024 bytes,
>                           4=2048 bytes, 5=4096 bytes
>                  MCA btl: parameter "btl_openib_ib_mtu" (current value: "3", 
> data source: default value, deprecated, synonym of:
>                           btl_openib_mtu)
>                           OpenFabrics MTU, in bytes (if not specified in INI 
> files).  Valid values are: 1=256 bytes, 2=512 bytes, 3=1024 bytes,
>                           4=2048 bytes, 5=4096 bytes
>                  MCA btl: parameter "btl_openib_ib_min_rnr_timer" (current 
> value: "25", data source: default value)
>                           InfiniBand minimum "receiver not ready" timer, in 
> seconds (must be >= 0 and <= 31)
>                  MCA btl: parameter "btl_openib_ib_timeout" (current value: 
> "20", data source: default value)
>                           InfiniBand transmit timeout, plugged into formula: 
> 4.096 microseconds * (2^btl_openib_ib_timeout)(must be >= 0 and <=
>                           31)
>                  MCA btl: parameter "btl_openib_ib_retry_count" (current 
> value: "7", data source: default value)
>                           InfiniBand transmit retry count (must be >= 0 and 
> <= 7)
>                  MCA btl: parameter "btl_openib_ib_rnr_retry" (current value: 
> "7", data source: default value)
>                           InfiniBand "receiver not ready" retry count; 
> applies *only* to SRQ/XRC queues.  PP queues use RNR retry values of 0
>                           because Open MPI performs software flow control to 
> guarantee that RNRs never occur (must be >= 0 and <= 7; 7 =
>                           "infinite")
>                  MCA btl: parameter "btl_openib_ib_max_rdma_dst_ops" (current 
> value: "4", data source: default value)
>                           InfiniBand maximum pending RDMA destination 
> operations (must be >= 0)
>                  MCA btl: parameter "btl_openib_ib_service_level" (current 
> value: "0", data source: default value)
>                           InfiniBand service level (must be >= 0 and <= 15)
>                  MCA btl: parameter "btl_openib_use_eager_rdma" (current 
> value: "-1", data source: default value)
>                           Use RDMA for eager messages (-1 = use device 
> default, 0 = do not use eager RDMA, 1 = use eager RDMA)
>                  MCA btl: parameter "btl_openib_eager_rdma_threshold" 
> (current value: "16", data source: default value)
>                           Use RDMA for short messages after this number of 
> messages are received from a given peer (must be >= 1)
>                  MCA btl: parameter "btl_openib_max_eager_rdma" (current 
> value: "16", data source: default value)
>                           Maximum number of peers allowed to use RDMA for 
> short messages (RDMA is used for all long messages, except if
>                           explicitly disabled, such as with the "dr" pml) 
> (must be >= 0)
>                  MCA btl: parameter "btl_openib_eager_rdma_num" (current 
> value: "16", data source: default value)
>                           Number of RDMA buffers to allocate for small 
> messages(must be >= 1)
>                  MCA btl: parameter "btl_openib_btls_per_lid" (current value: 
> "1", data source: default value)
>                           Number of BTLs to create for each InfiniBand LID 
> (must be >= 1)
>                  MCA btl: parameter "btl_openib_max_lmc" (current value: "0", 
> data source: default value)
>                           Maximum number of LIDs to use for each device port 
> (must be >= 0, where 0 = use all available)
>                  MCA btl: parameter "btl_openib_enable_apm_over_lmc" (current 
> value: "0", data source: default value)
>                           Maximum number of alterative paths for each device 
> port (must be >= -1, where 0 = disable apm, -1 = all availible
>                           alternative paths )
>                  MCA btl: parameter "btl_openib_enable_apm_over_ports" 
> (current value: "0", data source: default value)
>                           Enable alterative path migration (APM) over 
> different ports of the same device (must be >= 0, where 0 = disable APM
>                           over ports , 1 = enable APM over ports of the same 
> device)
>                  MCA btl: parameter "btl_openib_use_async_event_thread" 
> (current value: "1", data source: default value)
>                           If nonzero, use the thread that will handle 
> InfiniBand asyncihronous events
>                  MCA btl: parameter "btl_openib_buffer_alignment" (current 
> value: "64", data source: default value)
>                           Prefered communication buffer alignment, in bytes 
> (must be > 0 and power of two)
>                  MCA btl: parameter "btl_openib_use_message_coalescing" 
> (current value: "1", data source: default value)
>                           Use message coalescing
>                  MCA btl: parameter "btl_openib_cq_poll_ratio" (current 
> value: "100", data source: default value)
>                           how often poll high priority CQ versus low priority 
> CQ
>                  MCA btl: parameter "btl_openib_eager_rdma_poll_ratio" 
> (current value: "100", data source: default value)
>                           how often poll eager RDMA channel versus CQ
>                  MCA btl: parameter "btl_openib_hp_cq_poll_per_progress" 
> (current value: "10", data source: default value)
>                           max number of completion events to process for each 
> call of BTL progress engine
>                  MCA btl: information "btl_openib_have_fork_support" (value: 
> "1", data source: default value)
>                           Whether the OpenFabrics stack supports applications 
> that invoke the "fork()" system call or not (0 = no, 1 = yes).
>                           Note that this value does NOT indicate whether the 
> system being run on supports "fork()" with OpenFabrics applications
>                           or not.
>                  MCA btl: parameter "btl_openib_exclusivity" (current value: 
> "1024", data source: default value)
>                           BTL exclusivity (must be >= 0)
>                  MCA btl: parameter "btl_openib_flags" (current value: "310", 
> data source: default value)
>                           BTL bit flags (general flags: SEND=1, PUT=2, GET=4, 
> SEND_INPLACE=8, RDMA_MATCHED=64, HETEROGENEOUS_RDMA=256; flags
>                           only used by the "dr" PML (ignored by others): 
> ACK=16, CHECKSUM=32, RDMA_COMPLETION=128)
>                  MCA btl: parameter "btl_openib_rndv_eager_limit" (current 
> value: "12288", data source: default value)
>                           Size (in bytes) of "phase 1" fragment sent for all 
> large messages (must be >= 0 and <= eager_limit)
>                  MCA btl: parameter "btl_openib_eager_limit" (current value: 
> "12288", data source: default value)
>                           Maximum size (in bytes) of "short" messages (must 
> be >= 1).
>                  MCA btl: parameter "btl_openib_max_send_size" (current 
> value: "65536", data source: default value)
>                           Maximum size (in bytes) of a single "phase 2" 
> fragment of a long message when using the pipeline protocol (must be >=
>                           1)
>                  MCA btl: parameter "btl_openib_rdma_pipeline_send_length" 
> (current value: "1048576", data source: default value)
>                           Length of the "phase 2" portion of a large message 
> (in bytes) when using the pipeline protocol.  This part of the
>                           message will be split into fragments of size 
> max_send_size and sent using send/receive semantics (must be >= 0; only
>                           relevant when the PUT flag is set)
>                  MCA btl: parameter "btl_openib_rdma_pipeline_frag_size" 
> (current value: "1048576", data source: default value)
>                           Maximum size (in bytes) of a single "phase 3" 
> fragment from a long message when using the pipeline protocol.  These
>                           fragments will be sent using RDMA semantics (must 
> be >= 1; only relevant when the PUT flag is set)
>                  MCA btl: parameter "btl_openib_min_rdma_pipeline_size" 
> (current value: "262144", data source: default value)
>                           Messages smaller than this size (in bytes) will not 
> use the RDMA pipeline protocol.  Instead, they will be split into
>                           fragments of max_send_size and sent using 
> send/receive semantics (must be >=0, and is automatically adjusted up to at
>                           least (eager_limit+btl_rdma_pipeline_send_length); 
> only relevant when the PUT flag is set)
>                  MCA btl: parameter "btl_openib_bandwidth" (current value: 
> "800", data source: default value)
>                           Approximate maximum bandwidth of interconnect(must 
> be >= 1)
>                  MCA btl: parameter "btl_openib_latency" (current value: 
> "10", data source: default value)
>                           Approximate latency of interconnect (must be >= 0)
>                  MCA btl: parameter "btl_openib_receive_queues" (current 
> value:
>                           
> "P,128,256,192,128:S,2048,256,128,32:S,12288,256,128,32:S,65536,256,128,32", 
> data source: default value)
>                           Colon-delimited, comma delimited list of receive 
> queues: P,4096,8,6,4:P,32768,8,6,4
>                  MCA btl: parameter "btl_openib_if_include" (current value: 
> <none>, data source: default value)
>                           Comma-delimited list of devices/ports to be used 
> (e.g. "mthca0,mthca1:2"; empty value means to use all ports found).
>                           Mutually exclusive with btl_openib_if_exclude.
>                  MCA btl: parameter "btl_openib_if_exclude" (current value: 
> <none>, data source: default value)
>                           Comma-delimited list of device/ports to be excluded 
> (empty value means to not exclude any ports).  Mutually exclusive
>                           with btl_openib_if_include.
>                  MCA btl: parameter "btl_openib_ipaddr_include" (current 
> value: <none>, data source: default value)
>                           Comma-delimited list of IP Addresses to be used 
> (e.g. "192.168.1.0/24").  Mutually exclusive with
>                           btl_openib_ipaddr_exclude.
>                  MCA btl: parameter "btl_openib_ipaddr_exclude" (current 
> value: <none>, data source: default value)
>                           Comma-delimited list of IP Addresses to be excluded 
> (e.g. "192.168.1.0/24").  Mutually exclusive with
>                           btl_openib_ipaddr_include.
>                  MCA btl: parameter "btl_openib_cpc_include" (current value: 
> <none>, data source: default value)
>                           Method used to select OpenFabrics connections 
> (valid values: oob,xoob,rdmacm)
>                  MCA btl: parameter "btl_openib_cpc_exclude" (current value: 
> <none>, data source: default value)
>                          Method used to exclude OpenFabrics connections 
> (valid values: oob,xoob,rdmacm)
>                  MCA btl: parameter "btl_openib_connect_oob_priority" 
> (current value: "50", data source: default value)
>                           The selection method priority for oob
>                  MCA btl: parameter "btl_openib_connect_xoob_priority" 
> (current value: "60", data source: default value)
>                           The selection method priority for xoob
>                  MCA btl: parameter "btl_openib_connect_rdmacm_priority" 
> (current value: "30", data source: default value)
>                           The selection method priority for rdma_cm
>                  MCA btl: parameter "btl_openib_connect_rdmacm_port" (current 
> value: "0", data source: default value)
>                           The selection method port for rdma_cm
>                  MCA btl: parameter 
> "btl_openib_connect_rdmacm_resolve_timeout" (current value: "1000", data 
> source: default value)
>                           The timeout (in miliseconds) for address and route 
> resolution
>                  MCA btl: parameter "btl_openib_connect_rdmacm_retry_count" 
> (current value: "20", data source: default value)
>                           Maximum number of times rdmacm will retry route 
> resolution
>                  MCA btl: parameter 
> "btl_openib_connect_rdmacm_reject_causes_connect_error" (current value: "0", 
> data source: default value)
>                           The drivers for some devices are buggy such that an 
> RDMA REJECT action may result in a CONNECT_ERROR event instead of
>                           a REJECTED event.  Setting this MCA parameter to 
> true tells Open MPI to treat CONNECT_ERROR events on connections
>                           where a REJECT is expected as a REJECT (default: 
> false)
>                  MCA btl: parameter "btl_openib_priority" (current value: 
> "0", data source: default value)
>                  MCA btl: parameter "btl_base_warn_component_unused" (current 
> value: "1", data source: default value)
>                           This parameter is used to turn on warning messages 
> when certain NICs are not used
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>  
>  
> 
> 
>  
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

Reply via email to