Are you able to run if you use "--mca btl_openib_cpc_include rdmacm"?
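
A rough sketch of what that could look like on the command line. The process count, executable name, and parameter file are placeholders, not values from your post, and the BTL list is an assumption modeled on your TCP run. The script only prints the command, so it can be checked before launching:

```shell
# Dry run: build the suggested command line and print it for inspection.
# NP, APP and PARAMS are placeholders (assumptions), not values from this thread.
NP=4
APP=./Gadget2
PARAMS=param.txt

# Select the openib BTL and ask it to use the RDMA CM connection manager.
cmd="mpirun --mca btl openib,self,sm --mca btl_openib_cpc_include rdmacm -np $NP $APP $PARAMS"
echo "$cmd"
```

If the printed line looks right for your site, run it directly. A clean run would suggest the hang is related to how the openib connections are being set up.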

On Mar 17, 2011, at 10:57 AM, Craig West wrote:

> Hi,
> I'm a system administrator trying to help users resolve Gadget-2 hangs 
> in MPI_Sendrecv (similar to 
> http://www.open-mpi.org/community/lists/users/2010/05/13057.php).
> I'm trying to determine appropriate values for mpool_rdma_rcache_size_limit 
> for our hardware, and to make sure RDMA settings are appropriate and do not 
> lead to data corruption 
> (http://www.open-mpi.org/faq/?category=openfabrics#setting-mpi-leave-pinned-1.3.2).
> The Gadget code ran fine under Open MPI 1.2.9; the hangs appeared in 
> 1.4.3 (and also in 1.3.2). 
> 
> The code runs over TCP (-mca btl tcp,self,sm).
> 
> The code hangs over InfiniBand.
> 
> The code runs over InfiniBand with "-mca btl_openib_flags 1" and "-mca 
> mpool_rdma_rcache_size_limit 209715200" (a suggestion from the poster in 
> the thread linked above).
> 
> Any suggestions would be appreciated.
> Regards,
> Gretchen
> 0. openmpi 1.4.3 (ompi_info attached, config.log is missing but may not be 
> needed as this is a more general usage/settings question)
> 1. OFED 1.4.2 from git.openfabrics.org
> 2. Debian 5.0, kernel 2.6.26-2-amd64
> 3. opensm-3.2.6
> 4. ibv_devinfo
> hca_id:    mlx4_0
>     fw_ver:                2.6.000
>     node_guid:            0002:c903:0002:848c
>     sys_image_guid:            0002:c903:0002:848f
>     vendor_id:            0x02c9
>     vendor_part_id:            25408
>     hw_ver:                0xA0
>     board_id:            MT_04A0130005
>     phys_port_cnt:            2
>         port:    1
>             state:            PORT_ACTIVE (4)
>             max_mtu:        2048 (4)
>             active_mtu:        2048 (4)
>             sm_lid:            30
>             port_lid:        99
>             port_lmc:        0x00
> 
> 5. ifconfig
> ib0       Link encap:UNSPEC  HWaddr 
> 80-00-00-48-FE-80-00-00-00-00-00-00-00-00-00-00  
>           inet addr:10.16.10.20  Bcast:10.16.10.255  Mask:255.255.255.0
>           inet6 addr: fe80::202:c903:2:848d/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
>           RX packets:1936 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:0 errors:0 dropped:5 overruns:0 carrier:0
>           collisions:0 txqueuelen:256 
>           RX bytes:189055 (184.6 KiB)  TX bytes:0 (0.0 B)
> 6. ulimit -l: unlimited


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

