Matt,
I suspect the problem is in your partition configuration.
Could you share your partition configuration file (by default
opensm uses /etc/opensm/partitions.conf) and the GUIDs from your
machines (ibstat | grep GUID)?
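It may also help to see at which index 0x8109 sits in the P_Key table on each node, so you can compare it with the btl_openib_ib_pkey_ix value you pass. Below is a rough sketch, not a tested procedure for your cluster. It assumes the usual sysfs layout (/sys/class/infiniband/<device>/ports/<port>/pkeys/) and that 0x8109 is partition 0x109 with the full-membership bit (0x8000) set. The helper names full_pkey and pkey_index are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers; names and layout are assumptions, not Open MPI or OFED tools.

# Print the full-membership P_Key for a base partition number.
# Bit 15 (0x8000) of a P_Key is the membership bit.
full_pkey() {
    printf '0x%04x\n' "$(( $1 | 0x8000 ))"
}

# Print the index at which a P_Key value appears in a pkeys directory.
#   $1 = pkeys directory (one file per table index, each holding a hex value)
#   $2 = P_Key value to look for, e.g. 0x8109
# Returns non-zero if the value is not in the table.
pkey_index() {
    for f in "$1"/*; do
        [ -r "$f" ] || continue
        if [ "$(( $(cat "$f") ))" -eq "$(( $2 ))" ]; then
            basename "$f"
            return 0
        fi
    done
    return 1
}
```

For instance, running pkey_index /sys/class/infiniband/mlx4_0/ports/1/pkeys "$(full_pkey 0x109)" on each node (mlx4_0 is only a placeholder device name) should print the table index holding 0x8109, or fail if the subnet manager never assigned that partition to the port.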
Regards,
Pasha
Matt Burgess wrote:
Hi,
I'm trying to get openmpi working over openib partitions. On this
cluster, the partition number is 0x109. The ib interfaces are pingable
over the appropriate ib0.8109 interface:
d2:/opt/openmpi-ib # ifconfig ib0.8109
ib0.8109  Link encap:UNSPEC  HWaddr 80-00-00-4A-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:10.21.48.2  Bcast:10.21.255.255  Mask:255.255.0.0
          inet6 addr: fe80::202:c902:26:ca01/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:16811 errors:0 dropped:0 overruns:0 frame:0
          TX packets:15848 errors:0 dropped:1 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:102229428 (97.4 Mb)  TX bytes:102324172 (97.5 Mb)
I have tried the following:
/opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -machinefile machinefile \
    -mca btl openib,self -mca btl_openib_max_btls 1 \
    -mca btl_openib_ib_pkey_val 0x8109 -mca btl_openib_ib_pkey_ix 1 \
    /cluster/pallas/x86_64-ib/IMB-MPI1
but I just get a RETRY EXCEEDED ERROR. Is there an MCA parameter I am
missing?
I was successful using tcp only:
/opt/openmpi-ib/1.2.6/bin/mpirun -np 2 -machinefile machinefile \
    -mca btl tcp,self -mca btl_openib_max_btls 1 \
    -mca btl_openib_ib_pkey_val 0x8109 \
    /cluster/pallas/x86_64-ib/IMB-MPI1
Thanks,
Matt Burgess
--
Pavel Shamis (Pasha)
Mellanox Technologies LTD.