Victor -

I don't think our multi-port support with MX is particularly well tested (I 
know I don't test that path).

It looks like you might be able to work around the problem by setting -mca 
mtl_mx_endpoint_num 1 on the mpirun command line, so that only the first 
port found is used.  But I could be wrong.
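
Applied to the mpiexec command from your message, that would look roughly like 
the sketch below. This is untested: the host list and executable are copied 
from your message, and the only addition is the mtl_mx_endpoint_num setting. 
Running ompi_info first is just a way to check that your build actually exposes 
that parameter.

```sh
# List the MCA parameters of the MX MTL component in this Open MPI build,
# to confirm that mtl_mx_endpoint_num is available.
ompi_info --param mtl mx

# Same job as in the original message, with the endpoint count limited to 1
# (assumption: this keeps the MX MTL on the first port found).
mpiexec -np 32 -host compute-08,compute-17,compute-18,compute-16 \
    -mca mtl mx -mca pml cm -mca mtl_mx_endpoint_num 1 ./wrf.exe
```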

Brian

On 1/9/14 5:02 PM, "Victor Prosolin" 
<victor.proso...@rwdi.com> wrote:

Hi,
Our cluster has servers with either a single-port or a dual-port Myrinet card. 
On the dual-port cards, only one port is connected to the Myrinet switch. The 
OpenMPI library is configured with the “--with-mx=…” option and it works fine 
when I submit jobs to single-port servers only. However, when I try to include 
a server with a dual-port card, I get a bunch of errors like the following:
[compute-08:17788] mx_connect fail for unknown 60dd464f9d nic_id with key 
aaaaffff (error Destination NIC not found in network table)

60dd464f9d is the wrong MAC address: it belongs to port 1 (not connected), 
whereas port 0, which is connected to the switch, has MAC 60dd464f9c.

This is how I (try to) run the job:

1. mpiexec -np 32 -host compute-08,compute-17,compute-18,compute-16 -mca 
mtl mx --mca pml cm ./wrf.exe

or

2. Using a similar command, but via Sun Grid Engine.

The OS is CentOS 6.4, 64-bit. OpenMPI 1.6.5 was compiled from the official src 
rpm with gcc 4.4.7, and MX library 1.2.16 was compiled manually. Again, this 
configuration works fine when only single-port servers are used.

Is there a way to tell OpenMPI to stick to the one port that is connected? I 
haven’t found any options through ompi_info or via google… Any help will be 
greatly appreciated.

Sincerely,
Victor.



--
  Brian W. Barrett
  Scalable System Software Group
  Sandia National Laboratories
