I had exactly the same problem. Running MPI between two separate machines, each with two Ethernet ports, caused really weird behaviour even with the most basic code. I had to disable one of the Ethernet ports on each machine, and it worked just fine after that. No idea why, though!
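If disabling a port at the OS level is not an option, a similar effect can probably be had by telling Open MPI to ignore the extra ports, which is what the quoted message below does with btl_tcp_if_exclude. Here is a minimal Python sketch that builds such a command line. The interface names come from the quoted message; the helper names and the choice of which port to keep on each node are assumptions for illustration, and none of this has been tested on a real cluster.

```python
# Interface names as reported in the quoted message.
NODE_INTERFACES = {
    "denver": ["eth21", "eth23", "eth24"],
    "chicago": ["eth19", "eth29", "eth30"],
}

# ASSUMPTION: keep exactly one port of the dual-port card per node.
# Which port to keep is an arbitrary choice made for this example.
KEEP = {"denver": "eth23", "chicago": "eth30"}

# Interfaces the original command always excluded.
ALWAYS_EXCLUDE = ["lo", "virbr0"]


def build_exclude_list(node_interfaces, keep, always_exclude):
    """Return every interface except the one kept on each node."""
    excluded = []
    for node, names in node_interfaces.items():
        excluded.extend(name for name in names if name != keep[node])
    return excluded + list(always_exclude)


def build_mpirun_command(exclude, hostfile="host1", nprocs=4,
                         program="./Sendrecv"):
    """Assemble an mpirun argument list in the style of the quoted message."""
    return [
        "mpirun",
        "--hostfile", hostfile,
        "--mca", "btl", "tcp,sm,self",
        "--mca", "btl_tcp_if_exclude", ",".join(exclude),
        "-np", str(nprocs),
        program,
    ]


if __name__ == "__main__":
    exclude = build_exclude_list(NODE_INTERFACES, KEEP, ALWAYS_EXCLUDE)
    print(" ".join(build_mpirun_command(exclude)))
```

The same exclude list is passed to every node, so a name that exists on only one machine (eth29, for instance) is listed alongside the others, just as in the original command.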
----- Original Message -----
From: Jingcha Joba
To: us...@open-mpi.org
Sent: Thursday, February 16, 2012 8:43 PM
Subject: [OMPI users] Problem running an mpi application on nodes with more than one interface

Hello Everyone,

This is my first post to the Open MPI forum.

I am trying to run a simple program that does a Sendrecv between two nodes, each of which has two interface cards. Both nodes run RHEL6 with Open MPI 1.4.4 on an 8-core Xeon processor.

What I noticed is that when two or more interfaces are in use on both nodes, MPI "hangs" while attempting to connect.

These details might help:

Node 1 - Denver has a single-port "A" card (eth21 - 25.192.xx.xx, which I use to ssh to that machine) and a dual-port "B" card (eth23 - 10.3.1.1 and eth24 - 10.3.1.2).

Node 2 - Chicago has the same single-port "A" card (eth19 - 25.192.xx.xx, again used for ssh) and a dual-port "B" card (eth29 - 10.3.1.3 and eth30 - 10.3.1.4).

My /etc/hosts looks like this:

    25.192.xx.xx denver.xxx.com denver
    10.3.1.1 denver.xxx.com denver
    10.3.1.2 denver.xxx.com denver
    25.192.xx.xx chicago.xxx.com chicago
    10.3.1.3 chicago.xxx.com chicago
    10.3.1.4 chicago.xxx.com chicago
    ...

This is how I run it:

    mpirun --hostfile host1 --mca btl tcp,sm,self --mca btl_tcp_if_exclude eth21,eth19,lo,virbr0 --mca btl_base_verbose 30 -np 4 ./Sendrecv

I get a bunch of output from both chicago and denver saying that it has found components such as tcp, sm and self, and then it hangs at:

    [denver.xxx.com:21682] btl: tcp: attempting to connect() to address 10.3.1.3 on port 4
    [denver.xxx.com:21682] btl: tcp: attempting to connect() to address 10.3.1.4 on port 4

However, if I run the same program with eth29 or eth30 also excluded, then it works fine.
Something like this:

    mpirun --hostfile host1 --mca btl tcp,sm,self --mca btl_tcp_if_exclude eth21,eth19,eth29,lo,virbr0 --mca btl_base_verbose 30 -np 4 ./Sendrecv

My hostfile looks like this:

    [sshuser@denver Sendrecv]$ cat host1
    denver slots=2
    chicago slots=2

I am not sure whether I need to provide anything else. If I do, please feel free to ask.

Thanks,
--
Joba
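One thing that may be worth checking in a setup like the one described above is whether several ports on the same node sit on the same IP subnet. The Python sketch below groups the B-card addresses from the message by node and network. The /24 prefix length is an assumption, since the message does not give a netmask, and the function name is made up for this example. The result only flags candidate interfaces to exclude; it does not by itself explain the hang.

```python
import ipaddress
from collections import defaultdict

# ASSUMPTION: the message gives no netmask, so a /24 prefix is assumed here.
ASSUMED_PREFIX = 24

# B-card addresses as listed in the message: (node, interface) -> address.
B_CARD_PORTS = {
    ("denver", "eth23"): "10.3.1.1",
    ("denver", "eth24"): "10.3.1.2",
    ("chicago", "eth29"): "10.3.1.3",
    ("chicago", "eth30"): "10.3.1.4",
}


def ports_sharing_subnet_per_node(ports, prefix):
    """Map (node, network) to interface names when a node has 2+ ports there."""
    groups = defaultdict(list)
    for (node, ifname), addr in ports.items():
        network = ipaddress.ip_interface(f"{addr}/{prefix}").network
        groups[(node, str(network))].append(ifname)
    # Keep only the groups where one node has several ports on one network.
    return {key: names for key, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    shared = ports_sharing_subnet_per_node(B_CARD_PORTS, ASSUMED_PREFIX)
    for (node, network), names in shared.items():
        print(f"{node}: {', '.join(names)} share {network}")
```

With the addresses above and the assumed prefix, both ports of each B card land in the same group, so either port of a pair is a candidate for btl_tcp_if_exclude.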