(sorry for the delay in this reply; this mail came while I was at the MPI Forum meeting. Travel always makes my disastrous INBOX even worse...)
As a bit of explanation, I can surmise part of what is happening here. When you run on only one machine, the TCP communications plugin (i.e., the TCP "BTL") is not used -- only the shared memory (sm) BTL is used. Hence, you don't see the warnings.

That being said, you could force the TCP BTL to be used instead of the sm BTL by using:

    mpirun --mca btl tcp,self -np 2 my_test_program

When you run across multiple nodes, the TCP BTL is used by default, and therefore these warnings come up. These warnings refer to IP interfaces that Open MPI found but doesn't recognize.

What is the output of ifconfig on your machine?

On Jan 16, 2012, at 9:11 PM, Hamilton Fischer wrote:

> ----- Forwarded Message -----
> From: Hamilton Fischer <fischerhamil...@yahoo.com>
> To: "u...@open-mpi.org" <u...@open-mpi.org>
> Sent: Monday, January 16, 2012 9:09 PM
> Subject: unknown af_family recieved errors...
>
> Hi, I'm having odd issues with my "cluster", I guess. This very simple
> example works on one machine, but it gives a load of errors and hangs
> afterwards when I try to parallelize it across the network.
>
> #include <stdio.h>
> #include "mpi.h"
>
> int
> main(int argc, char *argv[])
> {
>     int rank, size;
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>
>     if (rank == 0)
>     {
>         int i;
>         for(i=1; i < size; ++i)
>         {
>             int s=1;
>             MPI_Send(&s, 1, MPI_INT, i, 1, MPI_COMM_WORLD);
>         }
>     }
>     else
>     {
>         int r;
>         MPI_Recv(&r, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, NULL);
>         printf("%d got a %d\n", rank, r);
>     }
>     MPI_Finalize();
>     return 0;
> }
>
> If I do `mpirun -np 3 a.out', where a.out is the executable, I get the
> obvious output:
>
> 1 got a 1
> 2 got a 1
>
> Now, let's say I go on the network. I use `mpirun --hostfile ../combin_host
> a.out', where my hostfile is simply:
>
> # Hostfile
> angryrock@192.168.0.1 slots=4
> # Hostfile
> user@192.168.0.102 slots=2
> user@192.168.0.103 slots=2
> user@192.168.0.104 slots=2
> user@192.168.0.105 slots=2
>
> I get this...
>
> [localhost:04756] mca_btl_tcp_proc: unknown af_family received: 1
> [localhost:04756] unknown address family for tcp: 0
> [localhost:04756] mca_btl_tcp_proc: unknown af_family received: 1
> [localhost:04756] unknown address family for tcp: 0
> [localhost:04610] mca_btl_tcp_proc: unknown af_family received: 1
> [localhost:04610] unknown address family for tcp: 0
> [localhost:04048] mca_btl_tcp_proc: unknown af_family received: 1
> ...
> [localhost:04123] unknown address family for tcp: 0
> 1 got a 1
> 2 got a 1
> 3 got a 1
> ^Cmpirun: killing job...
>
> The ellipsis encompasses a few lines of the same thing, probably for each
> host. The ending part no doubt is a.out executing on my machine. As is
> obvious, at the end, I have to kill it because it hangs.
>
> Any help as to what my issue might be? It obviously is an installation
> issue...
>
> Thanks,
> noobermin

-- 
Jeff Squyres
jsquy...@cisco.com