----- Forwarded Message -----
From: Hamilton Fischer <fischerhamil...@yahoo.com>
To: "u...@open-mpi.org" <u...@open-mpi.org>
Sent: Monday, January 16, 2012 9:09 PM
Subject: unknown af_family recieved errors...
Hi, I'm having odd issues with my "cluster", I guess. This very simple example
works on one machine, but it gives a load of errors and then hangs when I
try to parallelize it across the network.
#include <stdio.h>
#include <mpi.h>

int
main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank == 0)
    {
        /* Rank 0 sends the value 1 to every other rank. */
        int i;
        for (i = 1; i < size; ++i)
        {
            int s = 1;
            MPI_Send(&s, 1, MPI_INT, i, 1, MPI_COMM_WORLD);
        }
    }
    else
    {
        /* Every other rank receives from rank 0 and prints the value. */
        int r;
        MPI_Recv(&r, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("%d got a %d\n", rank, r);
    }
    MPI_Finalize();
    return 0;
}
If I do `mpirun -np 3 a.out', where a.out is the executable, I get the
expected output:
1 got a 1
2 got a 1
Now, let's say I go on the network. I use `mpirun --hostfile ../combin_host
a.out', where my hostfile is simply:
# Hostfile
angryrock@192.168.0.1 slots=4
# Hostfile
user@192.168.0.102 slots=2
user@192.168.0.103 slots=2
user@192.168.0.104 slots=2
user@192.168.0.105 slots=2
I get this...
[localhost:04756] mca_btl_tcp_proc: unknown af_family received: 1
[localhost:04756] unknown address family for tcp: 0
[localhost:04756] mca_btl_tcp_proc: unknown af_family received: 1
[localhost:04756] unknown address family for tcp: 0
[localhost:04610] mca_btl_tcp_proc: unknown af_family received: 1
[localhost:04610] unknown address family for tcp: 0
[localhost:04048] mca_btl_tcp_proc: unknown af_family received: 1
...
[localhost:04123] unknown address family for tcp: 0
1 got a 1
2 got a 1
3 got a 1
^Cmpirun: killing job...
The ellipsis stands for a few more lines of the same thing, probably a set for
each host. The last part is no doubt a.out executing on my machine. In the end
I have to kill it because it hangs.
Any help as to what my issue might be? I suspect it is an installation issue...
Thanks,
noobermin