Your program works fine in my environment.
This is typical of a firewall running on your host(s): a connect() that hangs silently (instead of failing fast with "connection refused") usually means the SYN packets are being dropped. Can you double-check that?
A simple way to check is to run, on one node:
10.10.10.11# nc -l 1024
and on the other node:
echo ahah | nc 10.10.10.11 1024
The first command should print "ahah" unless the host is unreachable and/or the TCP connection is blocked by the firewall.
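If the nc test fails, it is worth looking at the firewall configuration directly. As a sketch (assuming a RHEL/CentOS-style host; the exact tools depend on your distro):
# list the current packet filter rules
iptables -L -n
# or, if firewalld is in use, show the active zone configuration
firewall-cmd --list-all
# to rule the firewall out temporarily (re-enable it afterwards!)
systemctl stop firewalld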
Cheers,
Gilles
On 4/4/2016 9:44 AM, dpchoudh . wrote:
Hello Gilles
Thanks for your help.
My question was more of a sanity check on myself. That little program
I sent looked correct to me; do you see anything wrong with it?
What I am running on my setup is an instrumented OMPI stack, taken
from git HEAD, in an attempt to understand how some of the internals
work. If you think the code is correct, it is quite possible that one
of those 'instrumentations' is causing this.
And BTW, adding -mca pml ob1 makes the code hang at MPI_Send() (as opposed to MPI_Recv()):
[smallMPI:51673] mca: bml: Using tcp btl for send to [[51894,1],1] on
node 10.10.10.11
[smallMPI:51673] mca: bml: Using tcp btl for send to [[51894,1],1] on
node 10.10.10.11
[smallMPI:51673] mca: bml: Using tcp btl for send to [[51894,1],1] on
node 10.10.10.11
[smallMPI:51673] mca: bml: Using tcp btl for send to [[51894,1],1] on
node 10.10.10.11
[smallMPI:51673] btl: tcp: attempting to connect() to [[51894,1],1]
address 10.10.10.11 on port 1024 <--- Hangs here
But 10.10.10.11 is pingable:
[durga@smallMPI ~]$ ping bigMPI
PING bigMPI (10.10.10.11) 56(84) bytes of data.
64 bytes from bigMPI (10.10.10.11): icmp_seq=1 ttl=64 time=0.247 ms
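(Of course, ping only exercises ICMP; a firewall can allow ICMP while dropping TCP. As a quick TCP-level check I could try something along these lines, assuming bash's built-in /dev/tcp support:
timeout 2 bash -c 'echo > /dev/tcp/10.10.10.11/1024' && echo open
which should print "open" only if a TCP connection to port 1024 actually succeeds, i.e. something must be listening on that port.)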
We learn from history that we never learn from history.
On Sun, Apr 3, 2016 at 8:04 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
Hi,
Per a previous message, can you give this a try:
mpirun -np 2 -hostfile ~/hostfile --mca btl self,tcp --mca pml ob1 ./mpitest
If it still hangs, the issue could be that Open MPI thinks some subnets are reachable when they are not.
For diagnostics:
mpirun --mca btl_base_verbose 100 ...
You can explicitly include/exclude subnets with
--mca btl_tcp_if_include xxx
or
--mca btl_tcp_if_exclude yyy
For example,
mpirun --mca btl_tcp_if_include 192.168.0.0/24 -np 2 -hostfile ~/hostfile --mca btl self,tcp --mca pml ob1 ./mpitest
should do the trick.
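One more thing to watch for: virtual interfaces such as virbr0 (libvirt) or docker0 can exist on both hosts with matching subnets that are not actually routed between them, which confuses reachability detection. If your hosts have any, excluding them may help, for example (interface names here are only illustrative; use whatever ip addr shows on your nodes, and note lo must be in the exclude list too):
mpirun --mca btl_tcp_if_exclude lo,virbr0,docker0 -np 2 -hostfile ~/hostfile --mca btl self,tcp --mca pml ob1 ./mpitest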
Cheers,
Gilles
On 4/4/2016 8:32 AM, dpchoudh . wrote:
Hello all
I don't mean to be competing for the 'silliest question of the
year award', but I can't figure this out on my own:
My 'cluster' has 2 machines, bigMPI and smallMPI. They are
connected via several (types of) networks and the connectivity is OK.
In this setup, the following program hangs after printing
Hello world from processor smallMPI, rank 0 out of 2 processors
Hello world from processor bigMPI, rank 1 out of 2 processors
smallMPI sent haha!
Obviously it is hanging at MPI_Recv(). But why? My command line is below, but the same thing happens with the openib BTL (instead of TCP) as well.
mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp ./mpitest
It must be something *really* trivial, but I am drawing a blank
right now.
Please help!
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char** argv)
{
    int world_size, world_rank, name_len;
    char hostname[MPI_MAX_PROCESSOR_NAME], buf[8];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Get_processor_name(hostname, &name_len);
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           hostname, world_rank, world_size);
    if (world_rank == 1)
    {
        /* rank 1 receives the 6-byte message (5 chars + NUL) from rank 0 */
        MPI_Recv(buf, 6, MPI_CHAR, 0, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("%s received %s\n", hostname, buf);
    }
    else
    {
        /* rank 0 sends "haha!" (including the terminating NUL) to rank 1 */
        strcpy(buf, "haha!");
        MPI_Send(buf, 6, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
        printf("%s sent %s\n", hostname, buf);
    }
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
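For reference, I build and run it like this (assuming mpicc and mpirun both come from the instrumented install):
mpicc -o mpitest mpitest.c
mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp ./mpitest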
We learn from history that we never learn from history.