Hello Gilles and all,

I am sorry to keep bugging the developers, but this issue keeps nagging me, and I am surprised that it does not seem to affect anybody else. Then again, I am using the master branch, and most users are probably on a released version.
This time I am using a totally different cluster. It has no verbs-capable interface at all: just two Ethernet ports (one of which has no IP address and hence is unusable) plus one proprietary interface that currently supports only IP traffic. The two usable IP interfaces (Ethernet and proprietary) are on different IP subnets.

My test program is as follows:

#include <stdio.h>
#include <string.h>
#include "mpi.h"
int main(int argc, char *argv[])
{
    char host[128];
    int n;
    MPI_Init(&argc, &argv);
    MPI_Get_processor_name(host, &n);
    printf("Hello from %s\n", host);
    MPI_Comm_size(MPI_COMM_WORLD, &n);
    printf("The world has %d nodes\n", n);
    MPI_Comm_rank(MPI_COMM_WORLD, &n);
    printf("My rank is %d\n", n);
//#if 0
    if (n == 0)
    {
        strcpy(host, "ha!");
        MPI_Send(host, strlen(host) + 1, MPI_CHAR, 1, 1, MPI_COMM_WORLD);
        printf("sent %s\n", host);
    }
    else
    {
        //int len = strlen(host) + 1;
        bzero(host, 128);
        MPI_Recv(host, 4, MPI_CHAR, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Received %s from rank 0\n", host);
    }
//#endif
    MPI_Finalize();
    return 0;
}

This program, when run between two nodes, hangs. The command was:

[durga@b-1 ~]$ mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp -mca pml ob1 -mca btl_tcp_if_include eno1 ./mpitest

And the hang occurs after the following output (eno1 is one of the GigE interfaces; it carries the OOB traffic as well):

Hello from b-1
The world has 2 nodes
My rank is 0
Hello from b-2
The world has 2 nodes
My rank is 1

Note that if I uncomment the #if 0/#endif pair (i.e. compile out the MPI_Send()/MPI_Recv() part), the program runs to completion. Also note that the printfs following MPI_Send()/MPI_Recv() do not show up on the console.

Upon attaching gdb, the stack trace from the master node is as follows:

Missing separate debuginfos, use: debuginfo-install glibc-2.17-78.el7.x86_64 libpciaccess-0.13.4-2.el7.x86_64
(gdb) bt
#0  0x00007f72a533eb7d in poll () from /lib64/libc.so.6
#1  0x00007f72a4cb7146 in poll_dispatch (base=0xee33d0, tv=0x7fff81057b70) at poll.c:165
#2  0x00007f72a4caede0 in opal_libevent2022_event_base_loop (base=0xee33d0, flags=2) at event.c:1630
#3  0x00007f72a4c4e692 in opal_progress () at runtime/opal_progress.c:171
#4  0x00007f72a0d07ac1 in opal_condition_wait (c=0x7f72a5bb1e00 <ompi_request_cond>, m=0x7f72a5bb1d80 <ompi_request_lock>) at ../../../../opal/threads/condition.h:76
#5  0x00007f72a0d07ca2 in ompi_request_wait_completion (req=0x113eb80) at ../../../../ompi/request/request.h:383
#6  0x00007f72a0d09cd5 in mca_pml_ob1_send (buf=0x7fff81057db0, count=4, datatype=0x601080 <ompi_mpi_char>, dst=1, tag=1, sendmode=MCA_PML_BASE_SEND_STANDARD, comm=0x601280 <ompi_mpi_comm_world>) at pml_ob1_isend.c:251
#7  0x00007f72a58d6be3 in PMPI_Send (buf=0x7fff81057db0, count=4, type=0x601080 <ompi_mpi_char>, dest=1, tag=1, comm=0x601280 <ompi_mpi_comm_world>) at psend.c:78
#8  0x0000000000400afa in main (argc=1, argv=0x7fff81057f18) at mpitest.c:19
(gdb)

And the backtrace on the non-master node is:

(gdb) bt
#0  0x00007ff3b377e48d in nanosleep () from /lib64/libc.so.6
#1  0x00007ff3b37af014 in usleep () from /lib64/libc.so.6
#2  0x00007ff3b0c922de in OPAL_PMIX_PMIX120_PMIx_Fence (procs=0x0, nprocs=0, info=0x0, ninfo=0) at src/client/pmix_client_fence.c:100
#3  0x00007ff3b0c5f1a6 in pmix120_fence (procs=0x0, collect_data=0) at pmix120_client.c:258
#4  0x00007ff3b3cf8f4b in ompi_mpi_finalize () at runtime/ompi_mpi_finalize.c:242
#5  0x00007ff3b3d23295 in PMPI_Finalize () at pfinalize.c:47
#6  0x0000000000400958 in main (argc=1, argv=0x7fff785e8788) at mpitest.c:30
(gdb)
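Incidentally, the non-master backtrace is already inside MPI_Finalize(), which suggests that its MPI_Recv() did complete and that the missing "Received ..." line is probably just stdout buffering (as far as I understand, stdout goes to mpirun through a pipe and is therefore block-buffered, so nothing shows up until the process exits or the buffer fills). To rule that out I intend to make stdout unbuffered right after MPI_Init(). A minimal sketch of what I mean (this is just an illustration, not the test case itself):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank;

    MPI_Init(&argc, &argv);
    /* Disable stdio buffering so every printf reaches the mpirun
     * console immediately, even if this process later blocks
     * inside an MPI call. */
    setvbuf(stdout, NULL, _IONBF, 0);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("rank %d: stdout is now unbuffered\n", rank);
    MPI_Finalize();
    return 0;
}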
The hostfile is as follows:

[durga@b-1 ~]$ cat hostfile
10.4.70.10 slots=1
10.4.70.11 slots=1
#10.4.70.12 slots=1

And the ifconfig output from the master node is as follows (the other node is similar; all the IP interfaces are in their respective subnets):

[durga@b-1 ~]$ ifconfig
eno1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.4.70.10  netmask 255.255.255.0  broadcast 10.4.70.255
        inet6 fe80::21e:c9ff:fefe:13df  prefixlen 64  scopeid 0x20<link>
        ether 00:1e:c9:fe:13:df  txqueuelen 1000  (Ethernet)
        RX packets 48215  bytes 27842846 (26.5 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 52746  bytes 7817568 (7.4 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
        device interrupt 16

eno2: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether 00:1e:c9:fe:13:e0  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
        device interrupt 17

lf0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 2016
        inet 192.168.1.2  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::3002:ff:fe33:3333  prefixlen 64  scopeid 0x20<link>
        ether 32:02:00:33:33:33  txqueuelen 1000  (Ethernet)
        RX packets 10  bytes 512 (512.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 22  bytes 1536 (1.5 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 0  (Local Loopback)
        RX packets 26  bytes 1378 (1.3 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 26  bytes 1378 (1.3 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
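Following your earlier suggestion to check that the TCP networks themselves are usable, I also want a check that is completely independent of Open MPI, since (as far as I understand) the TCP BTL opens its own connections on ports other than the ssh/OOB ones. Below is the kind of minimal raw-TCP probe I have in mind; it is only a sketch, and the port 7777 is an arbitrary placeholder. One node runs it with no argument (listener), the other runs it with the listener's eno1 address (e.g. ./tcpcheck 10.4.70.10), and then the roles are swapped to exercise the other direction:

/* tcpcheck.c -- minimal raw-TCP reachability probe (sketch only). */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#define PROBE_PORT 7777   /* arbitrary placeholder port for the test */

int main(int argc, char *argv[])
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0) { perror("socket"); return 1; }
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(PROBE_PORT);

    if (argc < 2) {
        /* Listener side: accept a single connection and report it. */
        int one = 1, conn;
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 || listen(fd, 1) < 0) {
            perror("bind/listen");
            return 1;
        }
        printf("listening on port %d ...\n", PROBE_PORT);
        conn = accept(fd, NULL, NULL);
        if (conn < 0) { perror("accept"); return 1; }
        printf("accepted a connection: TCP works in this direction\n");
        close(conn);
    } else {
        /* Client side: connect to the listener's address. */
        if (inet_pton(AF_INET, argv[1], &addr.sin_addr) != 1) {
            fprintf(stderr, "bad address: %s\n", argv[1]);
            return 1;
        }
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("connect");   /* a timeout or refusal here points at firewall/routing */
            return 1;
        }
        printf("connected to %s:%d: TCP works in this direction\n", argv[1], PROBE_PORT);
    }
    close(fd);
    return 0;
}

If the connect() fails or hangs in either direction, that would point at a firewall or routing problem on that subnet rather than at Open MPI itself.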
Please help me with this. I am stuck with the TCP transport, which is the most basic of all transports.

Thanks in advance
Durga

1% of the executables have 99% of CPU privilege!
Userspace code! Unite!! Occupy the kernel!!!

On Tue, Apr 12, 2016 at 9:32 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:

> This is quite unlikely, and fwiw, your test program works for me.
>
> i suggest you check your 3 TCP networks are usable, for example
>
> $ mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp -mca pml ob1 --mca btl_tcp_if_include xxx ./mpitest
>
> in which xxx is a [list of] interface name(s):
> eth0
> eth1
> ib0
> eth0,eth1
> eth0,ib0
> ...
> eth0,eth1,ib0
>
> and see where the problem starts occurring.
>
> btw, are your 3 interfaces in 3 different subnets? is routing required between two interfaces of the same type?
>
> Cheers,
>
> Gilles
>
> On 4/13/2016 7:15 AM, dpchoudh . wrote:
>
> Hi all
>
> I have reported this issue before, but then had brushed it off as something that was caused by my modifications to the source tree. It looks like that is not the case.
>
> Just now, I did the following:
>
> 1. Cloned a fresh copy from master.
> 2. Configured with the following flags, built and installed it on my two-node "cluster":
>    --enable-debug --enable-debug-symbols --disable-dlopen
> 3. Compiled the following program, mpitest.c, with these flags: -g3 -Wall -Wextra
> 4. Ran it like this:
>    [durga@smallMPI ~]$ mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp -mca pml ob1 ./mpitest
>
> With this, the code hangs at MPI_Barrier() on both nodes, after generating the following output:
>
> Hello world from processor smallMPI, rank 0 out of 2 processors
> Hello world from processor bigMPI, rank 1 out of 2 processors
> smallMPI sent haha!
> bigMPI received haha!
> <Hangs until killed by ^C>
>
> Attaching to the hung process at one node gives the following backtrace:
>
> (gdb) bt
> #0  0x00007f55b0f41c3d in poll () from /lib64/libc.so.6
> #1  0x00007f55b03ccde6 in poll_dispatch (base=0x70e7b0, tv=0x7ffd1bb551c0) at poll.c:165
> #2  0x00007f55b03c4a90 in opal_libevent2022_event_base_loop (base=0x70e7b0, flags=2) at event.c:1630
> #3  0x00007f55b02f0144 in opal_progress () at runtime/opal_progress.c:171
> #4  0x00007f55b14b4d8b in opal_condition_wait (c=0x7f55b19fec40 <ompi_request_cond>, m=0x7f55b19febc0 <ompi_request_lock>) at ../opal/threads/condition.h:76
> #5  0x00007f55b14b531b in ompi_request_default_wait_all (count=2, requests=0x7ffd1bb55370, statuses=0x7ffd1bb55340) at request/req_wait.c:287
> #6  0x00007f55b157a225 in ompi_coll_base_sendrecv_zero (dest=1, stag=-16, source=1, rtag=-16, comm=0x601280 <ompi_mpi_comm_world>) at base/coll_base_barrier.c:63
> #7  0x00007f55b157a92a in ompi_coll_base_barrier_intra_two_procs (comm=0x601280 <ompi_mpi_comm_world>, module=0x7c2630) at base/coll_base_barrier.c:308
> #8  0x00007f55b15aafec in ompi_coll_tuned_barrier_intra_dec_fixed (comm=0x601280 <ompi_mpi_comm_world>, module=0x7c2630) at coll_tuned_decision_fixed.c:196
> #9  0x00007f55b14d36fd in PMPI_Barrier (comm=0x601280 <ompi_mpi_comm_world>) at pbarrier.c:63
> #10 0x0000000000400b0b in main (argc=1, argv=0x7ffd1bb55658) at mpitest.c:26
> (gdb)
>
> Thinking that this might be a bug in the tuned collectives, since that is what the stack shows, I ran the program like this (basically adding the ^tuned part):
>
> [durga@smallMPI ~]$ mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp -mca pml ob1 -mca coll ^tuned ./mpitest
>
> It still hangs, but now with a different stack trace:
>
> (gdb) bt
> #0  0x00007f910d38ac3d in poll () from /lib64/libc.so.6
> #1  0x00007f910c815de6 in poll_dispatch (base=0x1a317b0, tv=0x7fff43ee3610) at poll.c:165
> #2  0x00007f910c80da90 in opal_libevent2022_event_base_loop (base=0x1a317b0, flags=2) at event.c:1630
> #3  0x00007f910c739144 in opal_progress () at runtime/opal_progress.c:171
> #4  0x00007f910db130f7 in opal_condition_wait (c=0x7f910de47c40 <ompi_request_cond>, m=0x7f910de47bc0 <ompi_request_lock>) at ../../../../opal/threads/condition.h:76
> #5  0x00007f910db132d8 in ompi_request_wait_completion (req=0x1b07680) at ../../../../ompi/request/request.h:383
> #6  0x00007f910db1533b in mca_pml_ob1_send (buf=0x0, count=0, datatype=0x7f910de1e340 <ompi_mpi_byte>, dst=1, tag=-16, sendmode=MCA_PML_BASE_SEND_STANDARD, comm=0x601280 <ompi_mpi_comm_world>) at pml_ob1_isend.c:259
> #7  0x00007f910d9c3b38 in ompi_coll_base_barrier_intra_basic_linear (comm=0x601280 <ompi_mpi_comm_world>, module=0x1b092c0) at base/coll_base_barrier.c:368
> #8  0x00007f910d91c6fd in PMPI_Barrier (comm=0x601280 <ompi_mpi_comm_world>) at pbarrier.c:63
> #9  0x0000000000400b0b in main (argc=1, argv=0x7fff43ee3a58) at mpitest.c:26
> (gdb)
>
> The mpitest.c program is as follows:
>
> #include <mpi.h>
> #include <stdio.h>
> #include <string.h>
>
> int main(int argc, char** argv)
> {
>     int world_size, world_rank, name_len;
>     char hostname[MPI_MAX_PROCESSOR_NAME], buf[8];
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_size(MPI_COMM_WORLD, &world_size);
>     MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
>     MPI_Get_processor_name(hostname, &name_len);
>     printf("Hello world from processor %s, rank %d out of %d processors\n", hostname, world_rank, world_size);
>     if (world_rank == 1)
>     {
>         MPI_Recv(buf, 6, MPI_CHAR, 0, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>         printf("%s received %s\n", hostname, buf);
>     }
>     else
>     {
>         strcpy(buf, "haha!");
>         MPI_Send(buf, 6, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
>         printf("%s sent %s\n", hostname, buf);
>     }
>     MPI_Barrier(MPI_COMM_WORLD);
>     MPI_Finalize();
>     return 0;
> }
>
> The hostfile is as follows:
>
> 10.10.10.10 slots=1
> 10.10.10.11 slots=1
>
> The two nodes are connected by three physical and 3 logical networks:
> Physical: Gigabit Ethernet, 10G iWARP, 20G Infiniband
> Logical: IP (all 3), PSM (Qlogic Infiniband), Verbs (iWARP and Infiniband)
>
> Please note again that this is a fresh, brand new clone.
>
> Is this a bug (perhaps a side effect of --disable-dlopen) or something I am doing wrong?
>
> Thanks
> Durga
>
> We learn from history that we never learn from history.
>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/04/28930.php
>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/04/28932.php
>