Thank you for your suggestion, Ralph, but it did not make any difference.

I should mention that my code is about a week stale; I just did a git pull and am rebuilding right now. The build takes quite a bit of time, so I avoid it unless there is a reason. But what I am testing is the most basic functionality, so I would expect a week or so of lag not to matter.

Does the stack trace suggest anything to you? It seems that the send hangs, but a 4-byte send should be sent eagerly.
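(For what it's worth, one way I could confirm the eager cutoff -- assuming the parameter is still called btl_tcp_eager_limit on master; I am going from memory here -- would be:

$ ompi_info --param btl tcp --level 9 | grep eager

Any message below that limit should go out in a single eager fragment, so a 4-byte payload ought to qualify.)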
Best regards,
Durga

1% of the executables have 99% of CPU privilege!
Userspace code! Unite!! Occupy the kernel!!!

On Sun, Apr 17, 2016 at 11:55 PM, Ralph Castain <r...@open-mpi.org> wrote:

> Try adding -mca oob_tcp_if_include eno1 to your cmd line and see if that
> makes a difference
>
> On Apr 17, 2016, at 8:43 PM, dpchoudh . <dpcho...@gmail.com> wrote:
>
> Hello Gilles and all
>
> I am sorry to keep bugging the developers, but this issue keeps nagging
> me, and I am surprised it does not seem to affect anybody else. Then
> again, I am using the master branch, and most users are probably on a
> released version.
>
> This time I am using a totally different cluster. It has NO verbs-capable
> interface; just two Ethernet ports (one of which has no IP address and is
> hence unusable) plus one proprietary interface that currently supports
> only IP traffic. The two IP interfaces (Ethernet and proprietary) are on
> different IP subnets.
>
> My test program is as follows:
>
> #include <stdio.h>
> #include <string.h>
> #include "mpi.h"
>
> int main(int argc, char *argv[])
> {
>     char host[128];
>     int n;
>
>     MPI_Init(&argc, &argv);
>     MPI_Get_processor_name(host, &n);
>     printf("Hello from %s\n", host);
>     MPI_Comm_size(MPI_COMM_WORLD, &n);
>     printf("The world has %d nodes\n", n);
>     MPI_Comm_rank(MPI_COMM_WORLD, &n);
>     printf("My rank is %d\n", n);
> //#if 0
>     if (n == 0)
>     {
>         strcpy(host, "ha!");
>         MPI_Send(host, strlen(host) + 1, MPI_CHAR, 1, 1, MPI_COMM_WORLD);
>         printf("sent %s\n", host);
>     }
>     else
>     {
>         bzero(host, 128);
>         MPI_Recv(host, 4, MPI_CHAR, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>         printf("Received %s from rank 0\n", host);
>     }
> //#endif
>     MPI_Finalize();
>     return 0;
> }
>
> This program, when run between two nodes, hangs. The command was:
>
> [durga@b-1 ~]$ mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp -mca
> pml ob1 -mca btl_tcp_if_include eno1 ./mpitest
>
> And it hangs with the following output (eno1 is one of the GigE
> interfaces; it carries the OOB traffic as well):
>
> Hello from b-1
> The world has 2 nodes
> My rank is 0
> Hello from b-2
> The world has 2 nodes
> My rank is 1
>
> Note that if I uncomment the //#if 0 and //#endif lines (i.e. compile out
> the MPI_Send()/MPI_Recv() part), the program runs to completion. Also note
> that the printf()s following MPI_Send()/MPI_Recv() never show up on the
> console.
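>
> To rule out stdout buffering hiding those printf()s (output could be
> sitting in a userspace buffer when the process hangs, and thus never get
> flushed), one thing to try is making stdout unbuffered at startup. A
> minimal sketch; in mpitest.c the setvbuf() call would go right after
> MPI_Init():
>
> #include <stdio.h>
>
> int main(void)
> {
>     /* Unbuffered stdout: every printf() reaches the console immediately,
>        even if the process later hangs or is killed. */
>     setvbuf(stdout, NULL, _IONBF, 0);
>     printf("this line appears even if we hang right after\n");
>     return 0;
> }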
>
> Upon attaching gdb, the stack trace from the master node is as follows:
>
> Missing separate debuginfos, use: debuginfo-install
> glibc-2.17-78.el7.x86_64 libpciaccess-0.13.4-2.el7.x86_64
> (gdb) bt
> #0  0x00007f72a533eb7d in poll () from /lib64/libc.so.6
> #1  0x00007f72a4cb7146 in poll_dispatch (base=0xee33d0, tv=0x7fff81057b70)
>     at poll.c:165
> #2  0x00007f72a4caede0 in opal_libevent2022_event_base_loop (base=0xee33d0,
>     flags=2) at event.c:1630
> #3  0x00007f72a4c4e692 in opal_progress () at runtime/opal_progress.c:171
> #4  0x00007f72a0d07ac1 in opal_condition_wait (
>     c=0x7f72a5bb1e00 <ompi_request_cond>, m=0x7f72a5bb1d80 <ompi_request_lock>)
>     at ../../../../opal/threads/condition.h:76
> #5  0x00007f72a0d07ca2 in ompi_request_wait_completion (req=0x113eb80)
>     at ../../../../ompi/request/request.h:383
> #6  0x00007f72a0d09cd5 in mca_pml_ob1_send (buf=0x7fff81057db0, count=4,
>     datatype=0x601080 <ompi_mpi_char>, dst=1, tag=1,
>     sendmode=MCA_PML_BASE_SEND_STANDARD, comm=0x601280 <ompi_mpi_comm_world>)
>     at pml_ob1_isend.c:251
> #7  0x00007f72a58d6be3 in PMPI_Send (buf=0x7fff81057db0, count=4,
>     type=0x601080 <ompi_mpi_char>, dest=1, tag=1,
>     comm=0x601280 <ompi_mpi_comm_world>) at psend.c:78
> #8  0x0000000000400afa in main (argc=1, argv=0x7fff81057f18) at mpitest.c:19
> (gdb)
>
> And the backtrace on the non-master node is:
>
> (gdb) bt
> #0  0x00007ff3b377e48d in nanosleep () from /lib64/libc.so.6
> #1  0x00007ff3b37af014 in usleep () from /lib64/libc.so.6
> #2  0x00007ff3b0c922de in OPAL_PMIX_PMIX120_PMIx_Fence (procs=0x0, nprocs=0,
>     info=0x0, ninfo=0) at src/client/pmix_client_fence.c:100
> #3  0x00007ff3b0c5f1a6 in pmix120_fence (procs=0x0, collect_data=0)
>     at pmix120_client.c:258
> #4  0x00007ff3b3cf8f4b in ompi_mpi_finalize ()
>     at runtime/ompi_mpi_finalize.c:242
> #5  0x00007ff3b3d23295 in PMPI_Finalize () at pfinalize.c:47
> #6  0x0000000000400958 in main (argc=1, argv=0x7fff785e8788) at mpitest.c:30
> (gdb)
>
> The hostfile is as follows:
>
> [durga@b-1 ~]$ cat hostfile
> 10.4.70.10 slots=1
> 10.4.70.11 slots=1
> #10.4.70.12 slots=1
>
> And the ifconfig output from the master node is as follows (the other node
> is similar; all the IP interfaces are in their respective subnets):
>
> [durga@b-1 ~]$ ifconfig
> eno1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
>         inet 10.4.70.10  netmask 255.255.255.0  broadcast 10.4.70.255
>         inet6 fe80::21e:c9ff:fefe:13df  prefixlen 64  scopeid 0x20<link>
>         ether 00:1e:c9:fe:13:df  txqueuelen 1000  (Ethernet)
>         RX packets 48215  bytes 27842846 (26.5 MiB)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 52746  bytes 7817568 (7.4 MiB)
>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>         device interrupt 16
>
> eno2: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
>         ether 00:1e:c9:fe:13:e0  txqueuelen 1000  (Ethernet)
>         RX packets 0  bytes 0 (0.0 B)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 0  bytes 0 (0.0 B)
>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>         device interrupt 17
>
> lf0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 2016
>         inet 192.168.1.2  netmask 255.255.255.0  broadcast 192.168.1.255
>         inet6 fe80::3002:ff:fe33:3333  prefixlen 64  scopeid 0x20<link>
>         ether 32:02:00:33:33:33  txqueuelen 1000  (Ethernet)
>         RX packets 10  bytes 512 (512.0 B)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 22  bytes 1536 (1.5 KiB)
>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>
> lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
>         inet 127.0.0.1  netmask 255.0.0.0
>         inet6 ::1  prefixlen 128  scopeid 0x10<host>
>         loop  txqueuelen 0  (Local Loopback)
>         RX packets 26  bytes 1378 (1.3 KiB)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 26  bytes 1378 (1.3 KiB)
>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
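>
> Since both stacks are just spinning in the opal_progress loop, one sanity
> check is whether raw TCP between the two eno1 addresses can pass a small
> message at all, outside of Open MPI. A minimal sketch (untested; the port
> 5555 and the name tcp_check.c are arbitrary):
>
> /* tcp_check.c -- raw TCP sanity check, independent of Open MPI.
>    Run "./tcp_check" (server) on 10.4.70.10 and "./tcp_check 10.4.70.10"
>    (client) on 10.4.70.11; the client sends 4 bytes, the server echoes. */
> #include <arpa/inet.h>
> #include <netinet/in.h>
> #include <stdio.h>
> #include <string.h>
> #include <sys/socket.h>
> #include <unistd.h>
>
> int main(int argc, char *argv[])
> {
>     struct sockaddr_in addr = { .sin_family = AF_INET, .sin_port = htons(5555) };
>     char buf[4] = "ha!";
>     int fd = socket(AF_INET, SOCK_STREAM, 0);
>
>     if (argc > 1) {                       /* client: connect, send 4 bytes */
>         inet_pton(AF_INET, argv[1], &addr.sin_addr);
>         if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
>             perror("connect");
>             return 1;
>         }
>         send(fd, buf, 4, 0);
>         recv(fd, buf, 4, 0);
>         printf("client got echo: %s\n", buf);
>     } else {                              /* server: accept, read, echo back */
>         int one = 1, conn;
>         setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);
>         addr.sin_addr.s_addr = INADDR_ANY;
>         bind(fd, (struct sockaddr *)&addr, sizeof addr);
>         listen(fd, 1);
>         conn = accept(fd, NULL, NULL);
>         recv(conn, buf, 4, 0);
>         printf("server got: %s\n", buf);
>         send(conn, buf, 4, 0);
>         close(conn);
>     }
>     close(fd);
>     return 0;
> }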
>
> Please help me with this. I am stuck on the TCP transport, which is the
> most basic of all the transports.
>
> Thanks in advance
> Durga
>
> 1% of the executables have 99% of CPU privilege!
> Userspace code! Unite!! Occupy the kernel!!!
>
> On Tue, Apr 12, 2016 at 9:32 PM, Gilles Gouaillardet <gil...@rist.or.jp>
> wrote:
>
>> This is quite unlikely, and fwiw, your test program works for me.
>>
>> I suggest you check that your 3 TCP networks are usable, for example:
>>
>> $ mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp -mca pml ob1 --mca
>> btl_tcp_if_include xxx ./mpitest
>>
>> in which xxx is a [list of] interface name(s):
>> eth0
>> eth1
>> ib0
>> eth0,eth1
>> eth0,ib0
>> ...
>> eth0,eth1,ib0
>>
>> and see where the problem starts occurring.
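>>
>> For example, to sweep them one at a time (a rough sketch; the 60-second
>> timeout via coreutils' timeout(1) is only there so a hang does not block
>> the loop):
>>
>> $ for nic in eth0 eth1 ib0; do echo "=== $nic ==="; timeout 60 mpirun \
>>     -np 2 -hostfile ~/hostfile -mca btl self,tcp -mca pml ob1 \
>>     -mca btl_tcp_if_include $nic ./mpitest; done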
>>
>> btw, are your 3 interfaces on 3 different subnets? Is routing required
>> between two interfaces of the same type?
>>
>> Cheers,
>>
>> Gilles
>>
>> On 4/13/2016 7:15 AM, dpchoudh . wrote:
>>
>> Hi all
>>
>> I have reported this issue before, but back then I brushed it off as
>> something caused by my own modifications to the source tree. It looks
>> like that is not the case.
>>
>> Just now, I did the following:
>>
>> 1. Cloned a fresh copy of master.
>> 2. Configured with the following flags, then built and installed it on my
>>    two-node "cluster":
>>    --enable-debug --enable-debug-symbols --disable-dlopen
>> 3. Compiled the following program, mpitest.c, with these flags:
>>    -g3 -Wall -Wextra
>> 4. Ran it like this:
>>    [durga@smallMPI ~]$ mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp
>>    -mca pml ob1 ./mpitest
>>
>> With this, the code hangs at MPI_Barrier() on both nodes, after
>> generating the following output:
>>
>> Hello world from processor smallMPI, rank 0 out of 2 processors
>> Hello world from processor bigMPI, rank 1 out of 2 processors
>> smallMPI sent haha!
>> bigMPI received haha!
>> <Hangs until killed by ^C>
>>
>> Attaching to the hung process on one node gives the following backtrace:
>>
>> (gdb) bt
>> #0  0x00007f55b0f41c3d in poll () from /lib64/libc.so.6
>> #1  0x00007f55b03ccde6 in poll_dispatch (base=0x70e7b0, tv=0x7ffd1bb551c0)
>>     at poll.c:165
>> #2  0x00007f55b03c4a90 in opal_libevent2022_event_base_loop (base=0x70e7b0,
>>     flags=2) at event.c:1630
>> #3  0x00007f55b02f0144 in opal_progress () at runtime/opal_progress.c:171
>> #4  0x00007f55b14b4d8b in opal_condition_wait (c=0x7f55b19fec40
>>     <ompi_request_cond>, m=0x7f55b19febc0 <ompi_request_lock>)
>>     at ../opal/threads/condition.h:76
>> #5  0x00007f55b14b531b in ompi_request_default_wait_all (count=2,
>>     requests=0x7ffd1bb55370, statuses=0x7ffd1bb55340) at request/req_wait.c:287
>> #6  0x00007f55b157a225 in ompi_coll_base_sendrecv_zero (dest=1, stag=-16,
>>     source=1, rtag=-16, comm=0x601280 <ompi_mpi_comm_world>)
>>     at base/coll_base_barrier.c:63
>> #7  0x00007f55b157a92a in ompi_coll_base_barrier_intra_two_procs
>>     (comm=0x601280 <ompi_mpi_comm_world>, module=0x7c2630)
>>     at base/coll_base_barrier.c:308
>> #8  0x00007f55b15aafec in ompi_coll_tuned_barrier_intra_dec_fixed
>>     (comm=0x601280 <ompi_mpi_comm_world>, module=0x7c2630)
>>     at coll_tuned_decision_fixed.c:196
>> #9  0x00007f55b14d36fd in PMPI_Barrier (comm=0x601280 <ompi_mpi_comm_world>)
>>     at pbarrier.c:63
>> #10 0x0000000000400b0b in main (argc=1, argv=0x7ffd1bb55658) at mpitest.c:26
>> (gdb)
>>
>> Thinking that this might be a bug in the tuned collectives, since that is
>> what the stack shows, I ran the program like this (basically adding the
>> ^tuned part):
>>
>> [durga@smallMPI ~]$ mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp
>> -mca pml ob1 -mca coll ^tuned ./mpitest
>>
>> It still hangs, but now with a different stack trace:
>>
>> (gdb) bt
>> #0  0x00007f910d38ac3d in poll () from /lib64/libc.so.6
>> #1  0x00007f910c815de6 in poll_dispatch (base=0x1a317b0, tv=0x7fff43ee3610)
>>     at poll.c:165
>> #2  0x00007f910c80da90 in opal_libevent2022_event_base_loop (base=0x1a317b0,
>>     flags=2) at event.c:1630
>> #3  0x00007f910c739144 in opal_progress () at runtime/opal_progress.c:171
>> #4  0x00007f910db130f7 in opal_condition_wait (c=0x7f910de47c40
>>     <ompi_request_cond>, m=0x7f910de47bc0 <ompi_request_lock>)
>>     at ../../../../opal/threads/condition.h:76
>> #5  0x00007f910db132d8 in ompi_request_wait_completion (req=0x1b07680)
>>     at ../../../../ompi/request/request.h:383
>> #6  0x00007f910db1533b in mca_pml_ob1_send (buf=0x0, count=0,
>>     datatype=0x7f910de1e340 <ompi_mpi_byte>, dst=1, tag=-16,
>>     sendmode=MCA_PML_BASE_SEND_STANDARD, comm=0x601280 <ompi_mpi_comm_world>)
>>     at pml_ob1_isend.c:259
>> #7  0x00007f910d9c3b38 in ompi_coll_base_barrier_intra_basic_linear
>>     (comm=0x601280 <ompi_mpi_comm_world>, module=0x1b092c0)
>>     at base/coll_base_barrier.c:368
>> #8  0x00007f910d91c6fd in PMPI_Barrier (comm=0x601280 <ompi_mpi_comm_world>)
>>     at pbarrier.c:63
>> #9  0x0000000000400b0b in main (argc=1, argv=0x7fff43ee3a58) at mpitest.c:26
>> (gdb)
>>
>> The mpitest.c program is as follows:
>>
>> #include <mpi.h>
>> #include <stdio.h>
>> #include <string.h>
>>
>> int main(int argc, char **argv)
>> {
>>     int world_size, world_rank, name_len;
>>     char hostname[MPI_MAX_PROCESSOR_NAME], buf[8];
>>
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_size(MPI_COMM_WORLD, &world_size);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
>>     MPI_Get_processor_name(hostname, &name_len);
>>     printf("Hello world from processor %s, rank %d out of %d processors\n",
>>            hostname, world_rank, world_size);
>>     if (world_rank == 1)
>>     {
>>         MPI_Recv(buf, 6, MPI_CHAR, 0, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>>         printf("%s received %s\n", hostname, buf);
>>     }
>>     else
>>     {
>>         strcpy(buf, "haha!");
>>         MPI_Send(buf, 6, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
>>         printf("%s sent %s\n", hostname, buf);
>>     }
>>     MPI_Barrier(MPI_COMM_WORLD);
>>     MPI_Finalize();
>>     return 0;
>> }
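>>
>> A variant that could narrow this down (an untested sketch; the ~60 s poll
>> budget is arbitrary): the same exchange done with nonblocking calls and
>> MPI_Test, so the program reports whether the request ever completes
>> instead of blocking forever inside MPI_Send()/MPI_Recv():
>>
>> #include <mpi.h>
>> #include <stdio.h>
>> #include <unistd.h>
>>
>> int main(int argc, char **argv)
>> {
>>     int rank, done = 0, i;
>>     char buf[8] = "haha!";
>>     MPI_Request req;
>>
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>     if (rank == 0)
>>         MPI_Isend(buf, 6, MPI_CHAR, 1, 99, MPI_COMM_WORLD, &req);
>>     else
>>         MPI_Irecv(buf, 6, MPI_CHAR, 0, 99, MPI_COMM_WORLD, &req);
>>     /* Poll for up to ~60 s; each MPI_Test call drives progress. */
>>     for (i = 0; i < 600 && !done; i++) {
>>         MPI_Test(&req, &done, MPI_STATUS_IGNORE);
>>         usleep(100000);    /* 100 ms between polls */
>>     }
>>     printf("rank %d: request %s\n", rank,
>>            done ? "completed" : "never completed");
>>     if (!done)
>>         MPI_Abort(MPI_COMM_WORLD, 1);  /* avoid hanging again in Finalize */
>>     MPI_Finalize();
>>     return 0;
>> }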
>>
>> The hostfile is as follows:
>>
>> 10.10.10.10 slots=1
>> 10.10.10.11 slots=1
>>
>> The two nodes are connected by three physical and three logical networks:
>> Physical: Gigabit Ethernet, 10G iWARP, 20G InfiniBand
>> Logical: IP (all 3), PSM (QLogic InfiniBand), verbs (iWARP and InfiniBand)
>>
>> Please note again that this is a fresh, brand-new clone.
>>
>> Is this a bug (perhaps a side effect of --disable-dlopen) or something I
>> am doing wrong?
>>
>> Thanks
>> Durga
>>
>> We learn from history that we never learn from history.