An update: building the latest from master did not make any difference; the code still hangs, with a stack trace identical to the one I posted before.
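For anyone willing to give it a shot, the whole reproduction condenses to the steps below. This is only a sketch of what is spelled out in the quoted messages further down (the configure flags, the test program and the mpirun command are all quoted there); the hostfile path and the mpitest binary name are simply the ones I happen to use.

  # build current master with the same configure flags I used
  ./autogen.pl
  ./configure --enable-debug --enable-debug-symbols --disable-dlopen
  make install

  # compile the test program quoted further down in this thread
  mpicc -g3 -Wall -Wextra -o mpitest mpitest.c

  # run across two nodes that share only a TCP network; this is where it hangs for me
  mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp -mca pml ob1 ./mpitest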
This should be a simple case to reproduce (positively or negatively). Would
somebody in the developer community mind giving it a quick try?

Thank you
Durga

1% of the executables have 99% of CPU privilege!
Userspace code! Unite!! Occupy the kernel!!!

On Mon, Apr 18, 2016 at 12:06 AM, dpchoudh . <dpcho...@gmail.com> wrote:

> Thank you for your suggestion, Ralph, but it did not make any difference.
>
> Let me say that my code is about a week stale. I just did a git pull and
> am building it right now. The build takes quite a bit of time, so I avoid
> doing that unless there is a reason. But what I am trying out is the most
> basic functionality, so I'd think a week or so of lag would not make a
> difference.
>
> Does the stack trace suggest something to you? It seems that the send
> hangs; but a 4-byte send should be sent eagerly.
>
> Best regards
> Durga
>
> 1% of the executables have 99% of CPU privilege!
> Userspace code! Unite!! Occupy the kernel!!!
>
> On Sun, Apr 17, 2016 at 11:55 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> Try adding -mca oob_tcp_if_include eno1 to your cmd line and see if that
>> makes a difference.
>>
>> On Apr 17, 2016, at 8:43 PM, dpchoudh . <dpcho...@gmail.com> wrote:
>>
>> Hello Gilles and all
>>
>> I am sorry to be bugging the developers, but this issue keeps nagging
>> me, and I am surprised it does not seem to affect anybody else. But then
>> again, I am using the master branch, and most users are probably using a
>> released version.
>>
>> This time I am using a totally different cluster. This one has NO
>> verbs-capable interface: just two Ethernet interfaces (one of which has
>> no IP address and hence is unusable) plus one proprietary interface that
>> currently supports only IP traffic. The two IP interfaces (Ethernet and
>> proprietary) are on different IP subnets.
>>
>> My test program is as follows:
>>
>> #include <stdio.h>
>> #include <string.h>
>> #include "mpi.h"
>>
>> int main(int argc, char *argv[])
>> {
>>     char host[128];
>>     int n;
>>
>>     MPI_Init(&argc, &argv);
>>     MPI_Get_processor_name(host, &n);
>>     printf("Hello from %s\n", host);
>>     MPI_Comm_size(MPI_COMM_WORLD, &n);
>>     printf("The world has %d nodes\n", n);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &n);
>>     printf("My rank is %d\n", n);
>> //#if 0
>>     if (n == 0)
>>     {
>>         strcpy(host, "ha!");
>>         MPI_Send(host, strlen(host) + 1, MPI_CHAR, 1, 1, MPI_COMM_WORLD);
>>         printf("sent %s\n", host);
>>     }
>>     else
>>     {
>>         //int len = strlen(host) + 1;
>>         bzero(host, 128);
>>         MPI_Recv(host, 4, MPI_CHAR, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>>         printf("Received %s from rank 0\n", host);
>>     }
>> //#endif
>>     MPI_Finalize();
>>     return 0;
>> }
>>
>> This program, when run between two nodes, hangs. The command was:
>>
>> [durga@b-1 ~]$ mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp -mca pml ob1 -mca btl_tcp_if_include eno1 ./mpitest
>>
>> And the hang occurs after the following output (eno1 is one of the GigE
>> interfaces, which carries the OOB traffic as well):
>>
>> Hello from b-1
>> The world has 2 nodes
>> My rank is 0
>> Hello from b-2
>> The world has 2 nodes
>> My rank is 1
>>
>> Note that if I uncomment the #if 0 / #endif pair (i.e. comment out the
>> MPI_Send()/MPI_Recv() part), the program runs to completion. Also note
>> that the printf()s following MPI_Send()/MPI_Recv() do not show up on the
>> console.
>>
>> Upon attaching gdb, the stack trace from the master node is as follows:
>>
>> Missing separate debuginfos, use: debuginfo-install
>> glibc-2.17-78.el7.x86_64 libpciaccess-0.13.4-2.el7.x86_64
>> (gdb) bt
>> #0  0x00007f72a533eb7d in poll () from /lib64/libc.so.6
>> #1  0x00007f72a4cb7146 in poll_dispatch (base=0xee33d0, tv=0x7fff81057b70) at poll.c:165
>> #2  0x00007f72a4caede0 in opal_libevent2022_event_base_loop (base=0xee33d0, flags=2) at event.c:1630
>> #3  0x00007f72a4c4e692 in opal_progress () at runtime/opal_progress.c:171
>> #4  0x00007f72a0d07ac1 in opal_condition_wait (c=0x7f72a5bb1e00 <ompi_request_cond>, m=0x7f72a5bb1d80 <ompi_request_lock>) at ../../../../opal/threads/condition.h:76
>> #5  0x00007f72a0d07ca2 in ompi_request_wait_completion (req=0x113eb80) at ../../../../ompi/request/request.h:383
>> #6  0x00007f72a0d09cd5 in mca_pml_ob1_send (buf=0x7fff81057db0, count=4, datatype=0x601080 <ompi_mpi_char>, dst=1, tag=1, sendmode=MCA_PML_BASE_SEND_STANDARD, comm=0x601280 <ompi_mpi_comm_world>) at pml_ob1_isend.c:251
>> #7  0x00007f72a58d6be3 in PMPI_Send (buf=0x7fff81057db0, count=4, type=0x601080 <ompi_mpi_char>, dest=1, tag=1, comm=0x601280 <ompi_mpi_comm_world>) at psend.c:78
>> #8  0x0000000000400afa in main (argc=1, argv=0x7fff81057f18) at mpitest.c:19
>> (gdb)
>>
>> And the backtrace on the non-master node is:
>>
>> (gdb) bt
>> #0  0x00007ff3b377e48d in nanosleep () from /lib64/libc.so.6
>> #1  0x00007ff3b37af014 in usleep () from /lib64/libc.so.6
>> #2  0x00007ff3b0c922de in OPAL_PMIX_PMIX120_PMIx_Fence (procs=0x0, nprocs=0, info=0x0, ninfo=0) at src/client/pmix_client_fence.c:100
>> #3  0x00007ff3b0c5f1a6 in pmix120_fence (procs=0x0, collect_data=0) at pmix120_client.c:258
>> #4  0x00007ff3b3cf8f4b in ompi_mpi_finalize () at runtime/ompi_mpi_finalize.c:242
>> #5  0x00007ff3b3d23295 in PMPI_Finalize () at pfinalize.c:47
>> #6  0x0000000000400958 in main (argc=1, argv=0x7fff785e8788) at mpitest.c:30
>> (gdb)
>>
>> The hostfile is as follows:
>>
>> [durga@b-1 ~]$ cat hostfile
>> 10.4.70.10 slots=1
>> 10.4.70.11 slots=1
>> #10.4.70.12 slots=1
>>
>> And the ifconfig output from the master node is as follows (the other
>> node is similar; all the IP interfaces are in their respective subnets):
>>
>> [durga@b-1 ~]$ ifconfig
>> eno1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
>>         inet 10.4.70.10  netmask 255.255.255.0  broadcast 10.4.70.255
>>         inet6 fe80::21e:c9ff:fefe:13df  prefixlen 64  scopeid 0x20<link>
>>         ether 00:1e:c9:fe:13:df  txqueuelen 1000  (Ethernet)
>>         RX packets 48215  bytes 27842846 (26.5 MiB)
>>         RX errors 0  dropped 0  overruns 0  frame 0
>>         TX packets 52746  bytes 7817568 (7.4 MiB)
>>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>>         device interrupt 16
>>
>> eno2: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
>>         ether 00:1e:c9:fe:13:e0  txqueuelen 1000  (Ethernet)
>>         RX packets 0  bytes 0 (0.0 B)
>>         RX errors 0  dropped 0  overruns 0  frame 0
>>         TX packets 0  bytes 0 (0.0 B)
>>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>>         device interrupt 17
>>
>> lf0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 2016
>>         inet 192.168.1.2  netmask 255.255.255.0  broadcast 192.168.1.255
>>         inet6 fe80::3002:ff:fe33:3333  prefixlen 64  scopeid 0x20<link>
>>         ether 32:02:00:33:33:33  txqueuelen 1000  (Ethernet)
>>         RX packets 10  bytes 512 (512.0 B)
>>         RX errors 0  dropped 0  overruns 0  frame 0
>>         TX packets 22  bytes 1536 (1.5 KiB)
>>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>>
>> lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
>>         inet 127.0.0.1  netmask 255.0.0.0
>>         inet6 ::1  prefixlen 128  scopeid 0x10<host>
>>         loop  txqueuelen 0  (Local Loopback)
>>         RX packets 26  bytes 1378 (1.3 KiB)
>>         RX errors 0  dropped 0  overruns 0  frame 0
>>         TX packets 26  bytes 1378 (1.3 KiB)
>>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>>
>> Please help me with this. I am stuck with the TCP transport, which is the
>> most basic of all transports.
>>
>> Thanks in advance
>> Durga
>>
>> 1% of the executables have 99% of CPU privilege!
>> Userspace code! Unite!! Occupy the kernel!!!
>>
>> On Tue, Apr 12, 2016 at 9:32 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>>
>>> This is quite unlikely, and fwiw, your test program works for me.
>>>
>>> I suggest you check that your 3 TCP networks are usable, for example:
>>>
>>> $ mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp -mca pml ob1 --mca btl_tcp_if_include xxx ./mpitest
>>>
>>> in which xxx is a [list of] interface name(s):
>>> eth0
>>> eth1
>>> ib0
>>> eth0,eth1
>>> eth0,ib0
>>> ...
>>> eth0,eth1,ib0
>>>
>>> and see where the problem starts occurring.
>>>
>>> btw, are your 3 interfaces in 3 different subnets? Is routing required
>>> between two interfaces of the same type?
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On 4/13/2016 7:15 AM, dpchoudh . wrote:
>>>
>>> Hi all
>>>
>>> I have reported this issue before, but then had brushed it off as
>>> something caused by my modifications to the source tree. It looks like
>>> that is not the case.
>>>
>>> Just now, I did the following:
>>>
>>> 1. Cloned a fresh copy from master.
>>> 2. Configured with the following flags, built and installed it on my
>>>    two-node "cluster":
>>>    --enable-debug --enable-debug-symbols --disable-dlopen
>>> 3. Compiled the following program, mpitest.c, with these flags:
>>>    -g3 -Wall -Wextra
>>> 4. Ran it like this:
>>>    [durga@smallMPI ~]$ mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp -mca pml ob1 ./mpitest
>>>
>>> With this, the code hangs at MPI_Barrier() on both nodes, after
>>> generating the following output:
>>>
>>> Hello world from processor smallMPI, rank 0 out of 2 processors
>>> Hello world from processor bigMPI, rank 1 out of 2 processors
>>> smallMPI sent haha!
>>> bigMPI received haha!
>>> <Hangs until killed by ^C>
>>>
>>> Attaching to the hung process at one node gives the following backtrace:
>>>
>>> (gdb) bt
>>> #0  0x00007f55b0f41c3d in poll () from /lib64/libc.so.6
>>> #1  0x00007f55b03ccde6 in poll_dispatch (base=0x70e7b0, tv=0x7ffd1bb551c0) at poll.c:165
>>> #2  0x00007f55b03c4a90 in opal_libevent2022_event_base_loop (base=0x70e7b0, flags=2) at event.c:1630
>>> #3  0x00007f55b02f0144 in opal_progress () at runtime/opal_progress.c:171
>>> #4  0x00007f55b14b4d8b in opal_condition_wait (c=0x7f55b19fec40 <ompi_request_cond>, m=0x7f55b19febc0 <ompi_request_lock>) at ../opal/threads/condition.h:76
>>> #5  0x00007f55b14b531b in ompi_request_default_wait_all (count=2, requests=0x7ffd1bb55370, statuses=0x7ffd1bb55340) at request/req_wait.c:287
>>> #6  0x00007f55b157a225 in ompi_coll_base_sendrecv_zero (dest=1, stag=-16, source=1, rtag=-16, comm=0x601280 <ompi_mpi_comm_world>) at base/coll_base_barrier.c:63
>>> #7  0x00007f55b157a92a in ompi_coll_base_barrier_intra_two_procs (comm=0x601280 <ompi_mpi_comm_world>, module=0x7c2630) at base/coll_base_barrier.c:308
>>> #8  0x00007f55b15aafec in ompi_coll_tuned_barrier_intra_dec_fixed (comm=0x601280 <ompi_mpi_comm_world>, module=0x7c2630) at coll_tuned_decision_fixed.c:196
>>> #9  0x00007f55b14d36fd in PMPI_Barrier (comm=0x601280 <ompi_mpi_comm_world>) at pbarrier.c:63
>>> #10 0x0000000000400b0b in main (argc=1, argv=0x7ffd1bb55658) at mpitest.c:26
>>> (gdb)
>>>
>>> Thinking that this might be a bug in tuned collectives, since that is
>>> what the stack shows, I ran the program like this (basically adding the
>>> ^tuned part):
>>>
>>> [durga@smallMPI ~]$ mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp -mca pml ob1 -mca coll ^tuned ./mpitest
>>>
>>> It still hangs, but now with a different stack trace:
>>>
>>> (gdb) bt
>>> #0  0x00007f910d38ac3d in poll () from /lib64/libc.so.6
>>> #1  0x00007f910c815de6 in poll_dispatch (base=0x1a317b0, tv=0x7fff43ee3610) at poll.c:165
>>> #2  0x00007f910c80da90 in opal_libevent2022_event_base_loop (base=0x1a317b0, flags=2) at event.c:1630
>>> #3  0x00007f910c739144 in opal_progress () at runtime/opal_progress.c:171
>>> #4  0x00007f910db130f7 in opal_condition_wait (c=0x7f910de47c40 <ompi_request_cond>, m=0x7f910de47bc0 <ompi_request_lock>) at ../../../../opal/threads/condition.h:76
>>> #5  0x00007f910db132d8 in ompi_request_wait_completion (req=0x1b07680) at ../../../../ompi/request/request.h:383
>>> #6  0x00007f910db1533b in mca_pml_ob1_send (buf=0x0, count=0, datatype=0x7f910de1e340 <ompi_mpi_byte>, dst=1, tag=-16, sendmode=MCA_PML_BASE_SEND_STANDARD, comm=0x601280 <ompi_mpi_comm_world>) at pml_ob1_isend.c:259
>>> #7  0x00007f910d9c3b38 in ompi_coll_base_barrier_intra_basic_linear (comm=0x601280 <ompi_mpi_comm_world>, module=0x1b092c0) at base/coll_base_barrier.c:368
>>> #8  0x00007f910d91c6fd in PMPI_Barrier (comm=0x601280 <ompi_mpi_comm_world>) at pbarrier.c:63
>>> #9  0x0000000000400b0b in main (argc=1, argv=0x7fff43ee3a58) at mpitest.c:26
>>> (gdb)
>>>
>>> The mpitest.c program is as follows:
>>>
>>> #include <mpi.h>
>>> #include <stdio.h>
>>> #include <string.h>
>>>
>>> int main(int argc, char** argv)
>>> {
>>>     int world_size, world_rank, name_len;
>>>     char hostname[MPI_MAX_PROCESSOR_NAME], buf[8];
>>>
>>>     MPI_Init(&argc, &argv);
>>>     MPI_Comm_size(MPI_COMM_WORLD, &world_size);
>>>     MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
>>>     MPI_Get_processor_name(hostname, &name_len);
>>>     printf("Hello world from processor %s, rank %d out of %d processors\n",
>>>            hostname, world_rank, world_size);
>>>     if (world_rank == 1)
>>>     {
>>>         MPI_Recv(buf, 6, MPI_CHAR, 0, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>>>         printf("%s received %s\n", hostname, buf);
>>>     }
>>>     else
>>>     {
>>>         strcpy(buf, "haha!");
>>>         MPI_Send(buf, 6, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
>>>         printf("%s sent %s\n", hostname, buf);
>>>     }
>>>     MPI_Barrier(MPI_COMM_WORLD);
>>>     MPI_Finalize();
>>>     return 0;
>>> }
>>>
>>> The hostfile is as follows:
>>>
>>> 10.10.10.10 slots=1
>>> 10.10.10.11 slots=1
>>>
>>> The two nodes are connected by three physical and three logical networks:
>>> Physical: Gigabit Ethernet, 10G iWARP, 20G InfiniBand
>>> Logical: IP (all three), PSM (QLogic InfiniBand), Verbs (iWARP and InfiniBand)
>>>
>>> Please note again that this is a fresh, brand-new clone.
>>>
>>> Is this a bug (perhaps a side effect of --disable-dlopen) or something I
>>> am doing wrong?
>>>
>>> Thanks
>>> Durga
>>>
>>> We learn from history that we never learn from history.
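P.S. If it would help whoever tries to reproduce this, I can rerun my test with the TCP BTL's verbosity turned up and post the output; I am thinking of something along the lines of the command below (the verbosity level of 100 is just my guess at a useful value):

  # same run as before, but asking the BTL framework to log connection setup
  mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp -mca pml ob1 -mca btl_base_verbose 100 ./mpitest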