An update: building the latest from master did not make any difference; the code still hangs, with a stack trace identical to the one I posted before.
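For anyone willing to give it a shot, the whole reproduction condenses to the steps below. This is only a sketch of what is spelled out in the quoted messages further down (the configure flags, the test program and the mpirun command are all quoted there); the hostfile path and the mpitest binary name are simply the ones I happen to use.

  # build current master with the same configure flags I used
  ./autogen.pl
  ./configure --enable-debug --enable-debug-symbols --disable-dlopen
  make install

  # compile the test program quoted further down in this thread
  mpicc -g3 -Wall -Wextra -o mpitest mpitest.c

  # run across two nodes that share only a TCP network; this is where it hangs for me
  mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp -mca pml ob1 ./mpitest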
This should be a simple case to reproduce (positively or negatively). Would
somebody in the developer community mind giving it a quick try?

Thank you
Durga

1% of the executables have 99% of CPU privilege!
Userspace code! Unite!! Occupy the kernel!!!

On Mon, Apr 18, 2016 at 12:06 AM, dpchoudh . <dpcho...@gmail.com> wrote:

> Thank you for your suggestion, Ralph, but it did not make any difference.
>
> Let me say that my code is about a week stale. I just did a git pull and
> am building it right now. The build takes quite a bit of time, so I avoid
> doing that unless there is a reason. But what I am trying out is the most
> basic functionality, so I'd think a week or so of lag would not make a
> difference.
>
> Does the stack trace suggest something to you? It seems that the send
> hangs; but a 4-byte send should be sent eagerly.
>
> Best regards
> Durga
>
> 1% of the executables have 99% of CPU privilege!
> Userspace code! Unite!! Occupy the kernel!!!
>
> On Sun, Apr 17, 2016 at 11:55 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> Try adding -mca oob_tcp_if_include eno1 to your cmd line and see if that
>> makes a difference.
>>
>> On Apr 17, 2016, at 8:43 PM, dpchoudh . <dpcho...@gmail.com> wrote:
>>
>> Hello Gilles and all
>>
>> I am sorry to be bugging the developers, but this issue keeps nagging
>> me, and I am surprised it does not seem to affect anybody else. But then
>> again, I am using the master branch, and most users are probably using a
>> released version.
>>
>> This time I am using a totally different cluster. This one has NO
>> verbs-capable interface: just two Ethernet interfaces (one of which has
>> no IP address and hence is unusable) plus one proprietary interface that
>> currently supports only IP traffic. The two IP interfaces (Ethernet and
>> proprietary) are on different IP subnets.
>>
>> My test program is as follows:
>>
>> #include <stdio.h>
>> #include <string.h>
>> #include "mpi.h"
>>
>> int main(int argc, char *argv[])
>> {
>>     char host[128];
>>     int n;
>>
>>     MPI_Init(&argc, &argv);
>>     MPI_Get_processor_name(host, &n);
>>     printf("Hello from %s\n", host);
>>     MPI_Comm_size(MPI_COMM_WORLD, &n);
>>     printf("The world has %d nodes\n", n);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &n);
>>     printf("My rank is %d\n", n);
>> //#if 0
>>     if (n == 0)
>>     {
>>         strcpy(host, "ha!");
>>         MPI_Send(host, strlen(host) + 1, MPI_CHAR, 1, 1, MPI_COMM_WORLD);
>>         printf("sent %s\n", host);
>>     }
>>     else
>>     {
>>         //int len = strlen(host) + 1;
>>         bzero(host, 128);
>>         MPI_Recv(host, 4, MPI_CHAR, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>>         printf("Received %s from rank 0\n", host);
>>     }
>> //#endif
>>     MPI_Finalize();
>>     return 0;
>> }
>>
>> This program, when run between two nodes, hangs. The command was:
>>
>> [durga@b-1 ~]$ mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp -mca pml ob1 -mca btl_tcp_if_include eno1 ./mpitest
>>
>> And the hang occurs after the following output (eno1 is one of the GigE
>> interfaces, which carries the OOB traffic as well):
>>
>> Hello from b-1
>> The world has 2 nodes
>> My rank is 0
>> Hello from b-2
>> The world has 2 nodes
>> My rank is 1
>>
>> Note that if I uncomment the #if 0 / #endif pair (i.e. comment out the
>> MPI_Send()/MPI_Recv() part), the program runs to completion. Also note
>> that the printf()s following MPI_Send()/MPI_Recv() do not show up on the
>> console.
>>
>> Upon attaching gdb, the stack trace from the master node is as follows:
>>
>> Missing separate debuginfos, use: debuginfo-install
>> glibc-2.17-78.el7.x86_64 libpciaccess-0.13.4-2.el7.x86_64
>> (gdb) bt
>> #0  0x00007f72a533eb7d in poll () from /lib64/libc.so.6
>> #1  0x00007f72a4cb7146 in poll_dispatch (base=0xee33d0, tv=0x7fff81057b70) at poll.c:165
>> #2  0x00007f72a4caede0 in opal_libevent2022_event_base_loop (base=0xee33d0, flags=2) at event.c:1630
>> #3  0x00007f72a4c4e692 in opal_progress () at runtime/opal_progress.c:171
>> #4  0x00007f72a0d07ac1 in opal_condition_wait (c=0x7f72a5bb1e00 <ompi_request_cond>, m=0x7f72a5bb1d80 <ompi_request_lock>) at ../../../../opal/threads/condition.h:76
>> #5  0x00007f72a0d07ca2 in ompi_request_wait_completion (req=0x113eb80) at ../../../../ompi/request/request.h:383
>> #6  0x00007f72a0d09cd5 in mca_pml_ob1_send (buf=0x7fff81057db0, count=4, datatype=0x601080 <ompi_mpi_char>, dst=1, tag=1, sendmode=MCA_PML_BASE_SEND_STANDARD, comm=0x601280 <ompi_mpi_comm_world>) at pml_ob1_isend.c:251
>> #7  0x00007f72a58d6be3 in PMPI_Send (buf=0x7fff81057db0, count=4, type=0x601080 <ompi_mpi_char>, dest=1, tag=1, comm=0x601280 <ompi_mpi_comm_world>) at psend.c:78
>> #8  0x0000000000400afa in main (argc=1, argv=0x7fff81057f18) at mpitest.c:19
>> (gdb)
>>
>> And the backtrace on the non-master node is:
>>
>> (gdb) bt
>> #0  0x00007ff3b377e48d in nanosleep () from /lib64/libc.so.6
>> #1  0x00007ff3b37af014 in usleep () from /lib64/libc.so.6
>> #2  0x00007ff3b0c922de in OPAL_PMIX_PMIX120_PMIx_Fence (procs=0x0, nprocs=0, info=0x0, ninfo=0) at src/client/pmix_client_fence.c:100
>> #3  0x00007ff3b0c5f1a6 in pmix120_fence (procs=0x0, collect_data=0) at pmix120_client.c:258
>> #4  0x00007ff3b3cf8f4b in ompi_mpi_finalize () at runtime/ompi_mpi_finalize.c:242
>> #5  0x00007ff3b3d23295 in PMPI_Finalize () at pfinalize.c:47
>> #6  0x0000000000400958 in main (argc=1, argv=0x7fff785e8788) at mpitest.c:30
>> (gdb)
>>
>> The hostfile is as follows:
>>
>> [durga@b-1 ~]$ cat hostfile
>> 10.4.70.10 slots=1
>> 10.4.70.11 slots=1
>> #10.4.70.12 slots=1
>>
>> And the ifconfig output from the master node is as follows (the other
>> node is similar; all the IP interfaces are in their respective subnets):
>>
>> [durga@b-1 ~]$ ifconfig
>> eno1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
>>         inet 10.4.70.10  netmask 255.255.255.0  broadcast 10.4.70.255
>>         inet6 fe80::21e:c9ff:fefe:13df  prefixlen 64  scopeid 0x20<link>
>>         ether 00:1e:c9:fe:13:df  txqueuelen 1000  (Ethernet)
>>         RX packets 48215  bytes 27842846 (26.5 MiB)
>>         RX errors 0  dropped 0  overruns 0  frame 0
>>         TX packets 52746  bytes 7817568 (7.4 MiB)
>>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>>         device interrupt 16
>>
>> eno2: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
>>         ether 00:1e:c9:fe:13:e0  txqueuelen 1000  (Ethernet)
>>         RX packets 0  bytes 0 (0.0 B)
>>         RX errors 0  dropped 0  overruns 0  frame 0
>>         TX packets 0  bytes 0 (0.0 B)
>>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>>         device interrupt 17
>>
>> lf0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 2016
>>         inet 192.168.1.2  netmask 255.255.255.0  broadcast 192.168.1.255
>>         inet6 fe80::3002:ff:fe33:3333  prefixlen 64  scopeid 0x20<link>
>>         ether 32:02:00:33:33:33  txqueuelen 1000  (Ethernet)
>>         RX packets 10  bytes 512 (512.0 B)
>>         RX errors 0  dropped 0  overruns 0  frame 0
>>         TX packets 22  bytes 1536 (1.5 KiB)
>>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>>
>> lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
>>         inet 127.0.0.1  netmask 255.0.0.0
>>         inet6 ::1  prefixlen 128  scopeid 0x10<host>
>>         loop  txqueuelen 0  (Local Loopback)
>>         RX packets 26  bytes 1378 (1.3 KiB)
>>         RX errors 0  dropped 0  overruns 0  frame 0
>>         TX packets 26  bytes 1378 (1.3 KiB)
>>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>>
>> Please help me with this. I am stuck with the TCP transport, which is the
>> most basic of all transports.
>>
>> Thanks in advance
>> Durga
>>
>> 1% of the executables have 99% of CPU privilege!
>> Userspace code! Unite!! Occupy the kernel!!!
>>
>> On Tue, Apr 12, 2016 at 9:32 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>>
>>> This is quite unlikely, and fwiw, your test program works for me.
>>>
>>> I suggest you check that your 3 TCP networks are usable, for example:
>>>
>>> $ mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp -mca pml ob1 --mca btl_tcp_if_include xxx ./mpitest
>>>
>>> in which xxx is a [list of] interface name(s):
>>> eth0
>>> eth1
>>> ib0
>>> eth0,eth1
>>> eth0,ib0
>>> ...
>>> eth0,eth1,ib0
>>>
>>> and see where the problem starts occurring.
>>>
>>> btw, are your 3 interfaces in 3 different subnets? Is routing required
>>> between two interfaces of the same type?
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On 4/13/2016 7:15 AM, dpchoudh . wrote:
>>>
>>> Hi all
>>>
>>> I have reported this issue before, but then had brushed it off as
>>> something caused by my modifications to the source tree. It looks like
>>> that is not the case.
>>>
>>> Just now, I did the following:
>>>
>>> 1. Cloned a fresh copy from master.
>>> 2. Configured with the following flags, built and installed it on my
>>>    two-node "cluster":
>>>    --enable-debug --enable-debug-symbols --disable-dlopen
>>> 3. Compiled the following program, mpitest.c, with these flags:
>>>    -g3 -Wall -Wextra
>>> 4. Ran it like this:
>>>    [durga@smallMPI ~]$ mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp -mca pml ob1 ./mpitest
>>>
>>> With this, the code hangs at MPI_Barrier() on both nodes, after
>>> generating the following output:
>>>
>>> Hello world from processor smallMPI, rank 0 out of 2 processors
>>> Hello world from processor bigMPI, rank 1 out of 2 processors
>>> smallMPI sent haha!
>>> bigMPI received haha!
>>> <Hangs until killed by ^C>
>>>
>>> Attaching to the hung process at one node gives the following backtrace:
>>>
>>> (gdb) bt
>>> #0  0x00007f55b0f41c3d in poll () from /lib64/libc.so.6
>>> #1  0x00007f55b03ccde6 in poll_dispatch (base=0x70e7b0, tv=0x7ffd1bb551c0) at poll.c:165
>>> #2  0x00007f55b03c4a90 in opal_libevent2022_event_base_loop (base=0x70e7b0, flags=2) at event.c:1630
>>> #3  0x00007f55b02f0144 in opal_progress () at runtime/opal_progress.c:171
>>> #4  0x00007f55b14b4d8b in opal_condition_wait (c=0x7f55b19fec40 <ompi_request_cond>, m=0x7f55b19febc0 <ompi_request_lock>) at ../opal/threads/condition.h:76
>>> #5  0x00007f55b14b531b in ompi_request_default_wait_all (count=2, requests=0x7ffd1bb55370, statuses=0x7ffd1bb55340) at request/req_wait.c:287
>>> #6  0x00007f55b157a225 in ompi_coll_base_sendrecv_zero (dest=1, stag=-16, source=1, rtag=-16, comm=0x601280 <ompi_mpi_comm_world>) at base/coll_base_barrier.c:63
>>> #7  0x00007f55b157a92a in ompi_coll_base_barrier_intra_two_procs (comm=0x601280 <ompi_mpi_comm_world>, module=0x7c2630) at base/coll_base_barrier.c:308
>>> #8  0x00007f55b15aafec in ompi_coll_tuned_barrier_intra_dec_fixed (comm=0x601280 <ompi_mpi_comm_world>, module=0x7c2630) at coll_tuned_decision_fixed.c:196
>>> #9  0x00007f55b14d36fd in PMPI_Barrier (comm=0x601280 <ompi_mpi_comm_world>) at pbarrier.c:63
>>> #10 0x0000000000400b0b in main (argc=1, argv=0x7ffd1bb55658) at mpitest.c:26
>>> (gdb)
>>>
>>> Thinking that this might be a bug in tuned collectives, since that is
>>> what the stack shows, I ran the program like this (basically adding the
>>> ^tuned part):
>>>
>>> [durga@smallMPI ~]$ mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp -mca pml ob1 -mca coll ^tuned ./mpitest
>>>
>>> It still hangs, but now with a different stack trace:
>>>
>>> (gdb) bt
>>> #0  0x00007f910d38ac3d in poll () from /lib64/libc.so.6
>>> #1  0x00007f910c815de6 in poll_dispatch (base=0x1a317b0, tv=0x7fff43ee3610) at poll.c:165
>>> #2  0x00007f910c80da90 in opal_libevent2022_event_base_loop (base=0x1a317b0, flags=2) at event.c:1630
>>> #3  0x00007f910c739144 in opal_progress () at runtime/opal_progress.c:171
>>> #4  0x00007f910db130f7 in opal_condition_wait (c=0x7f910de47c40 <ompi_request_cond>, m=0x7f910de47bc0 <ompi_request_lock>) at ../../../../opal/threads/condition.h:76
>>> #5  0x00007f910db132d8 in ompi_request_wait_completion (req=0x1b07680) at ../../../../ompi/request/request.h:383
>>> #6  0x00007f910db1533b in mca_pml_ob1_send (buf=0x0, count=0, datatype=0x7f910de1e340 <ompi_mpi_byte>, dst=1, tag=-16, sendmode=MCA_PML_BASE_SEND_STANDARD, comm=0x601280 <ompi_mpi_comm_world>) at pml_ob1_isend.c:259
>>> #7  0x00007f910d9c3b38 in ompi_coll_base_barrier_intra_basic_linear (comm=0x601280 <ompi_mpi_comm_world>, module=0x1b092c0) at base/coll_base_barrier.c:368
>>> #8  0x00007f910d91c6fd in PMPI_Barrier (comm=0x601280 <ompi_mpi_comm_world>) at pbarrier.c:63
>>> #9  0x0000000000400b0b in main (argc=1, argv=0x7fff43ee3a58) at mpitest.c:26
>>> (gdb)
>>>
>>> The mpitest.c program is as follows:
>>>
>>> #include <mpi.h>
>>> #include <stdio.h>
>>> #include <string.h>
>>>
>>> int main(int argc, char** argv)
>>> {
>>>     int world_size, world_rank, name_len;
>>>     char hostname[MPI_MAX_PROCESSOR_NAME], buf[8];
>>>
>>>     MPI_Init(&argc, &argv);
>>>     MPI_Comm_size(MPI_COMM_WORLD, &world_size);
>>>     MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
>>>     MPI_Get_processor_name(hostname, &name_len);
>>>     printf("Hello world from processor %s, rank %d out of %d processors\n",
>>>            hostname, world_rank, world_size);
>>>     if (world_rank == 1)
>>>     {
>>>         MPI_Recv(buf, 6, MPI_CHAR, 0, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>>>         printf("%s received %s\n", hostname, buf);
>>>     }
>>>     else
>>>     {
>>>         strcpy(buf, "haha!");
>>>         MPI_Send(buf, 6, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
>>>         printf("%s sent %s\n", hostname, buf);
>>>     }
>>>     MPI_Barrier(MPI_COMM_WORLD);
>>>     MPI_Finalize();
>>>     return 0;
>>> }
>>>
>>> The hostfile is as follows:
>>>
>>> 10.10.10.10 slots=1
>>> 10.10.10.11 slots=1
>>>
>>> The two nodes are connected by three physical and three logical networks:
>>> Physical: Gigabit Ethernet, 10G iWARP, 20G InfiniBand
>>> Logical: IP (all three), PSM (QLogic InfiniBand), Verbs (iWARP and InfiniBand)
>>>
>>> Please note again that this is a fresh, brand-new clone.
>>>
>>> Is this a bug (perhaps a side effect of --disable-dlopen) or something I
>>> am doing wrong?
>>>
>>> Thanks
>>> Durga
>>>
>>> We learn from history that we never learn from history.
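P.S. If it would help whoever tries to reproduce this, I can rerun my test with the TCP BTL's verbosity turned up and post the output; I am thinking of something along the lines of the command below (the verbosity level of 100 is just my guess at a useful value):

  # same run as before, but asking the BTL framework to log connection setup
  mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp -mca pml ob1 -mca btl_base_verbose 100 ./mpitest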