Could you also test the nightly tarballs of the v2.x and v1.10 branches?

When a process sends a message to another process for the first time,
it establishes a TCP connection if one is not already present.
So if A sends to B first, then A connects to B.
When B later wants to send to A, it reuses the previously established connection.
/* fwiw, there is a race condition when A and B send to each other for the first time and at
the same time, and we do handle that in the tcp btl. */
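
For illustration, a minimal sketch (mine, not from the original thread; the file name and payload are arbitrary) of that simultaneous first-send case, where both ranks post a send before receiving and therefore both sides try to open the connection at once:

/* simultaneous_first_send.c: both ranks initiate a send to each other
 * before any receive completes */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, peer, out, in;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    peer = 1 - rank;   /* assumes exactly 2 ranks */
    out = rank;

    /* non-blocking send so both ranks try to establish the connection concurrently */
    MPI_Isend(&out, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, &req);
    MPI_Recv(&in, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    printf("rank %d received %d from rank %d\n", rank, in, peer);
    MPI_Finalize();
    return 0;
}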

A firewall on one node could explain why the application might succeed or hang depending
on the order of the nodes in the hostfile.
You might also try invoking mpirun from both nodes and check whether the behavior is consistent.

If I read the thread correctly, the runtime behavior on your system is not random but 100% reproducible,
am I right?

You can also run
mpirun --mca btl_tcp_base_verbose 100 ...
then compress and post the logs; we might see something there.
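
For example (the log file name is just a suggestion; the other options match the ones used elsewhere in this thread):

mpirun --mca btl_tcp_base_verbose 100 -np 2 -hostfile ~/hostfile -mca btl self,tcp -mca pml ob1 ./mpitest 2>&1 | tee btl_tcp_verbose.log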

Cheers,

Gilles

On 4/19/2016 4:53 AM, dpchoudh . wrote:
Dear developers

Thank you all for jumping in to help. Here is what I have found so far:

1. Running Netpipe (NPmpi) between the two nodes (in either order) was successful, but following this test, my original code still hung.
2. Following Gilles's advice, I then added an MPI_Barrier() at the end of the code, just before MPI_Finalize(), and, to my surprise, the code ran to completion!
3. Then, I took out the barrier, leaving the code the way it was before, and it still ran to completion!
4. I tried several variations of call sequence, and all of them ran successfully.

I can't explain why the runtime behavior seems to depend on the phase of the moon, but, although I cannot prove it, I have a gut feeling there is a bug somewhere in the development branch. I have never run into this issue when running the release branch. (I sometimes work as an MPI application developer, when I use the release branch, and sometimes as an MPI developer, when I use the master branch.)

Thank you all, again.

Durga

1% of the executables have 99% of CPU privilege!
Userspace code! Unite!! Occupy the kernel!!!

On Mon, Apr 18, 2016 at 8:04 AM, George Bosilca <bosi...@icl.utk.edu> wrote:

    Durga,

    Can you run a simple NetPIPE test over TCP using either of the
    two interfaces you mentioned?

    George

    On Apr 18, 2016 11:08 AM, "Gilles Gouaillardet" <gilles.gouaillar...@gmail.com> wrote:

        Another test is to swap the hostnames.
        If the single-barrier test fails, that can hint at a firewall.

        Cheers,

        Gilles

        Gilles Gouaillardet <gil...@rist.or.jp> wrote:
        sudo make uninstall
        will not remove modules that are no longer built
        sudo rm -rf /usr/local/lib/openmpi
        is safe though

        I confirm I did not see any issue on a system with two networks.

        Cheers,

        Gilles

        On 4/18/2016 2:53 PM, dpchoudh . wrote:
        Hello Gilles

        I did a
        sudo make uninstall
        followed by a
        sudo make install
        on both nodes. But that did not make a difference. I will try
        your tarball build suggestion a bit later.

        What I find a bit strange is that only I seem to be running
        into this issue. What could I be doing wrong? Or am I
        discovering an obscure bug?

        Thanks
        Durga

        1% of the executables have 99% of CPU privilege!
        Userspace code! Unite!! Occupy the kernel!!!

        On Mon, Apr 18, 2016 at 1:21 AM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:

            So you might want to
            rm -rf /usr/local/lib/openmpi
            and run
            make install
            again, just to make sure old stuff does not get in the way.

            Cheers,

            Gilles


            On 4/18/2016 2:12 PM, dpchoudh . wrote:
            Hello Gilles

            Thank you very much for your feedback. You are right
            that my original stack trace was on code that was
            several weeks behind, but updating it just now did not
            seem to make a difference: I am copying the stack from
            the latest code below:

            On the master node:

            (gdb) bt
            #0  0x00007fc0524cbb7d in poll () from /lib64/libc.so.6
            #1  0x00007fc051e53116 in poll_dispatch (base=0x1aabbe0,
            tv=0x7fff29fcb240) at poll.c:165
            #2  0x00007fc051e4adb0 in
            opal_libevent2022_event_base_loop (base=0x1aabbe0,
            flags=2) at event.c:1630
            #3  0x00007fc051de9a00 in opal_progress () at
            runtime/opal_progress.c:171
            #4  0x00007fc04ce46b0b in opal_condition_wait
            (c=0x7fc052d3cde0 <ompi_request_cond>,
                m=0x7fc052d3cd60 <ompi_request_lock>) at
            ../../../../opal/threads/condition.h:76
            #5  0x00007fc04ce46cec in ompi_request_wait_completion
            (req=0x1b7b580)
                at ../../../../ompi/request/request.h:383
            #6  0x00007fc04ce48d4f in mca_pml_ob1_send
            (buf=0x7fff29fcb480, count=4,
                datatype=0x601080 <ompi_mpi_char>, dst=1, tag=1,
            sendmode=MCA_PML_BASE_SEND_STANDARD,
                comm=0x601280 <ompi_mpi_comm_world>) at
            pml_ob1_isend.c:259
            #7  0x00007fc052a62d73 in PMPI_Send (buf=0x7fff29fcb480,
            count=4, type=0x601080 <ompi_mpi_char>, dest=1,
                tag=1, comm=0x601280 <ompi_mpi_comm_world>) at
            psend.c:78
            #8  0x0000000000400afa in main (argc=1,
            argv=0x7fff29fcb5e8) at mpitest.c:19
            (gdb)

            And on the non-master node

            (gdb) bt
            #0  0x00007fad2c32148d in nanosleep () from /lib64/libc.so.6
            #1  0x00007fad2c352014 in usleep () from /lib64/libc.so.6
            #2  0x00007fad296412de in OPAL_PMIX_PMIX120_PMIx_Fence
            (procs=0x0, nprocs=0, info=0x0, ninfo=0)
                at src/client/pmix_client_fence.c:100
            #3  0x00007fad2960e1a6 in pmix120_fence (procs=0x0,
            collect_data=0) at pmix120_client.c:258
            #4  0x00007fad2c89b2da in ompi_mpi_finalize () at
            runtime/ompi_mpi_finalize.c:242
            #5  0x00007fad2c8c5849 in PMPI_Finalize () at pfinalize.c:47
            #6  0x0000000000400958 in main (argc=1,
            argv=0x7fff163879c8) at mpitest.c:30
            (gdb)

            And my configuration was done as follows:

            $ ./configure --enable-debug --enable-debug-symbols

            I double-checked to ensure that there is not an older
            installation of Open MPI that is getting mixed up with
            the master branch.
            sudo yum list installed | grep -i mpi
            shows nothing on either node, and pmap -p <pid> shows
            that all the libraries are coming from /usr/local/lib,
            which seems to be correct. I am also quite sure about
            the firewall issue (that there is none). I will try out
            your suggestion of installing from a tarball and see how
            it goes.

            Thanks
            Durga

            1% of the executables have 99% of CPU privilege!
            Userspace code! Unite!! Occupy the kernel!!!

            On Mon, Apr 18, 2016 at 12:47 AM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:

                Here is your stack trace:

                #6 0x00007f72a0d09cd5 in mca_pml_ob1_send
                (buf=0x7fff81057db0, count=4,
                    datatype=0x601080 <ompi_mpi_char>, dst=1, tag=1,
                sendmode=MCA_PML_BASE_SEND_STANDARD, comm=0x601280
                <ompi_mpi_comm_world>)

                at line 251


                that would be line 259 in current master, and this
                file was updated 21 days ago,
                which suggests your master is not quite up to date.

                Even if the message is sent eagerly, the ob1 PML
                does use an internal request that it will wait for.

                BTW, did you configure with
                --enable-mpi-thread-multiple?
                Did you configure with
                --enable-mpirun-prefix-by-default?
                Did you configure with --disable-dlopen?

                At first, I'd recommend you download a tarball from
                https://www.open-mpi.org/nightly/master,
                configure && make && make install
                using a new install dir, and check whether the issue is
                still there or not.
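
                For example (the install prefix below is only a placeholder; the
                configure flags are the ones already discussed in this thread):

                ./configure --prefix=$HOME/ompi-nightly --enable-debug \
                    --enable-mpirun-prefix-by-default --disable-dlopen
                make && make install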

                There could be some side effects if some old modules
                were not removed and/or if you are
                not using the modules you expect.
                /* when it hangs, you can pmap <pid> and check that the
                paths of the Open MPI libraries are the ones you expect */

                What if you do not send/recv but invoke MPI_Barrier
                multiple times?
                What if you send/recv a one-byte message instead?
                Did you double-check there is no firewall running on
                your nodes?
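
                As a quick sketch of the first two variations (this is only an
                illustration; the file name and loop count are arbitrary):

                /* variant_test.c: a few barriers only, then a one-byte send/recv */
                #include <mpi.h>
                #include <stdio.h>

                int main(int argc, char *argv[])
                {
                    int rank, i;
                    char byte = 'x';

                    MPI_Init(&argc, &argv);
                    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

                    /* variation 1: no send/recv, just several barriers */
                    for (i = 0; i < 4; i++)
                        MPI_Barrier(MPI_COMM_WORLD);
                    if (rank == 0)
                        printf("barriers completed\n");

                    /* variation 2: a one-byte message between ranks 0 and 1 */
                    if (rank == 0)
                        MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                    else if (rank == 1)
                        MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

                    MPI_Finalize();
                    return 0;
                }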

                Cheers,

                Gilles






                On 4/18/2016 1:06 PM, dpchoudh . wrote:
                Thank you for your suggestion, Ralph. But it did
                not make any difference.

                Let me say that my code is about a week stale. I
                just did a git pull and am building it right now.
                The build takes quite a bit of time, so I avoid
                doing that unless there is a reason. But what I am
                trying out is the most basic functionality, so I'd
                think a week or so of lag would not make a difference.

                Does the stack trace suggest something to you? It
                seems that the send hangs; but a 4-byte send should
                be sent eagerly.

                Best regards
                'Durga

                1% of the executables have 99% of CPU privilege!
                Userspace code! Unite!! Occupy the kernel!!!

                On Sun, Apr 17, 2016 at 11:55 PM, Ralph Castain <r...@open-mpi.org> wrote:

                    Try adding -mca oob_tcp_if_include eno1 to your
                    cmd line and see if that makes a difference

                    On Apr 17, 2016, at 8:43 PM, dpchoudh . <dpcho...@gmail.com> wrote:

                    Hello Gilles and all

                    I am sorry to be bugging the developers, but
                    this issue seems to be nagging me, and I am
                    surprised it does not seem to affect anybody
                    else. But then again, I am using the master
                    branch, and most users are probably using a
                    released version.

                    This time I am using a totally different
                    cluster. This one has NO verbs-capable interface;
                    just two Ethernet interfaces (one of which has no IP
                    address and hence is unusable) plus one proprietary
                    interface that currently supports only IP
                    traffic. The two IP interfaces (Ethernet and
                    proprietary) are on different IP subnets.

                    My test program is as follows:

                    #include <stdio.h>
                    #include <string.h>
                    #include "mpi.h"
                    int main(int argc, char *argv[])
                    {
                        char host[128];
                        int n;
                        MPI_Init(&argc, &argv);
                        MPI_Get_processor_name(host, &n);
                        printf("Hello from %s\n", host);
                        MPI_Comm_size(MPI_COMM_WORLD, &n);
                        printf("The world has %d nodes\n", n);
                        MPI_Comm_rank(MPI_COMM_WORLD, &n);
                        printf("My rank is %d\n", n);
                    //#if 0
                        if (n == 0)
                        {
                            strcpy(host, "ha!");
                            MPI_Send(host, strlen(host) + 1, MPI_CHAR, 1, 1, MPI_COMM_WORLD);
                            printf("sent %s\n", host);
                        }
                        else
                        {
                            //int len = strlen(host) + 1;
                            bzero(host, 128);
                            MPI_Recv(host, 4, MPI_CHAR, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                            printf("Received %s from rank 0\n", host);
                        }
                    //#endif
                        MPI_Finalize();
                        return 0;
                    }

                    This program, when run between two nodes,
                    hangs. The command was:
                    [durga@b-1 ~]$ mpirun -np 2 -hostfile
                    ~/hostfile -mca btl self,tcp -mca pml ob1 -mca
                    btl_tcp_if_include eno1 ./mpitest

                    And the hang occurs with the following output
                    (eno1 is one of the GigE interfaces, which
                    carries the OOB traffic as well):

                    Hello from b-1
                    The world has 2 nodes
                    My rank is 0
                    Hello from b-2
                    The world has 2 nodes
                    My rank is 1

                    Note that if I uncomment the #if 0 - #endif pair
                    (i.e. comment out the MPI_Send()/MPI_Recv()
                    part), the program runs to completion. Also
                    note that the printfs following
                    MPI_Send()/MPI_Recv() do not show up on the console.

                    Upon attaching gdb, the stack trace from the
                    master node is as follows:

                    Missing separate debuginfos, use:
                    debuginfo-install glibc-2.17-78.el7.x86_64
                    libpciaccess-0.13.4-2.el7.x86_64
                    (gdb) bt
                    #0 0x00007f72a533eb7d in poll () from
                    /lib64/libc.so.6
                    #1 0x00007f72a4cb7146 in poll_dispatch
                    (base=0xee33d0, tv=0x7fff81057b70)
                        at poll.c:165
                    #2 0x00007f72a4caede0 in
                    opal_libevent2022_event_base_loop (base=0xee33d0,
                        flags=2) at event.c:1630
                    #3 0x00007f72a4c4e692 in opal_progress () at
                    runtime/opal_progress.c:171
                    #4 0x00007f72a0d07ac1 in opal_condition_wait (
                    c=0x7f72a5bb1e00 <ompi_request_cond>,
                    m=0x7f72a5bb1d80 <ompi_request_lock>)
                        at ../../../../opal/threads/condition.h:76
                    #5 0x00007f72a0d07ca2 in
                    ompi_request_wait_completion (req=0x113eb80)
                        at ../../../../ompi/request/request.h:383
                    #6 0x00007f72a0d09cd5 in mca_pml_ob1_send
                    (buf=0x7fff81057db0, count=4,
                    datatype=0x601080 <ompi_mpi_char>, dst=1, tag=1,
                    sendmode=MCA_PML_BASE_SEND_STANDARD,
                    comm=0x601280 <ompi_mpi_comm_world>)
                        at pml_ob1_isend.c:251
                    #7 0x00007f72a58d6be3 in PMPI_Send
                    (buf=0x7fff81057db0, count=4,
                    type=0x601080 <ompi_mpi_char>, dest=1, tag=1,
                    comm=0x601280 <ompi_mpi_comm_world>) at psend.c:78
                    #8 0x0000000000400afa in main (argc=1,
                    argv=0x7fff81057f18) at mpitest.c:19
                    (gdb)

                    And the backtrace on the non-master node is:

                    (gdb) bt
                    #0 0x00007ff3b377e48d in nanosleep () from
                    /lib64/libc.so.6
                    #1 0x00007ff3b37af014 in usleep () from
                    /lib64/libc.so.6
                    #2 0x00007ff3b0c922de in
                    OPAL_PMIX_PMIX120_PMIx_Fence (procs=0x0, nprocs=0,
                        info=0x0, ninfo=0) at
                    src/client/pmix_client_fence.c:100
                    #3 0x00007ff3b0c5f1a6 in pmix120_fence
                    (procs=0x0, collect_data=0)
                        at pmix120_client.c:258
                    #4 0x00007ff3b3cf8f4b in ompi_mpi_finalize ()
                        at runtime/ompi_mpi_finalize.c:242
                    #5 0x00007ff3b3d23295 in PMPI_Finalize () at
                    pfinalize.c:47
                    #6 0x0000000000400958 in main (argc=1,
                    argv=0x7fff785e8788) at mpitest.c:30
                    (gdb)

                    The hostfile is as follows:

                    [durga@b-1 ~]$ cat hostfile
                    10.4.70.10 slots=1
                    10.4.70.11 slots=1
                    #10.4.70.12 slots=1

                    And the ifconfig output from the master node
                    is as follows (the other node is similar; all
                    the IP interfaces are in their respective
                    subnets):

                    [durga@b-1 ~]$ ifconfig
                    eno1:
                    flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu
                    1500
                            inet 10.4.70.10 netmask 255.255.255.0
                    broadcast 10.4.70.255
                            inet6 fe80::21e:c9ff:fefe:13df
                    prefixlen 64 scopeid 0x20<link>
                            ether 00:1e:c9:fe:13:df txqueuelen
                    1000 (Ethernet)
                            RX packets 48215 bytes 27842846 (26.5 MiB)
                            RX errors 0 dropped 0 overruns 0 frame 0
                            TX packets 52746 bytes 7817568 (7.4 MiB)
                            TX errors 0 dropped 0 overruns 0
                    carrier 0 collisions 0
                            device interrupt 16

                    eno2: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
                            ether 00:1e:c9:fe:13:e0 txqueuelen
                    1000 (Ethernet)
                            RX packets 0 bytes 0 (0.0 B)
                            RX errors 0 dropped 0 overruns 0 frame 0
                            TX packets 0 bytes 0 (0.0 B)
                            TX errors 0 dropped 0 overruns 0
                    carrier 0 collisions 0
                            device interrupt 17

                    lf0:
                    flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu
                    2016
                            inet 192.168.1.2 netmask 255.255.255.0
                    broadcast 192.168.1.255
                            inet6 fe80::3002:ff:fe33:3333
                    prefixlen 64 scopeid 0x20<link>
                            ether 32:02:00:33:33:33 txqueuelen
                    1000 (Ethernet)
                            RX packets 10 bytes 512 (512.0 B)
                            RX errors 0 dropped 0 overruns 0 frame 0
                            TX packets 22 bytes 1536 (1.5 KiB)
                            TX errors 0 dropped 0 overruns 0
                    carrier 0 collisions 0

                    lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
                            inet 127.0.0.1 netmask 255.0.0.0
                            inet6 ::1  prefixlen 128  scopeid
                    0x10<host>
                            loop txqueuelen 0 (Local Loopback)
                            RX packets 26 bytes 1378 (1.3 KiB)
                            RX errors 0 dropped 0 overruns 0 frame 0
                            TX packets 26 bytes 1378 (1.3 KiB)
                            TX errors 0 dropped 0 overruns 0
                    carrier 0 collisions 0

                    Please help me with this. I am stuck with the
                    TCP transport, which is the most basic of all
                    transports.

                    Thanks in advance
                    Durga


                    1% of the executables have 99% of CPU privilege!
                    Userspace code! Unite!! Occupy the kernel!!!

                    On Tue, Apr 12, 2016 at 9:32 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:

                        This is quite unlikely, and FWIW, your
                        test program works for me.

                        I suggest you check that your 3 TCP networks
                        are usable, for example:

                        $ mpirun -np 2 -hostfile ~/hostfile -mca
                        btl self,tcp -mca pml ob1 --mca
                        btl_tcp_if_include xxx ./mpitest

                        in which xxx is a [list of] interface names:
                        eth0
                        eth1
                        ib0
                        eth0,eth1
                        eth0,ib0
                        ...
                        eth0,eth1,ib0

                        and see where the problem starts occurring.

                        BTW, are your 3 interfaces in 3 different
                        subnets? Is routing required between two
                        interfaces of the same type?

                        Cheers,

                        Gilles

                        On 4/13/2016 7:15 AM, dpchoudh . wrote:
                        Hi all

                        I have reported this issue before, but
                        then had brushed it off as something that
                        was caused by my modifications to the
                        source tree. It looks like that is not
                        the case.

                        Just now, I did the following:

                        1. Cloned a fresh copy from master.
                        2. Configured with the following flags,
                        built and installed it in my two-node
                        "cluster".
                        --enable-debug --enable-debug-symbols
                        --disable-dlopen
                        3. Compiled the following program,
                        mpitest.c with these flags: -g3 -Wall -Wextra
                        4. Ran it like this:
                        [durga@smallMPI ~]$ mpirun -np 2
                        -hostfile ~/hostfile -mca btl self,tcp
                        -mca pml ob1 ./mpitest

                        With this, the code hangs at
                        MPI_Barrier() on both nodes, after
                        generating the following output:

                        Hello world from processor smallMPI, rank
                        0 out of 2 processors
                        Hello world from processor bigMPI, rank 1
                        out of 2 processors
                        smallMPI sent haha!
                        bigMPI received haha!
                        <Hangs until killed by ^C>

                        Attaching to the hung process on one node
                        gives the following backtrace:

                        (gdb) bt
                        #0 0x00007f55b0f41c3d in poll () from
                        /lib64/libc.so.6
                        #1 0x00007f55b03ccde6 in poll_dispatch
                        (base=0x70e7b0, tv=0x7ffd1bb551c0) at
                        poll.c:165
                        #2 0x00007f55b03c4a90 in
                        opal_libevent2022_event_base_loop
                        (base=0x70e7b0, flags=2) at event.c:1630
                        #3 0x00007f55b02f0144 in opal_progress ()
                        at runtime/opal_progress.c:171
                        #4 0x00007f55b14b4d8b in
                        opal_condition_wait (c=0x7f55b19fec40
                        <ompi_request_cond>, m=0x7f55b19febc0
                        <ompi_request_lock>) at
                        ../opal/threads/condition.h:76
                        #5 0x00007f55b14b531b in
                        ompi_request_default_wait_all (count=2,
                        requests=0x7ffd1bb55370,
                        statuses=0x7ffd1bb55340) at
                        request/req_wait.c:287
                        #6 0x00007f55b157a225 in
                        ompi_coll_base_sendrecv_zero (dest=1,
                        stag=-16, source=1, rtag=-16,
                        comm=0x601280 <ompi_mpi_comm_world>)
                            at base/coll_base_barrier.c:63
                        #7 0x00007f55b157a92a in
                        ompi_coll_base_barrier_intra_two_procs
                        (comm=0x601280 <ompi_mpi_comm_world>,
                        module=0x7c2630) at
                        base/coll_base_barrier.c:308
                        #8 0x00007f55b15aafec in
                        ompi_coll_tuned_barrier_intra_dec_fixed
                        (comm=0x601280 <ompi_mpi_comm_world>,
                        module=0x7c2630) at
                        coll_tuned_decision_fixed.c:196
                        #9 0x00007f55b14d36fd in PMPI_Barrier
                        (comm=0x601280 <ompi_mpi_comm_world>) at
                        pbarrier.c:63
                        #10 0x0000000000400b0b in main (argc=1,
                        argv=0x7ffd1bb55658) at mpitest.c:26
                        (gdb)

                        Thinking that this might be a bug in
                        tuned collectives, since that is what the
                        stack shows, I ran the program like this
                        (basically adding the ^tuned part)

                        [durga@smallMPI ~]$ mpirun -np 2
                        -hostfile ~/hostfile -mca btl self,tcp
                        -mca pml ob1 -mca coll ^tuned ./mpitest

                        It still hangs, but now with a different
                        stack trace:
                        (gdb) bt
                        #0 0x00007f910d38ac3d in poll () from
                        /lib64/libc.so.6
                        #1 0x00007f910c815de6 in poll_dispatch
                        (base=0x1a317b0, tv=0x7fff43ee3610) at
                        poll.c:165
                        #2 0x00007f910c80da90 in
                        opal_libevent2022_event_base_loop
                        (base=0x1a317b0, flags=2) at event.c:1630
                        #3 0x00007f910c739144 in opal_progress ()
                        at runtime/opal_progress.c:171
                        #4 0x00007f910db130f7 in
                        opal_condition_wait (c=0x7f910de47c40
                        <ompi_request_cond>, m=0x7f910de47bc0
                        <ompi_request_lock>)
                            at
                        ../../../../opal/threads/condition.h:76
                        #5 0x00007f910db132d8 in
                        ompi_request_wait_completion
                        (req=0x1b07680) at
                        ../../../../ompi/request/request.h:383
                        #6 0x00007f910db1533b in mca_pml_ob1_send
                        (buf=0x0, count=0,
                        datatype=0x7f910de1e340 <ompi_mpi_byte>,
                        dst=1, tag=-16,
                        sendmode=MCA_PML_BASE_SEND_STANDARD,
                        comm=0x601280 <ompi_mpi_comm_world>) at
                        pml_ob1_isend.c:259
                        #7 0x00007f910d9c3b38 in
                        ompi_coll_base_barrier_intra_basic_linear
                        (comm=0x601280 <ompi_mpi_comm_world>,
                        module=0x1b092c0) at
                        base/coll_base_barrier.c:368
                        #8 0x00007f910d91c6fd in PMPI_Barrier
                        (comm=0x601280 <ompi_mpi_comm_world>) at
                        pbarrier.c:63
                        #9 0x0000000000400b0b in main (argc=1,
                        argv=0x7fff43ee3a58) at mpitest.c:26
                        (gdb)

                        The mpitest.c program is as follows:
                        #include <mpi.h>
                        #include <stdio.h>
                        #include <string.h>

                        int main(int argc, char** argv)
                        {
                            int world_size, world_rank, name_len;
                            char hostname[MPI_MAX_PROCESSOR_NAME], buf[8];

                            MPI_Init(&argc, &argv);
                            MPI_Comm_size(MPI_COMM_WORLD, &world_size);
                            MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
                            MPI_Get_processor_name(hostname, &name_len);
                            printf("Hello world from processor %s, rank %d out of %d processors\n", hostname, world_rank, world_size);
                            if (world_rank == 1)
                            {
                                MPI_Recv(buf, 6, MPI_CHAR, 0, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                                printf("%s received %s\n", hostname, buf);
                            }
                            else
                            {
                                strcpy(buf, "haha!");
                                MPI_Send(buf, 6, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
                                printf("%s sent %s\n", hostname, buf);
                            }
                            MPI_Barrier(MPI_COMM_WORLD);
                            MPI_Finalize();
                            return 0;
                        }

                        The hostfile is as follows:
                        10.10.10.10 slots=1
                        10.10.10.11 slots=1

                        The two nodes are connected by three
                        physical and three logical networks:
                        Physical: Gigabit Ethernet, 10G iWARP,
                        20G InfiniBand
                        Logical: IP (all 3), PSM (QLogic
                        InfiniBand), verbs (iWARP and InfiniBand)

                        Please note again that this is a fresh,
                        brand new clone.

                        Is this a bug (perhaps a side effect of
                        --disable-dlopen) or something I am doing
                        wrong?

                        Thanks
                        Durga

                        We learn from history that we never learn
                        from history.




        ...






