Durga,

Can you run a simple NetPIPE test over TCP using either of the two interfaces you mentioned?
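If NetPIPE is not installed on those nodes, a bare MPI ping-pong gives a rough equivalent signal, i.e. whether plain point-to-point traffic moves over the tcp BTL at all, and at what rate as the message size grows. The sketch below is only a stand-in for NetPIPE, not NetPIPE itself, and it assumes exactly two ranks:

/* pingpong.c - a rough stand-in for NetPIPE over MPI (sketch only, assumes exactly two ranks) */
#include <stdio.h>
#include <string.h>
#include <mpi.h>

#define MAX_BYTES (1 << 20)   /* sweep message sizes from 1 byte up to 1 MiB */
#define REPS      100

int main(int argc, char *argv[])
{
    static char buf[MAX_BYTES];
    int rank, size_bytes, iter;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    memset(buf, 'x', sizeof(buf));

    for (size_bytes = 1; size_bytes <= MAX_BYTES; size_bytes *= 2)
    {
        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (iter = 0; iter < REPS; iter++)
        {
            if (rank == 0)
            {
                /* rank 0 bounces each message off rank 1 and times the round trip */
                MPI_Send(buf, size_bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, size_bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            }
            else if (rank == 1)
            {
                MPI_Recv(buf, size_bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, size_bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();
        if (rank == 0)
        {
            printf("%8d bytes: %9.2f us round trip\n", size_bytes, (t1 - t0) / REPS * 1e6);
            fflush(stdout);   /* so partial results survive a hang at a larger size */
        }
    }

    MPI_Finalize();
    return 0;
}

It can be launched exactly like the failing mpitest, e.g. mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp -mca pml ob1 -mca btl_tcp_if_include eno1 ./pingpong, switching btl_tcp_if_include between the two interfaces to compare them.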
George

On Apr 18, 2016 11:08 AM, "Gilles Gouaillardet" <gilles.gouaillar...@gmail.com> wrote:
> Another test is to swap the hostnames.
> If the single-barrier test fails, that can hint at a firewall.
>
> Cheers,
>
> Gilles
>
> Gilles Gouaillardet <gil...@rist.or.jp> wrote:
> sudo make uninstall
> will not remove modules that are no longer built;
> sudo rm -rf /usr/local/lib/openmpi
> is safe, though.
>
> I confirm I did not see any issue on a system with two networks.
>
> Cheers,
>
> Gilles
>
> On 4/18/2016 2:53 PM, dpchoudh . wrote:
>
> Hello Gilles
>
> I did a
> sudo make uninstall
> followed by a
> sudo make install
> on both nodes, but that did not make a difference. I will try your tarball build suggestion a bit later.
>
> What I find a bit strange is that only I seem to be running into this issue. What could I be doing wrong? Or am I discovering an obscure bug?
>
> Thanks
> Durga
>
> 1% of the executables have 99% of CPU privilege!
> Userspace code! Unite!! Occupy the kernel!!!
>
> On Mon, Apr 18, 2016 at 1:21 AM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>> so you might want to
>> rm -rf /usr/local/lib/openmpi
>> and run
>> make install
>> again, just to make sure old stuff does not get in the way
>>
>> Cheers,
>>
>> Gilles
>>
>> On 4/18/2016 2:12 PM, dpchoudh . wrote:
>>
>> Hello Gilles
>>
>> Thank you very much for your feedback. You are right that my original stack trace was from code that was several weeks behind, but updating it just now did not seem to make a difference. I am copying the stack from the latest code below.
>>
>> On the master node:
>>
>> (gdb) bt
>> #0  0x00007fc0524cbb7d in poll () from /lib64/libc.so.6
>> #1  0x00007fc051e53116 in poll_dispatch (base=0x1aabbe0, tv=0x7fff29fcb240) at poll.c:165
>> #2  0x00007fc051e4adb0 in opal_libevent2022_event_base_loop (base=0x1aabbe0, flags=2) at event.c:1630
>> #3  0x00007fc051de9a00 in opal_progress () at runtime/opal_progress.c:171
>> #4  0x00007fc04ce46b0b in opal_condition_wait (c=0x7fc052d3cde0 <ompi_request_cond>, m=0x7fc052d3cd60 <ompi_request_lock>) at ../../../../opal/threads/condition.h:76
>> #5  0x00007fc04ce46cec in ompi_request_wait_completion (req=0x1b7b580) at ../../../../ompi/request/request.h:383
>> #6  0x00007fc04ce48d4f in mca_pml_ob1_send (buf=0x7fff29fcb480, count=4, datatype=0x601080 <ompi_mpi_char>, dst=1, tag=1, sendmode=MCA_PML_BASE_SEND_STANDARD, comm=0x601280 <ompi_mpi_comm_world>) at pml_ob1_isend.c:259
>> #7  0x00007fc052a62d73 in PMPI_Send (buf=0x7fff29fcb480, count=4, type=0x601080 <ompi_mpi_char>, dest=1, tag=1, comm=0x601280 <ompi_mpi_comm_world>) at psend.c:78
>> #8  0x0000000000400afa in main (argc=1, argv=0x7fff29fcb5e8) at mpitest.c:19
>> (gdb)
>>
>> And on the non-master node:
>>
>> (gdb) bt
>> #0  0x00007fad2c32148d in nanosleep () from /lib64/libc.so.6
>> #1  0x00007fad2c352014 in usleep () from /lib64/libc.so.6
>> #2  0x00007fad296412de in OPAL_PMIX_PMIX120_PMIx_Fence (procs=0x0, nprocs=0, info=0x0, ninfo=0) at src/client/pmix_client_fence.c:100
>> #3  0x00007fad2960e1a6 in pmix120_fence (procs=0x0, collect_data=0) at pmix120_client.c:258
>> #4  0x00007fad2c89b2da in ompi_mpi_finalize () at runtime/ompi_mpi_finalize.c:242
>> #5  0x00007fad2c8c5849 in PMPI_Finalize () at pfinalize.c:47
>> #6  0x0000000000400958 in main (argc=1, argv=0x7fff163879c8) at mpitest.c:30
>> (gdb)
>>
>> And my configuration was done as follows:
>>
>> $ ./configure --enable-debug --enable-debug-symbols
>>
>> I double checked to ensure that there is no older installation of Open MPI getting mixed up with the master branch:
>> sudo yum list installed | grep -i mpi
>> shows nothing on both nodes, and pmap -p <pid> shows that all the libraries are coming from /usr/local/lib, which seems to be correct. I am also quite sure about the firewall issue (that there is none). I will try out your suggestion of installing from a tarball and see how it goes.
>>
>> Thanks
>> Durga
>>
>> 1% of the executables have 99% of CPU privilege!
>> Userspace code! Unite!! Occupy the kernel!!!
>>
>> On Mon, Apr 18, 2016 at 12:47 AM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>>
>>> here is your stack trace:
>>>
>>> #6  0x00007f72a0d09cd5 in mca_pml_ob1_send (buf=0x7fff81057db0, count=4, datatype=0x601080 <ompi_mpi_char>, dst=1, tag=1, sendmode=MCA_PML_BASE_SEND_STANDARD, comm=0x601280 <ompi_mpi_comm_world>)
>>>
>>> at line 251
>>>
>>> That would be line 259 in current master, and this file was updated 21 days ago, which suggests your master is not quite up to date.
>>>
>>> Even if the message is sent eagerly, the ob1 pml does use an internal request that it will wait for.
>>>
>>> btw, did you configure with --enable-mpi-thread-multiple?
>>> did you configure with --enable-mpirun-prefix-by-default?
>>> did you configure with --disable-dlopen?
>>>
>>> At first, I'd recommend you download a tarball from https://www.open-mpi.org/nightly/master, configure && make && make install using a new install dir, and check whether the issue is still there or not.
>>>
>>> There could be some side effects if some old modules were not removed and/or if you are not using the modules you expect.
>>> /* when it hangs, you can pmap <pid> and check that the paths of the Open MPI libraries are the ones you expect */
>>>
>>> What if you do not send/recv but invoke MPI_Barrier multiple times?
>>> What if you send/recv a one-byte message instead?
>>> Did you double check there is no firewall running on your nodes?
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On 4/18/2016 1:06 PM, dpchoudh . wrote:
>>>
>>> Thank you for your suggestion, Ralph. But it did not make any difference.
>>>
>>> Let me say that my code is about a week stale. I just did a git pull and am building it right now. The build takes quite a bit of time, so I avoid doing that unless there is a reason. But what I am trying out is the most basic functionality, so I'd think a week or so of lag would not make a difference.
>>>
>>> Does the stack trace suggest something to you? It seems that the send hangs, but a 4-byte send should be sent eagerly.
>>>
>>> Best regards
>>> Durga
>>>
>>> 1% of the executables have 99% of CPU privilege!
>>> Userspace code! Unite!! Occupy the kernel!!!
>>>
>>> On Sun, Apr 17, 2016 at 11:55 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>>> Try adding -mca oob_tcp_if_include eno1 to your command line and see if that makes a difference.
>>>>
>>>> On Apr 17, 2016, at 8:43 PM, dpchoudh . <dpcho...@gmail.com> wrote:
>>>>
>>>> Hello Gilles and all
>>>>
>>>> I am sorry to keep bugging the developers, but this issue keeps nagging me, and I am surprised it does not seem to affect anybody else. Then again, I am using the master branch, and most users are probably using a released version.
>>>>
>>>> This time I am using a totally different cluster.
>>>> This has NO verbs-capable interface; just two Ethernet interfaces (one of which has no IP address and hence is unusable) plus one proprietary interface that currently supports only IP traffic. The two IP interfaces (Ethernet and proprietary) are on different IP subnets.
>>>>
>>>> My test program is as follows:
>>>>
>>>> #include <stdio.h>
>>>> #include <string.h>
>>>> #include "mpi.h"
>>>> int main(int argc, char *argv[])
>>>> {
>>>>     char host[128];
>>>>     int n;
>>>>     MPI_Init(&argc, &argv);
>>>>     MPI_Get_processor_name(host, &n);
>>>>     printf("Hello from %s\n", host);
>>>>     MPI_Comm_size(MPI_COMM_WORLD, &n);
>>>>     printf("The world has %d nodes\n", n);
>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &n);
>>>>     printf("My rank is %d\n", n);
>>>> //#if 0
>>>>     if (n == 0)
>>>>     {
>>>>         strcpy(host, "ha!");
>>>>         MPI_Send(host, strlen(host) + 1, MPI_CHAR, 1, 1, MPI_COMM_WORLD);
>>>>         printf("sent %s\n", host);
>>>>     }
>>>>     else
>>>>     {
>>>>         //int len = strlen(host) + 1;
>>>>         bzero(host, 128);
>>>>         MPI_Recv(host, 4, MPI_CHAR, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>>>>         printf("Received %s from rank 0\n", host);
>>>>     }
>>>> //#endif
>>>>     MPI_Finalize();
>>>>     return 0;
>>>> }
>>>>
>>>> This program, when run between two nodes, hangs. The command was:
>>>>
>>>> [durga@b-1 ~]$ mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp -mca pml ob1 -mca btl_tcp_if_include eno1 ./mpitest
>>>>
>>>> It hangs with the following output (eno1 is one of the GigE interfaces, which carries the OOB traffic as well):
>>>>
>>>> Hello from b-1
>>>> The world has 2 nodes
>>>> My rank is 0
>>>> Hello from b-2
>>>> The world has 2 nodes
>>>> My rank is 1
>>>>
>>>> Note that if I uncomment the #if 0 / #endif pair (i.e. compile out the MPI_Send()/MPI_Recv() part), the program runs to completion. Also note that the printfs following MPI_Send()/MPI_Recv() do not show up on the console.
>>>>
>>>> Upon attaching gdb, the stack trace from the master node is as follows:
>>>>
>>>> Missing separate debuginfos, use: debuginfo-install glibc-2.17-78.el7.x86_64 libpciaccess-0.13.4-2.el7.x86_64
>>>> (gdb) bt
>>>> #0  0x00007f72a533eb7d in poll () from /lib64/libc.so.6
>>>> #1  0x00007f72a4cb7146 in poll_dispatch (base=0xee33d0, tv=0x7fff81057b70) at poll.c:165
>>>> #2  0x00007f72a4caede0 in opal_libevent2022_event_base_loop (base=0xee33d0, flags=2) at event.c:1630
>>>> #3  0x00007f72a4c4e692 in opal_progress () at runtime/opal_progress.c:171
>>>> #4  0x00007f72a0d07ac1 in opal_condition_wait (c=0x7f72a5bb1e00 <ompi_request_cond>, m=0x7f72a5bb1d80 <ompi_request_lock>) at ../../../../opal/threads/condition.h:76
>>>> #5  0x00007f72a0d07ca2 in ompi_request_wait_completion (req=0x113eb80) at ../../../../ompi/request/request.h:383
>>>> #6  0x00007f72a0d09cd5 in mca_pml_ob1_send (buf=0x7fff81057db0, count=4, datatype=0x601080 <ompi_mpi_char>, dst=1, tag=1, sendmode=MCA_PML_BASE_SEND_STANDARD, comm=0x601280 <ompi_mpi_comm_world>) at pml_ob1_isend.c:251
>>>> #7  0x00007f72a58d6be3 in PMPI_Send (buf=0x7fff81057db0, count=4, type=0x601080 <ompi_mpi_char>, dest=1, tag=1, comm=0x601280 <ompi_mpi_comm_world>) at psend.c:78
>>>> #8  0x0000000000400afa in main (argc=1, argv=0x7fff81057f18) at mpitest.c:19
>>>> (gdb)
>>>>
>>>> And the backtrace on the non-master node is:
>>>>
>>>> (gdb) bt
>>>> #0  0x00007ff3b377e48d in nanosleep () from /lib64/libc.so.6
>>>> #1  0x00007ff3b37af014 in usleep () from /lib64/libc.so.6
>>>> #2  0x00007ff3b0c922de in OPAL_PMIX_PMIX120_PMIx_Fence (procs=0x0, nprocs=0, info=0x0, ninfo=0) at src/client/pmix_client_fence.c:100
>>>> #3  0x00007ff3b0c5f1a6 in pmix120_fence (procs=0x0, collect_data=0) at pmix120_client.c:258
>>>> #4  0x00007ff3b3cf8f4b in ompi_mpi_finalize () at runtime/ompi_mpi_finalize.c:242
>>>> #5  0x00007ff3b3d23295 in PMPI_Finalize () at pfinalize.c:47
>>>> #6  0x0000000000400958 in main (argc=1, argv=0x7fff785e8788) at mpitest.c:30
>>>> (gdb)
>>>>
>>>> The hostfile is as follows:
>>>>
>>>> [durga@b-1 ~]$ cat hostfile
>>>> 10.4.70.10 slots=1
>>>> 10.4.70.11 slots=1
>>>> #10.4.70.12 slots=1
>>>>
>>>> And the ifconfig output from the master node is as follows (the other node is similar; all the IP interfaces are in their respective subnets):
>>>>
>>>> [durga@b-1 ~]$ ifconfig
>>>> eno1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
>>>>         inet 10.4.70.10  netmask 255.255.255.0  broadcast 10.4.70.255
>>>>         inet6 fe80::21e:c9ff:fefe:13df  prefixlen 64  scopeid 0x20<link>
>>>>         ether 00:1e:c9:fe:13:df  txqueuelen 1000  (Ethernet)
>>>>         RX packets 48215  bytes 27842846 (26.5 MiB)
>>>>         RX errors 0  dropped 0  overruns 0  frame 0
>>>>         TX packets 52746  bytes 7817568 (7.4 MiB)
>>>>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>>>>         device interrupt 16
>>>>
>>>> eno2: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
>>>>         ether 00:1e:c9:fe:13:e0  txqueuelen 1000  (Ethernet)
>>>>         RX packets 0  bytes 0 (0.0 B)
>>>>         RX errors 0  dropped 0  overruns 0  frame 0
>>>>         TX packets 0  bytes 0 (0.0 B)
>>>>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>>>>         device interrupt 17
>>>>
>>>> lf0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 2016
>>>>         inet 192.168.1.2  netmask 255.255.255.0  broadcast 192.168.1.255
>>>>         inet6 fe80::3002:ff:fe33:3333  prefixlen 64  scopeid 0x20<link>
>>>>         ether 32:02:00:33:33:33  txqueuelen 1000  (Ethernet)
>>>>         RX packets 10  bytes 512 (512.0 B)
>>>>         RX errors 0  dropped 0  overruns 0  frame 0
>>>>         TX packets 22  bytes 1536 (1.5 KiB)
>>>>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>>>>
>>>> lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
>>>>         inet 127.0.0.1  netmask 255.0.0.0
>>>>         inet6 ::1  prefixlen 128  scopeid 0x10<host>
>>>>         loop  txqueuelen 0  (Local Loopback)
>>>>         RX packets 26  bytes 1378 (1.3 KiB)
>>>>         RX errors 0  dropped 0  overruns 0  frame 0
>>>>         TX packets 26  bytes 1378 (1.3 KiB)
>>>>         TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
>>>>
>>>> Please help me with this. I am stuck on the TCP transport, which is the most basic of all transports.
>>>>
>>>> Thanks in advance
>>>> Durga
>>>>
>>>> 1% of the executables have 99% of CPU privilege!
>>>> Userspace code! Unite!! Occupy the kernel!!!
>>>>
>>>> On Tue, Apr 12, 2016 at 9:32 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>>>>
>>>>> This is quite unlikely, and fwiw, your test program works for me.
>>>>>
>>>>> I suggest you check that your 3 TCP networks are usable, for example:
>>>>>
>>>>> $ mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp -mca pml ob1 --mca btl_tcp_if_include xxx ./mpitest
>>>>>
>>>>> where xxx is a [list of] interface name(s):
>>>>> eth0
>>>>> eth1
>>>>> ib0
>>>>> eth0,eth1
>>>>> eth0,ib0
>>>>> ...
>>>>> eth0,eth1,ib0
>>>>>
>>>>> and see where the problem starts occurring.
>>>>>
>>>>> btw, are your 3 interfaces in 3 different subnets? Is routing required between two interfaces of the same type?
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Gilles
>>>>>
>>>>> On 4/13/2016 7:15 AM, dpchoudh . wrote:
>>>>>
>>>>> Hi all
>>>>>
>>>>> I have reported this issue before, but had then brushed it off as something caused by my modifications to the source tree. It looks like that is not the case.
>>>>>
>>>>> Just now, I did the following:
>>>>>
>>>>> 1. Cloned a fresh copy from master.
>>>>> 2. Configured with the following flags, built and installed it on my two-node "cluster":
>>>>>    --enable-debug --enable-debug-symbols --disable-dlopen
>>>>> 3. Compiled the following program, mpitest.c, with these flags: -g3 -Wall -Wextra
>>>>> 4. Ran it like this:
>>>>>    [durga@smallMPI ~]$ mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp -mca pml ob1 ./mpitest
>>>>>
>>>>> With this, the code hangs at MPI_Barrier() on both nodes, after generating the following output:
>>>>>
>>>>> Hello world from processor smallMPI, rank 0 out of 2 processors
>>>>> Hello world from processor bigMPI, rank 1 out of 2 processors
>>>>> smallMPI sent haha!
>>>>> bigMPI received haha!
>>>>> <Hangs until killed by ^C>
>>>>>
>>>>> Attaching to the hung process on one node gives the following backtrace:
>>>>>
>>>>> (gdb) bt
>>>>> #0  0x00007f55b0f41c3d in poll () from /lib64/libc.so.6
>>>>> #1  0x00007f55b03ccde6 in poll_dispatch (base=0x70e7b0, tv=0x7ffd1bb551c0) at poll.c:165
>>>>> #2  0x00007f55b03c4a90 in opal_libevent2022_event_base_loop (base=0x70e7b0, flags=2) at event.c:1630
>>>>> #3  0x00007f55b02f0144 in opal_progress () at runtime/opal_progress.c:171
>>>>> #4  0x00007f55b14b4d8b in opal_condition_wait (c=0x7f55b19fec40 <ompi_request_cond>, m=0x7f55b19febc0 <ompi_request_lock>) at ../opal/threads/condition.h:76
>>>>> #5  0x00007f55b14b531b in ompi_request_default_wait_all (count=2, requests=0x7ffd1bb55370, statuses=0x7ffd1bb55340) at request/req_wait.c:287
>>>>> #6  0x00007f55b157a225 in ompi_coll_base_sendrecv_zero (dest=1, stag=-16, source=1, rtag=-16, comm=0x601280 <ompi_mpi_comm_world>) at base/coll_base_barrier.c:63
>>>>> #7  0x00007f55b157a92a in ompi_coll_base_barrier_intra_two_procs (comm=0x601280 <ompi_mpi_comm_world>, module=0x7c2630) at base/coll_base_barrier.c:308
>>>>> #8  0x00007f55b15aafec in ompi_coll_tuned_barrier_intra_dec_fixed (comm=0x601280 <ompi_mpi_comm_world>, module=0x7c2630) at coll_tuned_decision_fixed.c:196
>>>>> #9  0x00007f55b14d36fd in PMPI_Barrier (comm=0x601280 <ompi_mpi_comm_world>) at pbarrier.c:63
>>>>> #10 0x0000000000400b0b in main (argc=1, argv=0x7ffd1bb55658) at mpitest.c:26
>>>>> (gdb)
>>>>>
>>>>> Thinking that this might be a bug in the tuned collectives, since that is what the stack shows, I ran the program like this (basically adding the ^tuned part):
>>>>>
>>>>> [durga@smallMPI ~]$ mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp -mca pml ob1 -mca coll ^tuned ./mpitest
>>>>>
>>>>> It still hangs, but now with a different stack trace:
>>>>>
>>>>> (gdb) bt
>>>>> #0  0x00007f910d38ac3d in poll () from /lib64/libc.so.6
>>>>> #1  0x00007f910c815de6 in poll_dispatch (base=0x1a317b0, tv=0x7fff43ee3610) at poll.c:165
>>>>> #2  0x00007f910c80da90 in opal_libevent2022_event_base_loop (base=0x1a317b0, flags=2) at event.c:1630
>>>>> #3  0x00007f910c739144 in opal_progress () at runtime/opal_progress.c:171
>>>>> #4  0x00007f910db130f7 in opal_condition_wait (c=0x7f910de47c40 <ompi_request_cond>, m=0x7f910de47bc0 <ompi_request_lock>) at ../../../../opal/threads/condition.h:76
>>>>> #5  0x00007f910db132d8 in ompi_request_wait_completion (req=0x1b07680) at ../../../../ompi/request/request.h:383
>>>>> #6  0x00007f910db1533b in mca_pml_ob1_send (buf=0x0, count=0, datatype=0x7f910de1e340 <ompi_mpi_byte>, dst=1, tag=-16, sendmode=MCA_PML_BASE_SEND_STANDARD, comm=0x601280 <ompi_mpi_comm_world>) at pml_ob1_isend.c:259
>>>>> #7  0x00007f910d9c3b38 in ompi_coll_base_barrier_intra_basic_linear (comm=0x601280 <ompi_mpi_comm_world>, module=0x1b092c0) at base/coll_base_barrier.c:368
>>>>> #8  0x00007f910d91c6fd in PMPI_Barrier (comm=0x601280 <ompi_mpi_comm_world>) at pbarrier.c:63
>>>>> #9  0x0000000000400b0b in main (argc=1, argv=0x7fff43ee3a58) at mpitest.c:26
>>>>> (gdb)
>>>>>
>>>>> The mpitest.c program is as follows:
>>>>>
>>>>> #include <mpi.h>
>>>>> #include <stdio.h>
>>>>> #include <string.h>
>>>>>
>>>>> int main(int argc, char** argv)
>>>>> {
>>>>>     int world_size, world_rank, name_len;
>>>>>     char hostname[MPI_MAX_PROCESSOR_NAME], buf[8];
>>>>>
>>>>>     MPI_Init(&argc, &argv);
>>>>>     MPI_Comm_size(MPI_COMM_WORLD, &world_size);
>>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
>>>>>     MPI_Get_processor_name(hostname, &name_len);
>>>>>     printf("Hello world from processor %s, rank %d out of %d processors\n", hostname, world_rank, world_size);
>>>>>     if (world_rank == 1)
>>>>>     {
>>>>>         MPI_Recv(buf, 6, MPI_CHAR, 0, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>>>>>         printf("%s received %s\n", hostname, buf);
>>>>>     }
>>>>>     else
>>>>>     {
>>>>>         strcpy(buf, "haha!");
>>>>>         MPI_Send(buf, 6, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
>>>>>         printf("%s sent %s\n", hostname, buf);
>>>>>     }
>>>>>     MPI_Barrier(MPI_COMM_WORLD);
>>>>>     MPI_Finalize();
>>>>>     return 0;
>>>>> }
>>>>>
>>>>> The hostfile is as follows:
>>>>> 10.10.10.10 slots=1
>>>>> 10.10.10.11 slots=1
>>>>>
>>>>> The two nodes are connected by three physical and three logical networks:
>>>>> Physical: Gigabit Ethernet, 10G iWARP, 20G InfiniBand
>>>>> Logical: IP (all 3), PSM (QLogic InfiniBand), verbs (iWARP and InfiniBand)
>>>>>
>>>>> Please note again that this is a fresh, brand new clone.
>>>>>
>>>>> Is this a bug (perhaps a side effect of --disable-dlopen) or something I am doing wrong?
>>>>>
>>>>> Thanks
>>>>> Durga
>>>>>
>>>>> We learn from history that we never learn from history.
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> us...@open-mpi.org
>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>> Link to this post: http://www.open-mpi.org/community/lists/users/2016/04/28930.php
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> us...@open-mpi.org
>>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>> Link to this post: <http://www.open-mpi.org/community/lists/users/2016/04/28932.php>
>>>>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: http://www.open-mpi.org/community/lists/users/2016/04/28951.php
> ...
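For reference, Gilles' suggestion in the quoted thread (invoke MPI_Barrier several times with no send/recv, then try a one-byte message instead of the 4-byte one) could be packaged as a small variant of mpitest.c, sketched below. This is only an illustration of that experiment, not code from any of the original posts; whichever step hangs first narrows down whether any message at all is crossing the tcp BTL.

/* A sketch of the experiment Gilles suggested: barriers only, then a one-byte message.
 * Assumes exactly two ranks (mpirun -np 2), like the original mpitest.c. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, i;
    char c = 'x';

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Step 1: several barriers, no explicit send/recv */
    for (i = 0; i < 10; i++)
    {
        MPI_Barrier(MPI_COMM_WORLD);
        if (rank == 0)
        {
            printf("barrier %d done\n", i);
            fflush(stdout);   /* make sure output appears even if a later call hangs */
        }
    }

    /* Step 2: a one-byte message instead of the 4-byte "ha!" string */
    if (rank == 0)
    {
        MPI_Send(&c, 1, MPI_CHAR, 1, 1, MPI_COMM_WORLD);
        printf("one-byte send done\n");
    }
    else if (rank == 1)
    {
        MPI_Recv(&c, 1, MPI_CHAR, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("one-byte recv done\n");
    }

    MPI_Finalize();
    return 0;
}

It can be run with the same command line as the failing test (same hostfile, -mca btl self,tcp -mca pml ob1, and the btl_tcp_if_include setting under test).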