I am having a problem on my Linux desktop: MPI_Init hangs for approximately 64 seconds if my VPN client is connected, but returns immediately if I disconnect the VPN. I've picked through the FAQ and searched Google but have not found a solution.
Some potentially relevant information: I am using Open MPI 1.4.3 under Ubuntu 12.04.1 with the Cisco AnyConnect VPN Client. (I have also downloaded Open MPI 1.6.4 and built it from source, but I believe it behaves the same way.)

Some potentially irrelevant information: I believe SSH tunneling is disabled by the VPN.

While the VPN is connected, ifconfig shows an extra interface (cscotun0, with inet addr:10.248.17.27) that also shows up in the contact.txt file:

    wt217:~/wrk/mpi> cat /tmp/openmpi-sessions-dab143@wt217_0/29142/contact.txt
    1909850112.0;tcp://192.168.1.3:48237;tcp://10.248.17.27:48237
    22001

The code is simply:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char** argv)
    {
        MPI_Init(&argc, &argv);
        MPI_Finalize();
        return 0;
    }

I compile it using "mpicc -g mpi_hello.c -o mpi_hello" and execute it using "mpirun -d -v ./mpi_hello". (The problem occurs whether or not I ask for more than one process.) With verbosity on, I get the following output:

    wt217:~/wrk/mpi> mpirun -d -v ./mpi_hello
    [wt217:22015] procdir: /tmp/openmpi-sessions-dab143@wt217_0/29144/0/0
    [wt217:22015] jobdir: /tmp/openmpi-sessions-dab143@wt217_0/29144/0
    [wt217:22015] top: openmpi-sessions-dab143@wt217_0
    [wt217:22015] tmp: /tmp
    [wt217:22015] [[29144,0],0] node[0].name wt217 daemon 0 arch ffc91200
    [wt217:22015] Info: Setting up debugger process table for applications
      MPIR_being_debugged = 0
      MPIR_debug_state = 1
      MPIR_partial_attach_ok = 1
      MPIR_i_am_starter = 0
      MPIR_proctable_size = 1
      MPIR_proctable:
        (i, host, exe, pid) = (0, wt217, /home/dab143/wrk/mpi/./mpi_hello, 22016)
    [wt217:22016] procdir: /tmp/openmpi-sessions-dab143@wt217_0/29144/1/0
    [wt217:22016] jobdir: /tmp/openmpi-sessions-dab143@wt217_0/29144/1
    [wt217:22016] top: openmpi-sessions-dab143@wt217_0
    [wt217:22016] tmp: /tmp
    <hangs for approximately 64 seconds>
    [wt217:22016] [[29144,1],0] node[0].name wt217 daemon 0 arch ffc91200
    [wt217:22016] sess_dir_finalize: proc session dir not empty - leaving
    [wt217:22015] sess_dir_finalize: proc session dir not empty - leaving
    [wt217:22015] sess_dir_finalize: job session dir not empty - leaving
    [wt217:22015] sess_dir_finalize: proc session dir not empty - leaving
    orterun: exiting with status 0

The code hangs for approximately 64 seconds after the line that reads "tmp: /tmp". If I attach gdb to the process during this time, the stack trace (attached) shows that the pause is in __GI___poll in /sysdeps/unix/sysv/linux/poll.c:83.

If I add "-mca oob_tcp_if_exclude cscotun0", the corresponding address for the VPN interface no longer shows up in contact.txt, but the problem remains. I also tried adding "-mca btl ^cscotun0 -mca btl_tcp_if_exclude cscotun0", with no effect.

Any idea what is hanging this up, or how I can get more information about what is going on during the pause? I assume that connecting the VPN causes MPI_Init to look for something that isn't available and eventually time out, but I don't know what.

Output from ompi_info and the gdb stack trace is attached.

Thanks,
David
stack.txt.bz2
ompi_info.txt.bz2