Hi Ralph,
I saw the pull request and did a test with v2.0.1rc1, but the problem persists. Any ideas?

Thanks,

Dr. Jingchao Zhang
Holland Computing Center
University of Nebraska-Lincoln
402-472-6400

________________________________
From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org <r...@open-mpi.org>
Sent: Wednesday, August 24, 2016 1:27:28 PM
To: Open MPI Users
Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0

Bingo - found it. Fix submitted, and I hope to get it into 2.0.1. Thanks for the assist!

Ralph

On Aug 24, 2016, at 12:15 PM, Jingchao Zhang <zh...@unl.edu> wrote:

I configured v2.0.1rc1 with --enable-debug and ran the test with --mca iof_base_verbose 100. I also added -display-devel-map in case it provides some useful information. The test job has 2 nodes with 10 cores each; rank 0 and the mpirun command are on the same node.

$ mpirun -display-devel-map --mca iof_base_verbose 100 ./a.out < test.in &> debug_info.txt

The debug_info.txt is attached.

Dr. Jingchao Zhang
Holland Computing Center
University of Nebraska-Lincoln
402-472-6400

________________________________
From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org <r...@open-mpi.org>
Sent: Wednesday, August 24, 2016 12:14:26 PM
To: Open MPI Users
Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0

Afraid I can't replicate the problem at all, whether rank 0 is local or not. I'm also using bash, but on CentOS 7, so I suspect the OS is the difference.

Can you configure OMPI with --enable-debug and then run the test again with --mca iof_base_verbose 100? It will hopefully tell us something about why the IO subsystem is stuck.

On Aug 24, 2016, at 8:46 AM, Jingchao Zhang <zh...@unl.edu> wrote:

Hi Ralph,

For our tests, rank 0 is always on the same node as mpirun. I just tested mpirun with -nolocal and it still hangs.

Information on the shell and OS:

$ echo $0
-bash

$ lsb_release -a
LSB Version:    :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: Scientific
Description:    Scientific Linux release 6.8 (Carbon)
Release:        6.8
Codename:       Carbon

$ uname -a
Linux login.crane.hcc.unl.edu 2.6.32-642.3.1.el6.x86_64 #1 SMP Tue Jul 12 11:25:51 CDT 2016 x86_64 x86_64 x86_64 GNU/Linux

Dr. Jingchao Zhang
Holland Computing Center
University of Nebraska-Lincoln
402-472-6400

________________________________
From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org <r...@open-mpi.org>
Sent: Tuesday, August 23, 2016 8:14:48 PM
To: Open MPI Users
Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0

Hmmm... that's a good point. Rank 0 and mpirun are always on the same node on my cluster. I'll give it a try.

Jingchao: is rank 0 on the node with mpirun, or on a remote node?

On Aug 23, 2016, at 5:58 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:

Ralph,

Did you run task 0 and mpirun on different nodes? I observed some random hangs, though I cannot blame Open MPI 100% yet.

Cheers,

Gilles

On 8/24/2016 9:41 AM, r...@open-mpi.org wrote:

Very strange. I cannot reproduce it, as I'm able to run any number of nodes and procs, pushing over 100 MBytes through without any problem.
Which leads me to suspect that the issue here is with the tty interface. Can you tell me what shell and OS you are running?

On Aug 23, 2016, at 3:25 PM, Jingchao Zhang <zh...@unl.edu> wrote:

Everything is stuck right after MPI_Init. For a test job with 2 nodes and 10 cores per node, I got the following:

$ mpirun ./a.out < test.in
Rank 2 has cleared MPI_Init
Rank 4 has cleared MPI_Init
Rank 7 has cleared MPI_Init
Rank 8 has cleared MPI_Init
Rank 0 has cleared MPI_Init
Rank 5 has cleared MPI_Init
Rank 6 has cleared MPI_Init
Rank 9 has cleared MPI_Init
Rank 1 has cleared MPI_Init
Rank 16 has cleared MPI_Init
Rank 19 has cleared MPI_Init
Rank 10 has cleared MPI_Init
Rank 11 has cleared MPI_Init
Rank 12 has cleared MPI_Init
Rank 13 has cleared MPI_Init
Rank 14 has cleared MPI_Init
Rank 15 has cleared MPI_Init
Rank 17 has cleared MPI_Init
Rank 18 has cleared MPI_Init
Rank 3 has cleared MPI_Init

and then it just hung.

--Jingchao

Dr. Jingchao Zhang
Holland Computing Center
University of Nebraska-Lincoln
402-472-6400

________________________________
From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org <r...@open-mpi.org>
Sent: Tuesday, August 23, 2016 4:03:07 PM
To: Open MPI Users
Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0

The IO forwarding messages all flow over the Ethernet, so the type of fabric is irrelevant. The number of procs involved would definitely have an impact, but that might not be due to the IO forwarding subsystem. We know we have flow-control issues with collectives like Bcast that don't have built-in synchronization points.

How many reads were you able to do before it hung? I was running it on my little test setup (2 nodes, using only a few procs), but I'll try scaling up and see what happens. I'll also try introducing some forced "syncs" on the Bcast and see if that solves the issue.

Ralph

On Aug 23, 2016, at 2:30 PM, Jingchao Zhang <zh...@unl.edu> wrote:

Hi Ralph,

I tested v2.0.1rc1 with your code but hit the same issue. I also installed v2.0.1rc1 on a different cluster, which has Mellanox QDR InfiniBand, and got the same result. For the tests you have done, how many cores and nodes did you use? I can trigger the problem by using multiple nodes with more than 10 cores per node.

Thank you for looking into this.

Dr. Jingchao Zhang
Holland Computing Center
University of Nebraska-Lincoln
402-472-6400

________________________________
From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org <r...@open-mpi.org>
Sent: Monday, August 22, 2016 10:23:42 PM
To: Open MPI Users
Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0

FWIW: I just tested forwarding up to 100 MBytes via stdin using the simple test shown below with OMPI v2.0.1rc1, and it worked fine. So I'd suggest upgrading when the official release comes out, or going ahead and at least testing 2.0.1rc1 on your machine. Or you can test this program with some input file and let me know if it works for you.
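Something along these lines should work to build and run it (the source file and input file names here are just placeholders):

$ mpicc stdin_test.c -o stdin_test
$ mpirun -np 20 ./stdin_test < some_input_file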
Ralph

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <stdbool.h>
#include <unistd.h>
#include <mpi.h>

#define ORTE_IOF_BASE_MSG_MAX 2048

int main(int argc, char *argv[])
{
    int i, rank, size, next, prev, tag = 201;
    int pos, msgsize, nbytes;
    bool done;
    char *msg;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    fprintf(stderr, "Rank %d has cleared MPI_Init\n", rank);

    next = (rank + 1) % size;
    prev = (rank + size - 1) % size;
    msg = malloc(ORTE_IOF_BASE_MSG_MAX);
    pos = 0;
    nbytes = 0;

    if (0 == rank) {
        while (0 != (msgsize = read(0, msg, ORTE_IOF_BASE_MSG_MAX))) {
            fprintf(stderr, "Rank %d: sending blob %d\n", rank, pos);
            if (msgsize > 0) {
                MPI_Bcast(msg, ORTE_IOF_BASE_MSG_MAX, MPI_BYTE, 0, MPI_COMM_WORLD);
            }
            ++pos;
            nbytes += msgsize;
        }
        fprintf(stderr, "Rank %d: sending termination blob %d\n", rank, pos);
        memset(msg, 0, ORTE_IOF_BASE_MSG_MAX);
        MPI_Bcast(msg, ORTE_IOF_BASE_MSG_MAX, MPI_BYTE, 0, MPI_COMM_WORLD);
        MPI_Barrier(MPI_COMM_WORLD);
    } else {
        while (1) {
            MPI_Bcast(msg, ORTE_IOF_BASE_MSG_MAX, MPI_BYTE, 0, MPI_COMM_WORLD);
            fprintf(stderr, "Rank %d: recvd blob %d\n", rank, pos);
            ++pos;
            done = true;
            for (i = 0; i < ORTE_IOF_BASE_MSG_MAX; i++) {
                if (0 != msg[i]) {
                    done = false;
                    break;
                }
            }
            if (done) {
                break;
            }
        }
        fprintf(stderr, "Rank %d: recv done\n", rank);
        MPI_Barrier(MPI_COMM_WORLD);
    }

    fprintf(stderr, "Rank %d has completed bcast\n", rank);
    MPI_Finalize();
    return 0;
}

On Aug 22, 2016, at 3:40 PM, Jingchao Zhang <zh...@unl.edu> wrote:

This might be a thin argument, but we have had many users running mpirun this way for years with no problems until this recent upgrade, and some home-brewed MPI codes do not even have a standard way to read input files. Last time I checked, the Open MPI manual still claims it supports stdin (https://www.open-mpi.org/doc/v2.0/man1/mpirun.1.php#sect14). Maybe I missed it, but the v2.0 release notes did not mention any changes to the behavior of stdin either. We can tell our users to run mpirun in the suggested way, but I do hope someone can look into the issue and fix it.

Dr. Jingchao Zhang
Holland Computing Center
University of Nebraska-Lincoln
402-472-6400

________________________________
From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org <r...@open-mpi.org>
Sent: Monday, August 22, 2016 3:04:50 PM
To: Open MPI Users
Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0

Well, I can try to find time to take a look. However, I will reiterate what Jeff H said - it is very unwise to rely on IO forwarding. Much better to just directly read the file, unless that file is simply unavailable on the node where rank 0 is running.

On Aug 22, 2016, at 1:55 PM, Jingchao Zhang <zh...@unl.edu> wrote:

Here you can find the source code for the LAMMPS input class:
https://github.com/lammps/lammps/blob/r13864/src/input.cpp

Based on the gdb output, rank 0 is stuck at line 167,

    if (fgets(&line[m],maxline-m,infile) == NULL)

and the rest of the processes are stuck at line 203,

    MPI_Bcast(&n,1,MPI_INT,0,world);

So rank 0 possibly hangs in the fgets() function.
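In simplified form, that input loop follows the usual "rank 0 reads a line, then broadcasts it" pattern, roughly like the sketch below (an illustration of the logic only, not the actual input.cpp code; the function, variable and buffer names are made up):

#include <stdio.h>
#include <string.h>
#include <mpi.h>

#define MAXLINE 2048   /* illustrative buffer size */

/* Sketch of the "rank 0 reads a line, then broadcasts it" pattern.
 * Not the real LAMMPS code; names and sizes are illustrative. */
static void read_and_bcast(FILE *infile, MPI_Comm world)
{
    char line[MAXLINE];
    int rank, n;

    MPI_Comm_rank(world, &rank);
    while (1) {
        if (rank == 0) {
            /* rank 0 blocks here if stdin stops being forwarded */
            if (fgets(line, MAXLINE, infile) == NULL)
                n = 0;
            else
                n = (int)strlen(line) + 1;
        }
        /* all other ranks block here waiting for rank 0 */
        MPI_Bcast(&n, 1, MPI_INT, 0, world);
        if (n == 0)
            break;                      /* EOF on rank 0: everyone exits */
        MPI_Bcast(line, n, MPI_CHAR, 0, world);
        /* ... parse and execute the command in 'line' ... */
    }
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    read_and_bcast(stdin, MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}

If the stdin forwarding to rank 0 dries up, rank 0 sits in fgets() while every other rank sits in the MPI_Bcast, which matches the backtraces below.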
Here is the full backtrace information:

$ cat master.backtrace worker.backtrace
#0  0x0000003c37cdb68d in read () from /lib64/libc.so.6
#1  0x0000003c37c71ca8 in _IO_new_file_underflow () from /lib64/libc.so.6
#2  0x0000003c37c737ae in _IO_default_uflow_internal () from /lib64/libc.so.6
#3  0x0000003c37c67e8a in _IO_getline_info_internal () from /lib64/libc.so.6
#4  0x0000003c37c66ce9 in fgets () from /lib64/libc.so.6
#5  0x00000000005c5a43 in LAMMPS_NS::Input::file() () at ../input.cpp:167
#6  0x00000000005d4236 in main () at ../main.cpp:31

#0  0x00002b1635d2ace2 in poll_dispatch () from /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libopen-pal.so.20
#1  0x00002b1635d1fa71 in opal_libevent2022_event_base_loop () from /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libopen-pal.so.20
#2  0x00002b1635ce4634 in opal_progress () from /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libopen-pal.so.20
#3  0x00002b16351b8fad in ompi_request_default_wait () from /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libmpi.so.20
#4  0x00002b16351fcb40 in ompi_coll_base_bcast_intra_generic () from /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libmpi.so.20
#5  0x00002b16351fd0c2 in ompi_coll_base_bcast_intra_binomial () from /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libmpi.so.20
#6  0x00002b1644fa6d9b in ompi_coll_tuned_bcast_intra_dec_fixed () from /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/openmpi/mca_coll_tuned.so
#7  0x00002b16351cb4fb in PMPI_Bcast () from /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libmpi.so.20
#8  0x00000000005c5b5d in LAMMPS_NS::Input::file() () at ../input.cpp:203
#9  0x00000000005d4236 in main () at ../main.cpp:31

Thanks,

Dr. Jingchao Zhang
Holland Computing Center
University of Nebraska-Lincoln
402-472-6400

________________________________
From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org <r...@open-mpi.org>
Sent: Monday, August 22, 2016 2:17:10 PM
To: Open MPI Users
Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0

Hmmm... perhaps we can break this out a bit? The stdin will be going to your rank 0 proc. It sounds like you have some subsequent step that calls MPI_Bcast?

Can you first verify that the input is being correctly delivered to rank 0? This will help us isolate whether the problem is in the IO forwarding or in the subsequent Bcast.

On Aug 22, 2016, at 1:11 PM, Jingchao Zhang <zh...@unl.edu> wrote:

Hi all,

We compiled openmpi/2.0.0 with gcc/6.1.0 and intel/13.1.3. Both show odd behavior when trying to read from standard input. For example, if we start the application LAMMPS across 4 nodes, each with 16 cores, connected by Intel QDR InfiniBand, mpirun works fine the first time but always gets stuck within a few seconds thereafter.

Command: mpirun ./lmp_ompi_g++ < in.snr
(in.snr is the LAMMPS input file; the compiler is gcc/6.1.)

Instead, if we use

mpirun ./lmp_ompi_g++ -in in.snr

it works 100% of the time.

Some odd behaviors we have gathered so far:
1. For a 1-node job, stdin always works.
2. For multiple nodes, stdin works, though unreliably, when the number of cores per node is relatively small. For example, for 2/3/4 nodes with 8 cores each, mpirun works most of the time. But with more than 8 cores per node, mpirun works the first time, then always gets stuck. There seems to be a magic number at which it stops working.
3. We tested Quantum ESPRESSO with the intel/13 compiler and had the same issue.

We used gdb to debug and found that when mpirun was stuck, the rest of the processes were all waiting on an MPI broadcast from rank 0.
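For reference, a typical way to grab backtraces from a hung run like this is to attach gdb to one of the stuck processes on a compute node, along these lines (the pid is a placeholder):

$ gdb -p <pid of lmp_ompi_g++>
(gdb) bt
(gdb) detach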
The LAMMPS binary, input file and gdb core files (example.tar.bz2) can be downloaded from this link:
https://drive.google.com/open?id=0B3Yj4QkZpI-dVWZtWmJ3ZXNVRGc

Extra information:

1. The job scheduler is Slurm.

2. configure setup:
./configure --prefix=$PREFIX \
  --with-hwloc=internal \
  --enable-mpirun-prefix-by-default \
  --with-slurm \
  --with-verbs \
  --with-psm \
  --disable-openib-connectx-xrc \
  --with-knem=/opt/knem-1.1.2.90mlnx1 \
  --with-cma

3. openmpi-mca-params.conf file:
orte_hetero_nodes=1
hwloc_base_binding_policy=core
rmaps_base_mapping_policy=core
opal_cuda_support=0
btl_openib_use_eager_rdma=0
btl_openib_max_eager_rdma=0
btl_openib_flags=1

Thanks,
Jingchao

Dr. Jingchao Zhang
Holland Computing Center
University of Nebraska-Lincoln
402-472-6400