Afraid I can’t replicate the problem at all, whether rank=0 is local or not. I’m also using bash, but on CentOS-7, so I suspect the OS is the difference.
Can you configure OMPI with --enable-debug, and then run the test again with --mca iof_base_verbose 100? It will hopefully tell us something about why the IO subsystem is stuck.

> On Aug 24, 2016, at 8:46 AM, Jingchao Zhang <zh...@unl.edu> wrote:
>
> Hi Ralph,
>
> For our tests, rank 0 is always on the same node as mpirun. I just tested mpirun with -nolocal and it still hangs.
>
> Information on the shell and OS:
>
> $ echo $0
> -bash
>
> $ lsb_release -a
> LSB Version:    :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
> Distributor ID: Scientific
> Description:    Scientific Linux release 6.8 (Carbon)
> Release:        6.8
> Codename:       Carbon
>
> $ uname -a
> Linux login.crane.hcc.unl.edu 2.6.32-642.3.1.el6.x86_64 #1 SMP Tue Jul 12 11:25:51 CDT 2016 x86_64 x86_64 x86_64 GNU/Linux
>
> Dr. Jingchao Zhang
> Holland Computing Center
> University of Nebraska-Lincoln
> 402-472-6400
>
> From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org <r...@open-mpi.org>
> Sent: Tuesday, August 23, 2016 8:14:48 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>
> Hmmm...that’s a good point. Rank 0 and mpirun are always on the same node on my cluster. I’ll give it a try.
>
> Jingchao: is rank 0 on the node with mpirun, or on a remote node?
>
>> On Aug 23, 2016, at 5:58 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>>
>> Ralph,
>>
>> did you run task 0 and mpirun on different nodes?
>>
>> i observed some random hangs, though i cannot blame openmpi 100% yet
>>
>> Cheers,
>>
>> Gilles
>>
>> On 8/24/2016 9:41 AM, r...@open-mpi.org wrote:
>>> Very strange. I cannot reproduce it, as I’m able to run any number of nodes and procs, pushing over 100 MBytes through without any problem.
>>>
>>> Which leads me to suspect that the issue here is with the tty interface. Can you tell me what shell and OS you are running?
>>>
>>>> On Aug 23, 2016, at 3:25 PM, Jingchao Zhang <zh...@unl.edu> wrote:
>>>>
>>>> Everything got stuck right after MPI_Init. For a test job with 2 nodes and 10 cores per node, I got the following:
>>>>
>>>> $ mpirun ./a.out < test.in
>>>> Rank 2 has cleared MPI_Init
>>>> Rank 4 has cleared MPI_Init
>>>> Rank 7 has cleared MPI_Init
>>>> Rank 8 has cleared MPI_Init
>>>> Rank 0 has cleared MPI_Init
>>>> Rank 5 has cleared MPI_Init
>>>> Rank 6 has cleared MPI_Init
>>>> Rank 9 has cleared MPI_Init
>>>> Rank 1 has cleared MPI_Init
>>>> Rank 16 has cleared MPI_Init
>>>> Rank 19 has cleared MPI_Init
>>>> Rank 10 has cleared MPI_Init
>>>> Rank 11 has cleared MPI_Init
>>>> Rank 12 has cleared MPI_Init
>>>> Rank 13 has cleared MPI_Init
>>>> Rank 14 has cleared MPI_Init
>>>> Rank 15 has cleared MPI_Init
>>>> Rank 17 has cleared MPI_Init
>>>> Rank 18 has cleared MPI_Init
>>>> Rank 3 has cleared MPI_Init
>>>>
>>>> Then it just hung.
>>>>
>>>> --Jingchao
>>>>
>>>> Dr. Jingchao Zhang
>>>> Holland Computing Center
>>>> University of Nebraska-Lincoln
>>>> 402-472-6400
>>>>
>>>> From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org <r...@open-mpi.org>
>>>> Sent: Tuesday, August 23, 2016 4:03:07 PM
>>>> To: Open MPI Users
>>>> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>>>>
>>>> The IO forwarding messages all flow over the Ethernet, so the type of fabric is irrelevant. The number of procs involved would definitely have an impact, but that might not be due to the IO forwarding subsystem. We know we have flow-control issues with collectives like Bcast that don’t have built-in synchronization points. How many reads were you able to do before it hung?
>>>>
>>>> I was running it on my little test setup (2 nodes, using only a few procs), but I’ll try scaling up and see what happens. I’ll also try introducing some forced “syncs” on the Bcast and see if that solves the issue.
>>>>
>>>> Ralph
>>>>
>>>>> On Aug 23, 2016, at 2:30 PM, Jingchao Zhang <zh...@unl.edu> wrote:
>>>>>
>>>>> Hi Ralph,
>>>>>
>>>>> I tested v2.0.1rc1 with your code, but it has the same issue. I also installed v2.0.1rc1 on a different cluster, which has Mellanox QDR InfiniBand, and got the same result. For the tests you have done, how many cores and nodes did you use? I can trigger the problem by using multiple nodes with more than 10 cores per node.
>>>>>
>>>>> Thank you for looking into this.
>>>>>
>>>>> Dr. Jingchao Zhang
>>>>> Holland Computing Center
>>>>> University of Nebraska-Lincoln
>>>>> 402-472-6400
>>>>>
>>>>> From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org <r...@open-mpi.org>
>>>>> Sent: Monday, August 22, 2016 10:23:42 PM
>>>>> To: Open MPI Users
>>>>> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>>>>>
>>>>> FWIW: I just tested forwarding up to 100 MBytes via stdin using the simple test shown below with OMPI v2.0.1rc1, and it worked fine. So I’d suggest upgrading when the official release comes out, or going ahead and at least testing 2.0.1rc1 on your machine. Or you can test this program with some input file and let me know if it works for you.
>>>>> Ralph
>>>>>
>>>>> #include <stdlib.h>
>>>>> #include <stdio.h>
>>>>> #include <string.h>
>>>>> #include <stdbool.h>
>>>>> #include <unistd.h>
>>>>> #include <mpi.h>
>>>>>
>>>>> #define ORTE_IOF_BASE_MSG_MAX 2048
>>>>>
>>>>> int main(int argc, char *argv[])
>>>>> {
>>>>>     int i, rank, size, next, prev, tag = 201;
>>>>>     int pos, msgsize, nbytes;
>>>>>     bool done;
>>>>>     char *msg;
>>>>>
>>>>>     MPI_Init(&argc, &argv);
>>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>>>
>>>>>     fprintf(stderr, "Rank %d has cleared MPI_Init\n", rank);
>>>>>
>>>>>     next = (rank + 1) % size;
>>>>>     prev = (rank + size - 1) % size;
>>>>>     msg = malloc(ORTE_IOF_BASE_MSG_MAX);
>>>>>     pos = 0;
>>>>>     nbytes = 0;
>>>>>
>>>>>     if (0 == rank) {
>>>>>         while (0 != (msgsize = read(0, msg, ORTE_IOF_BASE_MSG_MAX))) {
>>>>>             fprintf(stderr, "Rank %d: sending blob %d\n", rank, pos);
>>>>>             if (msgsize > 0) {
>>>>>                 MPI_Bcast(msg, ORTE_IOF_BASE_MSG_MAX, MPI_BYTE, 0, MPI_COMM_WORLD);
>>>>>             }
>>>>>             ++pos;
>>>>>             nbytes += msgsize;
>>>>>         }
>>>>>         fprintf(stderr, "Rank %d: sending termination blob %d\n", rank, pos);
>>>>>         memset(msg, 0, ORTE_IOF_BASE_MSG_MAX);
>>>>>         MPI_Bcast(msg, ORTE_IOF_BASE_MSG_MAX, MPI_BYTE, 0, MPI_COMM_WORLD);
>>>>>         MPI_Barrier(MPI_COMM_WORLD);
>>>>>     } else {
>>>>>         while (1) {
>>>>>             MPI_Bcast(msg, ORTE_IOF_BASE_MSG_MAX, MPI_BYTE, 0, MPI_COMM_WORLD);
>>>>>             fprintf(stderr, "Rank %d: recvd blob %d\n", rank, pos);
>>>>>             ++pos;
>>>>>             done = true;
>>>>>             for (i = 0; i < ORTE_IOF_BASE_MSG_MAX; i++) {
>>>>>                 if (0 != msg[i]) {
>>>>>                     done = false;
>>>>>                     break;
>>>>>                 }
>>>>>             }
>>>>>             if (done) {
>>>>>                 break;
>>>>>             }
>>>>>         }
>>>>>         fprintf(stderr, "Rank %d: recv done\n", rank);
>>>>>         MPI_Barrier(MPI_COMM_WORLD);
>>>>>     }
>>>>>
>>>>>     fprintf(stderr, "Rank %d has completed bcast\n", rank);
>>>>>     MPI_Finalize();
>>>>>     return 0;
>>>>> }
>>>>>
>>>>>> On Aug 22, 2016, at 3:40 PM, Jingchao Zhang <zh...@unl.edu> wrote:
>>>>>>
>>>>>> This might be a thin argument, but we have had many users running mpirun this way for years with no problem until this recent upgrade. And some home-brewed MPI codes do not even have a standard way to read input files. Last time I checked, the Open MPI manual still claims it supports stdin (https://www.open-mpi.org/doc/v2.0/man1/mpirun.1.php#sect14). Maybe I missed it, but the v2.0 release notes did not mention any changes to the behavior of stdin either.
>>>>>>
>>>>>> We can tell our users to run mpirun in the suggested way, but I do hope someone can look into the issue and fix it.
>>>>>>
>>>>>> Dr. Jingchao Zhang
>>>>>> Holland Computing Center
>>>>>> University of Nebraska-Lincoln
>>>>>> 402-472-6400
>>>>>>
>>>>>> From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org <r...@open-mpi.org>
>>>>>> Sent: Monday, August 22, 2016 3:04:50 PM
>>>>>> To: Open MPI Users
>>>>>> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>>>>>>
>>>>>> Well, I can try to find time to take a look. However, I will reiterate what Jeff H said - it is very unwise to rely on IO forwarding. It is much better to just read the file directly, unless that file is simply unavailable on the node where rank=0 is running.
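A minimal sketch of the workaround Ralph suggests above, i.e. reading the input file directly instead of relying on stdin forwarding. This is only an illustration, not LAMMPS or Open MPI code; the 1024-byte line buffer and taking the file name from argv[1] are arbitrary choices made for the example. Rank 0 opens the file and broadcasts it line by line, so nothing has to travel through mpirun's IO forwarding:

#include <stdio.h>
#include <string.h>
#include <mpi.h>

#define LINE_MAX_LEN 1024   /* illustrative line-buffer size */

int main(int argc, char *argv[])
{
    char line[LINE_MAX_LEN];
    int rank, len;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (0 == rank) {
        /* Open the file named on the command line instead of reading the
         * stdin that mpirun forwards ("mpirun ./a.out input.file" rather
         * than "mpirun ./a.out < input.file"). */
        FILE *in = (argc > 1) ? fopen(argv[1], "r") : NULL;
        if (NULL == in) {
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        while (fgets(line, sizeof(line), in) != NULL) {
            len = (int)strlen(line) + 1;              /* include the NUL */
            MPI_Bcast(&len, 1, MPI_INT, 0, MPI_COMM_WORLD);
            MPI_Bcast(line, len, MPI_CHAR, 0, MPI_COMM_WORLD);
        }
        len = 0;                                      /* end-of-input marker */
        MPI_Bcast(&len, 1, MPI_INT, 0, MPI_COMM_WORLD);
        fclose(in);
    } else {
        for (;;) {
            MPI_Bcast(&len, 1, MPI_INT, 0, MPI_COMM_WORLD);
            if (0 == len) {
                break;                                /* rank 0 hit EOF */
            }
            MPI_Bcast(line, len, MPI_CHAR, 0, MPI_COMM_WORLD);
            /* ... act on the received line ... */
        }
    }

    MPI_Finalize();
    return 0;
}

Only rank 0 needs to be able to open the file, which matches Ralph's caveat about the file being available on the node where rank=0 is running; the -in workaround mentioned in the original report follows the same idea.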
>>>>>>> On Aug 22, 2016, at 1:55 PM, Jingchao Zhang <zh...@unl.edu> wrote:
>>>>>>>
>>>>>>> Here you can find the source code for LAMMPS input handling: https://github.com/lammps/lammps/blob/r13864/src/input.cpp
>>>>>>>
>>>>>>> Based on the gdb output, rank 0 is stuck at line 167,
>>>>>>>
>>>>>>>     if (fgets(&line[m],maxline-m,infile) == NULL)
>>>>>>>
>>>>>>> and the rest of the processes are stuck at line 203,
>>>>>>>
>>>>>>>     MPI_Bcast(&n,1,MPI_INT,0,world);
>>>>>>>
>>>>>>> So rank 0 possibly hangs in the fgets() call.
>>>>>>>
>>>>>>> Here is the full backtrace information:
>>>>>>>
>>>>>>> $ cat master.backtrace worker.backtrace
>>>>>>> #0  0x0000003c37cdb68d in read () from /lib64/libc.so.6
>>>>>>> #1  0x0000003c37c71ca8 in _IO_new_file_underflow () from /lib64/libc.so.6
>>>>>>> #2  0x0000003c37c737ae in _IO_default_uflow_internal () from /lib64/libc.so.6
>>>>>>> #3  0x0000003c37c67e8a in _IO_getline_info_internal () from /lib64/libc.so.6
>>>>>>> #4  0x0000003c37c66ce9 in fgets () from /lib64/libc.so.6
>>>>>>> #5  0x00000000005c5a43 in LAMMPS_NS::Input::file() () at ../input.cpp:167
>>>>>>> #6  0x00000000005d4236 in main () at ../main.cpp:31
>>>>>>> #0  0x00002b1635d2ace2 in poll_dispatch () from /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libopen-pal.so.20
>>>>>>> #1  0x00002b1635d1fa71 in opal_libevent2022_event_base_loop () from /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libopen-pal.so.20
>>>>>>> #2  0x00002b1635ce4634 in opal_progress () from /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libopen-pal.so.20
>>>>>>> #3  0x00002b16351b8fad in ompi_request_default_wait () from /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libmpi.so.20
>>>>>>> #4  0x00002b16351fcb40 in ompi_coll_base_bcast_intra_generic () from /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libmpi.so.20
>>>>>>> #5  0x00002b16351fd0c2 in ompi_coll_base_bcast_intra_binomial () from /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libmpi.so.20
>>>>>>> #6  0x00002b1644fa6d9b in ompi_coll_tuned_bcast_intra_dec_fixed () from /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/openmpi/mca_coll_tuned.so
>>>>>>> #7  0x00002b16351cb4fb in PMPI_Bcast () from /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libmpi.so.20
>>>>>>> #8  0x00000000005c5b5d in LAMMPS_NS::Input::file() () at ../input.cpp:203
>>>>>>> #9  0x00000000005d4236 in main () at ../main.cpp:31
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Dr. Jingchao Zhang
>>>>>>> Holland Computing Center
>>>>>>> University of Nebraska-Lincoln
>>>>>>> 402-472-6400
>>>>>>>
>>>>>>> From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org <r...@open-mpi.org>
>>>>>>> Sent: Monday, August 22, 2016 2:17:10 PM
>>>>>>> To: Open MPI Users
>>>>>>> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>>>>>>>
>>>>>>> Hmmm...perhaps we can break this out a bit? The stdin will be going to your rank=0 proc. It sounds like you have some subsequent step that calls MPI_Bcast?
>>>>>>>
>>>>>>> Can you first verify that the input is being correctly delivered to rank=0? This will help us isolate whether the problem is in the IO forwarding or in the subsequent Bcast.
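The two backtraces above boil down to a simple producer/consumer pattern. Below is a minimal sketch of what LAMMPS's Input::file() is effectively doing when the script arrives on stdin (heavily simplified; the buffer size and variable names are illustrative, see input.cpp for the real logic): rank 0 blocks in read() underneath fgets() waiting for more forwarded stdin, while every other rank waits in the MPI_Bcast of the line length.

#include <stdio.h>
#include <string.h>
#include <mpi.h>

#define MAXLINE 1024   /* illustrative; not the LAMMPS value */

int main(int argc, char *argv[])
{
    char line[MAXLINE];
    int rank, n;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (;;) {
        if (0 == rank) {
            /* input.cpp:167 -- rank 0 reads the next script line from
             * stdin, which mpirun is forwarding to it.  This is the
             * read() call the first backtrace is stuck in. */
            if (fgets(line, MAXLINE, stdin) == NULL) {
                n = 0;                      /* EOF: tell everyone to stop */
            } else {
                n = (int)strlen(line) + 1;
            }
        }
        /* input.cpp:203 -- all other ranks sit here in MPI_Bcast, which
         * is where the second backtrace shows them waiting. */
        MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
        if (0 == n) {
            break;
        }
        MPI_Bcast(line, n, MPI_CHAR, 0, MPI_COMM_WORLD);
        /* ... parse and execute the command in 'line' ... */
    }

    MPI_Finalize();
    return 0;
}

If mpirun stops delivering the forwarded stdin, rank 0 never returns from fgets(), so the length broadcast that the other ranks are waiting in never happens, which is consistent with the hang described in this thread.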
>>>>>>>> On Aug 22, 2016, at 1:11 PM, Jingchao Zhang <zh...@unl.edu> wrote:
>>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> We compiled openmpi/2.0.0 with gcc/6.1.0 and intel/13.1.3. Both builds show odd behavior when trying to read from standard input.
>>>>>>>>
>>>>>>>> For example, if we start the application lammps across 4 nodes, 16 cores per node, connected by Intel QDR Infiniband, mpirun works fine the first time but always gets stuck within a few seconds thereafter.
>>>>>>>> Command:
>>>>>>>>     mpirun ./lmp_ompi_g++ < in.snr
>>>>>>>> in.snr is the lammps input file; the compiler is gcc/6.1.
>>>>>>>>
>>>>>>>> Instead, if we use
>>>>>>>>     mpirun ./lmp_ompi_g++ -in in.snr
>>>>>>>> it works 100% of the time.
>>>>>>>>
>>>>>>>> Some odd behaviors we have gathered so far:
>>>>>>>> 1. For single-node jobs, stdin always works.
>>>>>>>> 2. For multiple nodes, stdin works unreliably when the number of cores per node is relatively small. For example, with 2/3/4 nodes and 8 cores per node, mpirun works most of the time. But with more than 8 cores per node, mpirun works the first time and then always gets stuck. There seems to be a magic number at which it stops working.
>>>>>>>> 3. We tested Quantum ESPRESSO compiled with intel/13 and had the same issue.
>>>>>>>>
>>>>>>>> We used gdb to debug and found that when mpirun was stuck, the rest of the processes were all waiting on an MPI broadcast from the master process. The lammps binary, input file and gdb core files (example.tar.bz2) can be downloaded from this link: https://drive.google.com/open?id=0B3Yj4QkZpI-dVWZtWmJ3ZXNVRGc
>>>>>>>>
>>>>>>>> Extra information:
>>>>>>>> 1. The job scheduler is Slurm.
>>>>>>>> 2. configure setup:
>>>>>>>>     ./configure --prefix=$PREFIX \
>>>>>>>>         --with-hwloc=internal \
>>>>>>>>         --enable-mpirun-prefix-by-default \
>>>>>>>>         --with-slurm \
>>>>>>>>         --with-verbs \
>>>>>>>>         --with-psm \
>>>>>>>>         --disable-openib-connectx-xrc \
>>>>>>>>         --with-knem=/opt/knem-1.1.2.90mlnx1 \
>>>>>>>>         --with-cma
>>>>>>>> 3. openmpi-mca-params.conf file:
>>>>>>>>     orte_hetero_nodes=1
>>>>>>>>     hwloc_base_binding_policy=core
>>>>>>>>     rmaps_base_mapping_policy=core
>>>>>>>>     opal_cuda_support=0
>>>>>>>>     btl_openib_use_eager_rdma=0
>>>>>>>>     btl_openib_max_eager_rdma=0
>>>>>>>>     btl_openib_flags=1
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Jingchao
>>>>>>>>
>>>>>>>> Dr. Jingchao Zhang
>>>>>>>> Holland Computing Center
>>>>>>>> University of Nebraska-Lincoln
>>>>>>>> 402-472-6400
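Earlier in the thread Ralph mentions flow-control problems with Bcast streams that have no built-in synchronization points and says he will try introducing forced "syncs" on the Bcast. The following is a minimal sketch of that idea as a variation on his stdin test program above; the chunk size matches that program, but the sync interval of 32 chunks is an arbitrary value chosen for the example, not anything Open MPI defines. Every rank enters an MPI_Barrier after a fixed number of broadcast chunks, so rank 0 cannot stream arbitrarily far ahead of the slowest receiver.

#include <stdio.h>
#include <string.h>
#include <stdbool.h>
#include <unistd.h>
#include <mpi.h>

#define MSG_MAX    2048   /* same chunk size as the test program above */
#define SYNC_EVERY   32   /* arbitrary: barrier after this many chunks */

int main(int argc, char *argv[])
{
    char msg[MSG_MAX];
    int rank, pos = 0;
    bool done = false;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    while (!done) {
        if (0 == rank) {
            /* Read the next chunk of stdin; an all-zero chunk marks EOF. */
            memset(msg, 0, MSG_MAX);
            done = (read(0, msg, MSG_MAX) <= 0);
        }
        MPI_Bcast(msg, MSG_MAX, MPI_BYTE, 0, MPI_COMM_WORLD);
        if (0 != rank) {
            /* Receivers stop when they see the all-zero termination chunk. */
            done = true;
            for (int i = 0; i < MSG_MAX; i++) {
                if (0 != msg[i]) {
                    done = false;
                    break;
                }
            }
        }
        /* Forced synchronization point: keep rank 0 from streaming
         * arbitrarily far ahead of the slowest receiver. */
        if (++pos % SYNC_EVERY == 0) {
            MPI_Barrier(MPI_COMM_WORLD);
        }
    }

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}

Whether this actually avoids the hang is exactly what Ralph proposes to test; it only changes the pacing at the application level and does not touch the IO forwarding path itself.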
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users