Afraid I can’t replicate the problem at all, whether rank=0 is local or not. I’m also using bash, but on CentOS-7, so I suspect the OS is the difference.
Can you configure OMPI with --enable-debug, and then run the test again with --mca iof_base_verbose 100? It will hopefully tell us something about why the IO subsystem is stuck.

> On Aug 24, 2016, at 8:46 AM, Jingchao Zhang <zh...@unl.edu> wrote:
>
> Hi Ralph,
>
> For our tests, rank 0 is always on the same node as mpirun. I just tested mpirun with -nolocal and it still hangs.
>
> Information on the shell and OS:
>
> $ echo $0
> -bash
>
> $ lsb_release -a
> LSB Version:    :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
> Distributor ID: Scientific
> Description:    Scientific Linux release 6.8 (Carbon)
> Release:        6.8
> Codename:       Carbon
>
> $ uname -a
> Linux login.crane.hcc.unl.edu 2.6.32-642.3.1.el6.x86_64 #1 SMP Tue Jul 12 11:25:51 CDT 2016 x86_64 x86_64 x86_64 GNU/Linux
>
> Dr. Jingchao Zhang
> Holland Computing Center
> University of Nebraska-Lincoln
> 402-472-6400
>
> From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org <r...@open-mpi.org>
> Sent: Tuesday, August 23, 2016 8:14:48 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>
> Hmmm...that’s a good point. Rank 0 and mpirun are always on the same node on my cluster. I’ll give it a try.
>
> Jingchao: is rank 0 on the node with mpirun, or on a remote node?
>
>> On Aug 23, 2016, at 5:58 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>>
>> Ralph,
>>
>> did you run task 0 and mpirun on different nodes?
>>
>> i observed some random hangs, though i cannot blame openmpi 100% yet
>>
>> Cheers,
>>
>> Gilles
>>
>> On 8/24/2016 9:41 AM, r...@open-mpi.org wrote:
>>> Very strange. I cannot reproduce it, as I’m able to run any number of nodes and procs, pushing over 100 MBytes through without any problem.
>>>
>>> Which leads me to suspect that the issue here is with the tty interface. Can you tell me what shell and OS you are running?
>>>
>>>> On Aug 23, 2016, at 3:25 PM, Jingchao Zhang <zh...@unl.edu> wrote:
>>>>
>>>> Everything got stuck right after MPI_Init. For a test job with 2 nodes and 10 cores per node, I got the following:
>>>>
>>>> $ mpirun ./a.out < test.in
>>>> Rank 2 has cleared MPI_Init
>>>> Rank 4 has cleared MPI_Init
>>>> Rank 7 has cleared MPI_Init
>>>> Rank 8 has cleared MPI_Init
>>>> Rank 0 has cleared MPI_Init
>>>> Rank 5 has cleared MPI_Init
>>>> Rank 6 has cleared MPI_Init
>>>> Rank 9 has cleared MPI_Init
>>>> Rank 1 has cleared MPI_Init
>>>> Rank 16 has cleared MPI_Init
>>>> Rank 19 has cleared MPI_Init
>>>> Rank 10 has cleared MPI_Init
>>>> Rank 11 has cleared MPI_Init
>>>> Rank 12 has cleared MPI_Init
>>>> Rank 13 has cleared MPI_Init
>>>> Rank 14 has cleared MPI_Init
>>>> Rank 15 has cleared MPI_Init
>>>> Rank 17 has cleared MPI_Init
>>>> Rank 18 has cleared MPI_Init
>>>> Rank 3 has cleared MPI_Init
>>>>
>>>> Then it just hung.
>>>>
>>>> --Jingchao
>>>>
>>>> Dr. Jingchao Zhang
>>>> Holland Computing Center
>>>> University of Nebraska-Lincoln
>>>> 402-472-6400
>>>>
>>>> From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org <r...@open-mpi.org>
>>>> Sent: Tuesday, August 23, 2016 4:03:07 PM
>>>> To: Open MPI Users
>>>> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>>>>
>>>> The IO forwarding messages all flow over the Ethernet, so the type of fabric is irrelevant. The number of procs involved would definitely have an impact, but that might not be due to the IO forwarding subsystem. We know we have flow-control issues with collectives like Bcast that don’t have built-in synchronization points. How many reads were you able to do before it hung?
>>>>
>>>> I was running it on my little test setup (2 nodes, using only a few procs), but I’ll try scaling up and see what happens. I’ll also try introducing some forced “syncs” on the Bcast and see if that solves the issue.
>>>>
>>>> Ralph
>>>>
>>>>> On Aug 23, 2016, at 2:30 PM, Jingchao Zhang <zh...@unl.edu> wrote:
>>>>>
>>>>> Hi Ralph,
>>>>>
>>>>> I tested v2.0.1rc1 with your code, but it has the same issue. I also installed v2.0.1rc1 on a different cluster, which has Mellanox QDR InfiniBand, and got the same result. For the tests you have done, how many cores and nodes did you use? I can trigger the problem by using multiple nodes with more than 10 cores per node.
>>>>>
>>>>> Thank you for looking into this.
>>>>>
>>>>> Dr. Jingchao Zhang
>>>>> Holland Computing Center
>>>>> University of Nebraska-Lincoln
>>>>> 402-472-6400
>>>>>
>>>>> From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org <r...@open-mpi.org>
>>>>> Sent: Monday, August 22, 2016 10:23:42 PM
>>>>> To: Open MPI Users
>>>>> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>>>>>
>>>>> FWIW: I just tested forwarding up to 100 MBytes via stdin using the simple test shown below with OMPI v2.0.1rc1, and it worked fine. So I’d suggest upgrading when the official release comes out, or going ahead and at least testing 2.0.1rc1 on your machine. Or you can test this program with some input file and let me know if it works for you.
>>>>> Ralph
>>>>>
>>>>> #include <stdlib.h>
>>>>> #include <stdio.h>
>>>>> #include <string.h>
>>>>> #include <stdbool.h>
>>>>> #include <unistd.h>
>>>>> #include <mpi.h>
>>>>>
>>>>> #define ORTE_IOF_BASE_MSG_MAX 2048
>>>>>
>>>>> int main(int argc, char *argv[])
>>>>> {
>>>>>     int i, rank, size, next, prev, tag = 201;
>>>>>     int pos, msgsize, nbytes;
>>>>>     bool done;
>>>>>     char *msg;
>>>>>
>>>>>     MPI_Init(&argc, &argv);
>>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>>>
>>>>>     fprintf(stderr, "Rank %d has cleared MPI_Init\n", rank);
>>>>>
>>>>>     next = (rank + 1) % size;
>>>>>     prev = (rank + size - 1) % size;
>>>>>     msg = malloc(ORTE_IOF_BASE_MSG_MAX);
>>>>>     pos = 0;
>>>>>     nbytes = 0;
>>>>>
>>>>>     if (0 == rank) {
>>>>>         while (0 != (msgsize = read(0, msg, ORTE_IOF_BASE_MSG_MAX))) {
>>>>>             fprintf(stderr, "Rank %d: sending blob %d\n", rank, pos);
>>>>>             if (msgsize > 0) {
>>>>>                 MPI_Bcast(msg, ORTE_IOF_BASE_MSG_MAX, MPI_BYTE, 0, MPI_COMM_WORLD);
>>>>>             }
>>>>>             ++pos;
>>>>>             nbytes += msgsize;
>>>>>         }
>>>>>         fprintf(stderr, "Rank %d: sending termination blob %d\n", rank, pos);
>>>>>         memset(msg, 0, ORTE_IOF_BASE_MSG_MAX);
>>>>>         MPI_Bcast(msg, ORTE_IOF_BASE_MSG_MAX, MPI_BYTE, 0, MPI_COMM_WORLD);
>>>>>         MPI_Barrier(MPI_COMM_WORLD);
>>>>>     } else {
>>>>>         while (1) {
>>>>>             MPI_Bcast(msg, ORTE_IOF_BASE_MSG_MAX, MPI_BYTE, 0, MPI_COMM_WORLD);
>>>>>             fprintf(stderr, "Rank %d: recvd blob %d\n", rank, pos);
>>>>>             ++pos;
>>>>>             done = true;
>>>>>             for (i = 0; i < ORTE_IOF_BASE_MSG_MAX; i++) {
>>>>>                 if (0 != msg[i]) {
>>>>>                     done = false;
>>>>>                     break;
>>>>>                 }
>>>>>             }
>>>>>             if (done) {
>>>>>                 break;
>>>>>             }
>>>>>         }
>>>>>         fprintf(stderr, "Rank %d: recv done\n", rank);
>>>>>         MPI_Barrier(MPI_COMM_WORLD);
>>>>>     }
>>>>>
>>>>>     fprintf(stderr, "Rank %d has completed bcast\n", rank);
>>>>>     MPI_Finalize();
>>>>>     return 0;
>>>>> }
>>>>>
>>>>>> On Aug 22, 2016, at 3:40 PM, Jingchao Zhang <zh...@unl.edu> wrote:
>>>>>>
>>>>>> This might be a thin argument, but we have had many users running mpirun this way for years with no problem until this recent upgrade. And some home-brewed MPI codes do not even have a standard way to read input files. Last time I checked, the Open MPI manual still claims it supports stdin (https://www.open-mpi.org/doc/v2.0/man1/mpirun.1.php#sect14). Maybe I missed it, but the v2.0 release notes did not mention any changes to the behavior of stdin either.
>>>>>>
>>>>>> We can tell our users to run mpirun in the suggested way, but I do hope someone can look into the issue and fix it.
>>>>>>
>>>>>> Dr. Jingchao Zhang
>>>>>> Holland Computing Center
>>>>>> University of Nebraska-Lincoln
>>>>>> 402-472-6400
>>>>>>
>>>>>> From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org <r...@open-mpi.org>
>>>>>> Sent: Monday, August 22, 2016 3:04:50 PM
>>>>>> To: Open MPI Users
>>>>>> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>>>>>>
>>>>>> Well, I can try to find time to take a look. However, I will reiterate what Jeff H said - it is very unwise to rely on IO forwarding. It is much better to just read the file directly, unless that file is simply unavailable on the node where rank=0 is running.
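A minimal sketch of the workaround Ralph suggests above, i.e. reading the input file directly instead of relying on stdin forwarding. This is only an illustration, not LAMMPS or Open MPI code; the 1024-byte line buffer and taking the file name from argv[1] are arbitrary choices made for the example. Rank 0 opens the file and broadcasts it line by line, so nothing has to travel through mpirun's IO forwarding:

#include <stdio.h>
#include <string.h>
#include <mpi.h>

#define LINE_MAX_LEN 1024   /* illustrative line-buffer size */

int main(int argc, char *argv[])
{
    char line[LINE_MAX_LEN];
    int rank, len;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (0 == rank) {
        /* Open the file named on the command line instead of reading the
         * stdin that mpirun forwards ("mpirun ./a.out input.file" rather
         * than "mpirun ./a.out < input.file"). */
        FILE *in = (argc > 1) ? fopen(argv[1], "r") : NULL;
        if (NULL == in) {
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        while (fgets(line, sizeof(line), in) != NULL) {
            len = (int)strlen(line) + 1;              /* include the NUL */
            MPI_Bcast(&len, 1, MPI_INT, 0, MPI_COMM_WORLD);
            MPI_Bcast(line, len, MPI_CHAR, 0, MPI_COMM_WORLD);
        }
        len = 0;                                      /* end-of-input marker */
        MPI_Bcast(&len, 1, MPI_INT, 0, MPI_COMM_WORLD);
        fclose(in);
    } else {
        for (;;) {
            MPI_Bcast(&len, 1, MPI_INT, 0, MPI_COMM_WORLD);
            if (0 == len) {
                break;                                /* rank 0 hit EOF */
            }
            MPI_Bcast(line, len, MPI_CHAR, 0, MPI_COMM_WORLD);
            /* ... act on the received line ... */
        }
    }

    MPI_Finalize();
    return 0;
}

Only rank 0 needs to be able to open the file, which matches Ralph's caveat about the file being available on the node where rank=0 is running; the -in workaround mentioned in the original report follows the same idea.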
>>>>>>> On Aug 22, 2016, at 1:55 PM, Jingchao Zhang <zh...@unl.edu> wrote:
>>>>>>>
>>>>>>> Here you can find the source code for LAMMPS input handling: https://github.com/lammps/lammps/blob/r13864/src/input.cpp
>>>>>>>
>>>>>>> Based on the gdb output, rank 0 is stuck at line 167,
>>>>>>>
>>>>>>>     if (fgets(&line[m],maxline-m,infile) == NULL)
>>>>>>>
>>>>>>> and the rest of the processes are stuck at line 203,
>>>>>>>
>>>>>>>     MPI_Bcast(&n,1,MPI_INT,0,world);
>>>>>>>
>>>>>>> So rank 0 possibly hangs in the fgets() call.
>>>>>>>
>>>>>>> Here is the full backtrace information:
>>>>>>>
>>>>>>> $ cat master.backtrace worker.backtrace
>>>>>>> #0  0x0000003c37cdb68d in read () from /lib64/libc.so.6
>>>>>>> #1  0x0000003c37c71ca8 in _IO_new_file_underflow () from /lib64/libc.so.6
>>>>>>> #2  0x0000003c37c737ae in _IO_default_uflow_internal () from /lib64/libc.so.6
>>>>>>> #3  0x0000003c37c67e8a in _IO_getline_info_internal () from /lib64/libc.so.6
>>>>>>> #4  0x0000003c37c66ce9 in fgets () from /lib64/libc.so.6
>>>>>>> #5  0x00000000005c5a43 in LAMMPS_NS::Input::file() () at ../input.cpp:167
>>>>>>> #6  0x00000000005d4236 in main () at ../main.cpp:31
>>>>>>> #0  0x00002b1635d2ace2 in poll_dispatch () from /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libopen-pal.so.20
>>>>>>> #1  0x00002b1635d1fa71 in opal_libevent2022_event_base_loop () from /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libopen-pal.so.20
>>>>>>> #2  0x00002b1635ce4634 in opal_progress () from /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libopen-pal.so.20
>>>>>>> #3  0x00002b16351b8fad in ompi_request_default_wait () from /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libmpi.so.20
>>>>>>> #4  0x00002b16351fcb40 in ompi_coll_base_bcast_intra_generic () from /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libmpi.so.20
>>>>>>> #5  0x00002b16351fd0c2 in ompi_coll_base_bcast_intra_binomial () from /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libmpi.so.20
>>>>>>> #6  0x00002b1644fa6d9b in ompi_coll_tuned_bcast_intra_dec_fixed () from /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/openmpi/mca_coll_tuned.so
>>>>>>> #7  0x00002b16351cb4fb in PMPI_Bcast () from /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libmpi.so.20
>>>>>>> #8  0x00000000005c5b5d in LAMMPS_NS::Input::file() () at ../input.cpp:203
>>>>>>> #9  0x00000000005d4236 in main () at ../main.cpp:31
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Dr. Jingchao Zhang
>>>>>>> Holland Computing Center
>>>>>>> University of Nebraska-Lincoln
>>>>>>> 402-472-6400
>>>>>>>
>>>>>>> From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org <r...@open-mpi.org>
>>>>>>> Sent: Monday, August 22, 2016 2:17:10 PM
>>>>>>> To: Open MPI Users
>>>>>>> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>>>>>>>
>>>>>>> Hmmm...perhaps we can break this out a bit? The stdin will be going to your rank=0 proc. It sounds like you have some subsequent step that calls MPI_Bcast?
>>>>>>>
>>>>>>> Can you first verify that the input is being correctly delivered to rank=0? This will help us isolate whether the problem is in the IO forwarding or in the subsequent Bcast.
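The two backtraces above boil down to a simple producer/consumer pattern. Below is a minimal sketch of what LAMMPS's Input::file() is effectively doing when the script arrives on stdin (heavily simplified; the buffer size and variable names are illustrative, see input.cpp for the real logic): rank 0 blocks in read() underneath fgets() waiting for more forwarded stdin, while every other rank waits in the MPI_Bcast of the line length.

#include <stdio.h>
#include <string.h>
#include <mpi.h>

#define MAXLINE 1024   /* illustrative; not the LAMMPS value */

int main(int argc, char *argv[])
{
    char line[MAXLINE];
    int rank, n;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (;;) {
        if (0 == rank) {
            /* input.cpp:167 -- rank 0 reads the next script line from
             * stdin, which mpirun is forwarding to it.  This is the
             * read() call the first backtrace is stuck in. */
            if (fgets(line, MAXLINE, stdin) == NULL) {
                n = 0;                      /* EOF: tell everyone to stop */
            } else {
                n = (int)strlen(line) + 1;
            }
        }
        /* input.cpp:203 -- all other ranks sit here in MPI_Bcast, which
         * is where the second backtrace shows them waiting. */
        MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
        if (0 == n) {
            break;
        }
        MPI_Bcast(line, n, MPI_CHAR, 0, MPI_COMM_WORLD);
        /* ... parse and execute the command in 'line' ... */
    }

    MPI_Finalize();
    return 0;
}

If mpirun stops delivering the forwarded stdin, rank 0 never returns from fgets(), so the length broadcast that the other ranks are waiting in never happens, which is consistent with the hang described in this thread.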
>>>>>>>> On Aug 22, 2016, at 1:11 PM, Jingchao Zhang <zh...@unl.edu> wrote:
>>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> We compiled openmpi/2.0.0 with gcc/6.1.0 and intel/13.1.3. Both builds show odd behavior when trying to read from standard input.
>>>>>>>>
>>>>>>>> For example, if we start the application lammps across 4 nodes, 16 cores per node, connected by Intel QDR Infiniband, mpirun works fine the first time but always gets stuck within a few seconds thereafter.
>>>>>>>> Command:
>>>>>>>>     mpirun ./lmp_ompi_g++ < in.snr
>>>>>>>> in.snr is the lammps input file; the compiler is gcc/6.1.
>>>>>>>>
>>>>>>>> Instead, if we use
>>>>>>>>     mpirun ./lmp_ompi_g++ -in in.snr
>>>>>>>> it works 100% of the time.
>>>>>>>>
>>>>>>>> Some odd behaviors we have gathered so far:
>>>>>>>> 1. For single-node jobs, stdin always works.
>>>>>>>> 2. For multiple nodes, stdin works unreliably when the number of cores per node is relatively small. For example, with 2/3/4 nodes and 8 cores per node, mpirun works most of the time. But with more than 8 cores per node, mpirun works the first time and then always gets stuck. There seems to be a magic number at which it stops working.
>>>>>>>> 3. We tested Quantum ESPRESSO compiled with intel/13 and had the same issue.
>>>>>>>>
>>>>>>>> We used gdb to debug and found that when mpirun was stuck, the rest of the processes were all waiting on an MPI broadcast from the master process. The lammps binary, input file and gdb core files (example.tar.bz2) can be downloaded from this link: https://drive.google.com/open?id=0B3Yj4QkZpI-dVWZtWmJ3ZXNVRGc
>>>>>>>>
>>>>>>>> Extra information:
>>>>>>>> 1. The job scheduler is Slurm.
>>>>>>>> 2. configure setup:
>>>>>>>>     ./configure --prefix=$PREFIX \
>>>>>>>>         --with-hwloc=internal \
>>>>>>>>         --enable-mpirun-prefix-by-default \
>>>>>>>>         --with-slurm \
>>>>>>>>         --with-verbs \
>>>>>>>>         --with-psm \
>>>>>>>>         --disable-openib-connectx-xrc \
>>>>>>>>         --with-knem=/opt/knem-1.1.2.90mlnx1 \
>>>>>>>>         --with-cma
>>>>>>>> 3. openmpi-mca-params.conf file:
>>>>>>>>     orte_hetero_nodes=1
>>>>>>>>     hwloc_base_binding_policy=core
>>>>>>>>     rmaps_base_mapping_policy=core
>>>>>>>>     opal_cuda_support=0
>>>>>>>>     btl_openib_use_eager_rdma=0
>>>>>>>>     btl_openib_max_eager_rdma=0
>>>>>>>>     btl_openib_flags=1
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Jingchao
>>>>>>>>
>>>>>>>> Dr. Jingchao Zhang
>>>>>>>> Holland Computing Center
>>>>>>>> University of Nebraska-Lincoln
>>>>>>>> 402-472-6400
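Earlier in the thread Ralph mentions flow-control problems with Bcast streams that have no built-in synchronization points and says he will try introducing forced "syncs" on the Bcast. The following is a minimal sketch of that idea as a variation on his stdin test program above; the chunk size matches that program, but the sync interval of 32 chunks is an arbitrary value chosen for the example, not anything Open MPI defines. Every rank enters an MPI_Barrier after a fixed number of broadcast chunks, so rank 0 cannot stream arbitrarily far ahead of the slowest receiver.

#include <stdio.h>
#include <string.h>
#include <stdbool.h>
#include <unistd.h>
#include <mpi.h>

#define MSG_MAX    2048   /* same chunk size as the test program above */
#define SYNC_EVERY   32   /* arbitrary: barrier after this many chunks */

int main(int argc, char *argv[])
{
    char msg[MSG_MAX];
    int rank, pos = 0;
    bool done = false;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    while (!done) {
        if (0 == rank) {
            /* Read the next chunk of stdin; an all-zero chunk marks EOF. */
            memset(msg, 0, MSG_MAX);
            done = (read(0, msg, MSG_MAX) <= 0);
        }
        MPI_Bcast(msg, MSG_MAX, MPI_BYTE, 0, MPI_COMM_WORLD);
        if (0 != rank) {
            /* Receivers stop when they see the all-zero termination chunk. */
            done = true;
            for (int i = 0; i < MSG_MAX; i++) {
                if (0 != msg[i]) {
                    done = false;
                    break;
                }
            }
        }
        /* Forced synchronization point: keep rank 0 from streaming
         * arbitrarily far ahead of the slowest receiver. */
        if (++pos % SYNC_EVERY == 0) {
            MPI_Barrier(MPI_COMM_WORLD);
        }
    }

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}

Whether this actually avoids the hang is exactly what Ralph proposes to test; it only changes the pacing at the application level and does not touch the IO forwarding path itself.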
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users