??? Weird - can you send me an updated output of that last test we ran?

> On Aug 25, 2016, at 7:51 AM, Jingchao Zhang <zh...@unl.edu> wrote:
> 
> Hi Ralph,
> 
> I saw the pull request and did a test with v2.0.1rc1, but the problem 
> persists. Any ideas?
> 
> Thanks,
> 
> Dr. Jingchao Zhang
> Holland Computing Center
> University of Nebraska-Lincoln
> 402-472-6400
> From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org <r...@open-mpi.org>
> Sent: Wednesday, August 24, 2016 1:27:28 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>  
> Bingo - found it, fix submitted and hope to get it into 2.0.1
> 
> Thanks for the assist!
> Ralph
> 
> 
>> On Aug 24, 2016, at 12:15 PM, Jingchao Zhang <zh...@unl.edu> wrote:
>> 
>> I configured v2.0.1rc1 with --enable-debug and ran the test with --mca 
>> iof_base_verbose 100. I also added -display-devel-map in case it provides 
>> some useful information.
>> 
>> The test job has 2 nodes with 10 cores each. Rank 0 and the mpirun command are on the same node.
>> $ mpirun -display-devel-map --mca iof_base_verbose 100 ./a.out < test.in &> debug_info.txt
>> 
>> The debug_info.txt is attached. 
>> 
>> Dr. Jingchao Zhang
>> Holland Computing Center
>> University of Nebraska-Lincoln
>> 402-472-6400
>> From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org <r...@open-mpi.org>
>> Sent: Wednesday, August 24, 2016 12:14:26 PM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>>  
>> Afraid I can’t replicate a problem at all, whether rank=0 is local or not. 
>> I’m also using bash, but on CentOS-7, so I suspect the OS is the difference.
>> 
>> Can you configure OMPI with --enable-debug, and then run the test again with 
>> --mca iof_base_verbose 100? It will hopefully tell us something about why 
>> the IO subsystem is stuck.
>> 
>> 
>>> On Aug 24, 2016, at 8:46 AM, Jingchao Zhang <zh...@unl.edu> wrote:
>>> 
>>> Hi Ralph,
>>> 
>>> For our tests, rank 0 is always on the same node as mpirun. I just tested 
>>> mpirun with -nolocal and it still hangs.
>>> 
>>> Information on shell and OS
>>> $ echo $0
>>> -bash
>>> 
>>> $ lsb_release -a
>>> LSB Version:    
>>> :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
>>> Distributor ID: Scientific
>>> Description:    Scientific Linux release 6.8 (Carbon)
>>> Release:        6.8
>>> Codename:       Carbon
>>> 
>>> $ uname -a
>>> Linux login.crane.hcc.unl.edu 2.6.32-642.3.1.el6.x86_64 #1 SMP Tue Jul 12 11:25:51 CDT 2016 x86_64 x86_64 x86_64 GNU/Linux
>>> 
>>> 
>>> Dr. Jingchao Zhang
>>> Holland Computing Center
>>> University of Nebraska-Lincoln
>>> 402-472-6400
>>> From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org <r...@open-mpi.org>
>>> Sent: Tuesday, August 23, 2016 8:14:48 PM
>>> To: Open MPI Users
>>> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>>>  
>>> Hmmm...that’s a good point. Rank 0 and mpirun are always on the same node 
>>> on my cluster. I’ll give it a try.
>>> 
>>> Jingchao: is rank 0 on the node with mpirun, or on a remote node?
>>> 
>>> 
>>>> On Aug 23, 2016, at 5:58 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>>>> 
>>>> Ralph,
>>>> 
>>>> did you run task 0 and mpirun on different nodes?
>>>> 
>>>> I observed some random hangs, though I cannot blame openmpi 100% yet.
>>>> 
>>>> Cheers,
>>>> 
>>>> Gilles
>>>> 
>>>> On 8/24/2016 9:41 AM, r...@open-mpi.org wrote:
>>>>> Very strange. I cannot reproduce it as I’m able to run any number of 
>>>>> nodes and procs, pushing over 100Mbytes thru without any problem.
>>>>> 
>>>>> Which leads me to suspect that the issue here is with the tty interface. 
>>>>> Can you tell me what shell and OS you are running?
>>>>> 
>>>>> 
>>>>>> On Aug 23, 2016, at 3:25 PM, Jingchao Zhang <zh...@unl.edu> wrote:
>>>>>> 
>>>>>> Everything got stuck right after MPI_Init. For a test job with 2 nodes and 
>>>>>> 10 cores per node, I got the following:
>>>>>> 
>>>>>> $ mpirun ./a.out < test.in
>>>>>> Rank 2 has cleared MPI_Init
>>>>>> Rank 4 has cleared MPI_Init
>>>>>> Rank 7 has cleared MPI_Init
>>>>>> Rank 8 has cleared MPI_Init
>>>>>> Rank 0 has cleared MPI_Init
>>>>>> Rank 5 has cleared MPI_Init
>>>>>> Rank 6 has cleared MPI_Init
>>>>>> Rank 9 has cleared MPI_Init
>>>>>> Rank 1 has cleared MPI_Init
>>>>>> Rank 16 has cleared MPI_Init
>>>>>> Rank 19 has cleared MPI_Init
>>>>>> Rank 10 has cleared MPI_Init
>>>>>> Rank 11 has cleared MPI_Init
>>>>>> Rank 12 has cleared MPI_Init
>>>>>> Rank 13 has cleared MPI_Init
>>>>>> Rank 14 has cleared MPI_Init
>>>>>> Rank 15 has cleared MPI_Init
>>>>>> Rank 17 has cleared MPI_Init
>>>>>> Rank 18 has cleared MPI_Init
>>>>>> Rank 3 has cleared MPI_Init
>>>>>> 
>>>>>> then it just hung.
>>>>>> 
>>>>>> --Jingchao
>>>>>> 
>>>>>> Dr. Jingchao Zhang
>>>>>> Holland Computing Center
>>>>>> University of Nebraska-Lincoln
>>>>>> 402-472-6400
>>>>>> From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org <r...@open-mpi.org>
>>>>>> Sent: Tuesday, August 23, 2016 4:03:07 PM
>>>>>> To: Open MPI Users
>>>>>> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>>>>>>  
>>>>>> The IO forwarding messages all flow over the Ethernet, so the type of 
>>>>>> fabric is irrelevant. The number of procs involved would definitely have 
>>>>>> an impact, but that might not be due to the IO forwarding subsystem. We 
>>>>>> know we have flow control issues with collectives like Bcast that don’t 
>>>>>> have built-in synchronization points. How many reads were you able to do 
>>>>>> before it hung?
>>>>>> 
>>>>>> I was running it on my little test setup (2 nodes, using only a few 
>>>>>> procs), but I’ll try scaling up and see what happens. I’ll also try 
>>>>>> introducing some forced “syncs” on the Bcast and see if that solves the 
>>>>>> issue.
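>>>>>> To be concrete, by a forced "sync" I just mean an occasional barrier in the 
>>>>>> broadcast loop so the root cannot run arbitrarily far ahead of the slowest 
>>>>>> receiver. A minimal sketch of the idea (illustration only, not the actual 
>>>>>> OMPI/ORTE code; it is essentially the read-and-bcast loop from the test 
>>>>>> program further down this thread, with a periodic barrier added, and 
>>>>>> SYNC_EVERY is an arbitrary number):
>>>>>> 
>>>>>> #include <stdio.h>
>>>>>> #include <string.h>
>>>>>> #include <unistd.h>
>>>>>> #include <mpi.h>
>>>>>> 
>>>>>> #define MSG_MAX    2048
>>>>>> #define SYNC_EVERY 32
>>>>>> 
>>>>>> int main(int argc, char *argv[])
>>>>>> {
>>>>>>     char msg[MSG_MAX];
>>>>>>     int i, rank, pos = 0, done = 0;
>>>>>> 
>>>>>>     MPI_Init(&argc, &argv);
>>>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>>>     while (!done) {
>>>>>>         if (0 == rank) {
>>>>>>             /* EOF leaves the buffer all zero, which doubles as the termination blob */
>>>>>>             memset(msg, 0, MSG_MAX);
>>>>>>             (void)read(0, msg, MSG_MAX);
>>>>>>         }
>>>>>>         MPI_Bcast(msg, MSG_MAX, MPI_BYTE, 0, MPI_COMM_WORLD);
>>>>>>         done = 1;
>>>>>>         for (i = 0; i < MSG_MAX; i++) {
>>>>>>             if (0 != msg[i]) { done = 0; break; }
>>>>>>         }
>>>>>>         if (0 == (++pos % SYNC_EVERY)) {
>>>>>>             /* the forced sync: every rank reaches this on the same iteration */
>>>>>>             MPI_Barrier(MPI_COMM_WORLD);
>>>>>>         }
>>>>>>     }
>>>>>>     fprintf(stderr, "Rank %d saw %d blobs\n", rank, pos);
>>>>>>     MPI_Finalize();
>>>>>>     return 0;
>>>>>> }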
>>>>>> 
>>>>>> Ralph
>>>>>> 
>>>>>>> On Aug 23, 2016, at 2:30 PM, Jingchao Zhang <zh...@unl.edu> wrote:
>>>>>>> 
>>>>>>> Hi Ralph,
>>>>>>> 
>>>>>>> I tested v2.0.1rc1 with your code but hit the same issue. I also 
>>>>>>> installed v2.0.1rc1 on a different cluster, which has Mellanox QDR 
>>>>>>> InfiniBand, and got the same result. For the tests you have done, how 
>>>>>>> many cores and nodes did you use? I can trigger the problem by using 
>>>>>>> multiple nodes with more than 10 cores per node. 
>>>>>>> 
>>>>>>> Thank you for looking into this.
>>>>>>> 
>>>>>>> Dr. Jingchao Zhang
>>>>>>> Holland Computing Center
>>>>>>> University of Nebraska-Lincoln
>>>>>>> 402-472-6400
>>>>>>> From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org <r...@open-mpi.org>
>>>>>>> Sent: Monday, August 22, 2016 10:23:42 PM
>>>>>>> To: Open MPI Users
>>>>>>> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>>>>>>>  
>>>>>>> FWIW: I just tested forwarding up to 100MBytes via stdin using the 
>>>>>>> simple test shown below with OMPI v2.0.1rc1, and it worked fine. So I’d 
>>>>>>> suggest upgrading when the official release comes out, or going ahead 
>>>>>>> and at least testing 2.0.1rc1 on your machine. Or you can test this 
>>>>>>> program with some input file and let me know if it works for you.
>>>>>>> 
>>>>>>> Ralph
>>>>>>> 
>>>>>>> #include <stdlib.h>
>>>>>>> #include <stdio.h>
>>>>>>> #include <string.h>
>>>>>>> #include <stdbool.h>
>>>>>>> #include <unistd.h>
>>>>>>> #include <mpi.h>
>>>>>>> 
>>>>>>> #define ORTE_IOF_BASE_MSG_MAX   2048
>>>>>>> 
>>>>>>> int main(int argc, char *argv[])
>>>>>>> {
>>>>>>>     int i, rank, size, next, prev, tag = 201;
>>>>>>>     int pos, msgsize, nbytes;
>>>>>>>     bool done;
>>>>>>>     char *msg;
>>>>>>> 
>>>>>>>     MPI_Init(&argc, &argv);
>>>>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>>>>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>>>>> 
>>>>>>>     fprintf(stderr, "Rank %d has cleared MPI_Init\n", rank);
>>>>>>> 
>>>>>>>     next = (rank + 1) % size;
>>>>>>>     prev = (rank + size - 1) % size;
>>>>>>>     msg = malloc(ORTE_IOF_BASE_MSG_MAX);
>>>>>>>     pos = 0;
>>>>>>>     nbytes = 0;
>>>>>>> 
>>>>>>>     if (0 == rank) {
>>>>>>>         while (0 != (msgsize = read(0, msg, ORTE_IOF_BASE_MSG_MAX))) {
>>>>>>>             fprintf(stderr, "Rank %d: sending blob %d\n", rank, pos);
>>>>>>>             if (msgsize > 0) {
>>>>>>>                 MPI_Bcast(msg, ORTE_IOF_BASE_MSG_MAX, MPI_BYTE, 0, MPI_COMM_WORLD);
>>>>>>>             }
>>>>>>>             ++pos;
>>>>>>>             nbytes += msgsize;
>>>>>>>         }
>>>>>>>         fprintf(stderr, "Rank %d: sending termination blob %d\n", rank, pos);
>>>>>>>         memset(msg, 0, ORTE_IOF_BASE_MSG_MAX);
>>>>>>>         MPI_Bcast(msg, ORTE_IOF_BASE_MSG_MAX, MPI_BYTE, 0, MPI_COMM_WORLD);
>>>>>>>         MPI_Barrier(MPI_COMM_WORLD);
>>>>>>>     } else {
>>>>>>>         while (1) {
>>>>>>>             MPI_Bcast(msg, ORTE_IOF_BASE_MSG_MAX, MPI_BYTE, 0, MPI_COMM_WORLD);
>>>>>>>             fprintf(stderr, "Rank %d: recvd blob %d\n", rank, pos);
>>>>>>>             ++pos;
>>>>>>>             done = true;
>>>>>>>             for (i=0; i < ORTE_IOF_BASE_MSG_MAX; i++) {
>>>>>>>                 if (0 != msg[i]) {
>>>>>>>                     done = false;
>>>>>>>                     break;
>>>>>>>                 }
>>>>>>>             }
>>>>>>>             if (done) {
>>>>>>>                 break;
>>>>>>>             }
>>>>>>>         }
>>>>>>>         fprintf(stderr, "Rank %d: recv done\n", rank);
>>>>>>>         MPI_Barrier(MPI_COMM_WORLD);
>>>>>>>     }
>>>>>>> 
>>>>>>>     fprintf(stderr, "Rank %d has completed bcast\n", rank);
>>>>>>>     MPI_Finalize();
>>>>>>>     return 0;
>>>>>>> }
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Aug 22, 2016, at 3:40 PM, Jingchao Zhang <zh...@unl.edu> wrote:
>>>>>>>> 
>>>>>>>> This might be a thin argument, but many of our users have been running 
>>>>>>>> mpirun this way for years with no problem until this recent upgrade. And 
>>>>>>>> some home-brewed MPI codes do not even have a standard way to read their 
>>>>>>>> input files. Last time I checked, the Open MPI manual still claims mpirun 
>>>>>>>> supports stdin 
>>>>>>>> (https://www.open-mpi.org/doc/v2.0/man1/mpirun.1.php#sect14). Maybe I 
>>>>>>>> missed it, but the v2.0 release notes did not mention any changes to the 
>>>>>>>> behavior of stdin either.
>>>>>>>> 
>>>>>>>> We can tell our users to run mpirun in the suggested way, but I do 
>>>>>>>> hope someone can look into the issue and fix it.
>>>>>>>> 
>>>>>>>> Dr. Jingchao Zhang
>>>>>>>> Holland Computing Center
>>>>>>>> University of Nebraska-Lincoln
>>>>>>>> 402-472-6400
>>>>>>>> From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org <r...@open-mpi.org>
>>>>>>>> Sent: Monday, August 22, 2016 3:04:50 PM
>>>>>>>> To: Open MPI Users
>>>>>>>> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>>>>>>>>  
>>>>>>>> Well, I can try to find time to take a look. However, I will reiterate 
>>>>>>>> what Jeff H said - it is very unwise to rely on IO forwarding. Much 
>>>>>>>> better to just directly read the file unless that file is simply 
>>>>>>>> unavailable on the node where rank=0 is running.
>>>>>>>> 
>>>>>>>>> On Aug 22, 2016, at 1:55 PM, Jingchao Zhang <zh...@unl.edu> wrote:
>>>>>>>>> 
>>>>>>>>> Here you can find the source code for the lammps input class: 
>>>>>>>>> https://github.com/lammps/lammps/blob/r13864/src/input.cpp
>>>>>>>>> Based on the gdb output, rank 0 is stuck at line 167,
>>>>>>>>>     if (fgets(&line[m],maxline-m,infile) == NULL)
>>>>>>>>> and the rest of the ranks are stuck at line 203,
>>>>>>>>>     MPI_Bcast(&n,1,MPI_INT,0,world);
>>>>>>>>> 
>>>>>>>>> So rank 0 is most likely hanging in the fgets() call.
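>>>>>>>>> In other words, the surrounding logic in input.cpp looks roughly like 
>>>>>>>>> this (paraphrased, not the exact code), which is why rank 0 sits in 
>>>>>>>>> fgets() while every other rank waits in the broadcast of the line length:
>>>>>>>>> 
>>>>>>>>> if (me == 0) {
>>>>>>>>>     /* infile is stdin when lammps is started as "mpirun ./lmp_ompi_g++ < in.snr" */
>>>>>>>>>     if (fgets(&line[m],maxline-m,infile) == NULL) n = 0;   /* line 167: rank 0 blocks here */
>>>>>>>>>     else n = strlen(line) + 1;
>>>>>>>>> }
>>>>>>>>> MPI_Bcast(&n,1,MPI_INT,0,world);   /* line 203: all other ranks wait here */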
>>>>>>>>> 
>>>>>>>>> Here is the full backtrace information:
>>>>>>>>> $ cat master.backtrace worker.backtrace
>>>>>>>>> #0  0x0000003c37cdb68d in read () from /lib64/libc.so.6
>>>>>>>>> #1  0x0000003c37c71ca8 in _IO_new_file_underflow () from 
>>>>>>>>> /lib64/libc.so.6
>>>>>>>>> #2  0x0000003c37c737ae in _IO_default_uflow_internal () from 
>>>>>>>>> /lib64/libc.so.6
>>>>>>>>> #3  0x0000003c37c67e8a in _IO_getline_info_internal () from 
>>>>>>>>> /lib64/libc.so.6
>>>>>>>>> #4  0x0000003c37c66ce9 in fgets () from /lib64/libc.so.6
>>>>>>>>> #5  0x00000000005c5a43 in LAMMPS_NS::Input::file() () at 
>>>>>>>>> ../input.cpp:167
>>>>>>>>> #6  0x00000000005d4236 in main () at ../main.cpp:31
>>>>>>>>> #0  0x00002b1635d2ace2 in poll_dispatch () from 
>>>>>>>>> /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libopen-pal.so.20
>>>>>>>>> #1  0x00002b1635d1fa71 in opal_libevent2022_event_base_loop ()
>>>>>>>>>    from /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libopen-pal.so.20
>>>>>>>>> #2  0x00002b1635ce4634 in opal_progress () from 
>>>>>>>>> /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libopen-pal.so.20
>>>>>>>>> #3  0x00002b16351b8fad in ompi_request_default_wait () from 
>>>>>>>>> /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libmpi.so.20
>>>>>>>>> #4  0x00002b16351fcb40 in ompi_coll_base_bcast_intra_generic ()
>>>>>>>>>    from /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libmpi.so.20
>>>>>>>>> #5  0x00002b16351fd0c2 in ompi_coll_base_bcast_intra_binomial ()
>>>>>>>>>    from /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libmpi.so.20
>>>>>>>>> #6  0x00002b1644fa6d9b in ompi_coll_tuned_bcast_intra_dec_fixed ()
>>>>>>>>>    from 
>>>>>>>>> /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/openmpi/mca_coll_tuned.so
>>>>>>>>> #7  0x00002b16351cb4fb in PMPI_Bcast () from 
>>>>>>>>> /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libmpi.so.20
>>>>>>>>> #8  0x00000000005c5b5d in LAMMPS_NS::Input::file() () at 
>>>>>>>>> ../input.cpp:203
>>>>>>>>> #9  0x00000000005d4236 in main () at ../main.cpp:31
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> 
>>>>>>>>> Dr. Jingchao Zhang
>>>>>>>>> Holland Computing Center
>>>>>>>>> University of Nebraska-Lincoln
>>>>>>>>> 402-472-6400
>>>>>>>>> From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org <r...@open-mpi.org>
>>>>>>>>> Sent: Monday, August 22, 2016 2:17:10 PM
>>>>>>>>> To: Open MPI Users
>>>>>>>>> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>>>>>>>>>  
>>>>>>>>> Hmmm...perhaps we can break this out a bit? The stdin will be going 
>>>>>>>>> to your rank=0 proc. It sounds like you have some subsequent step 
>>>>>>>>> that calls MPI_Bcast?
>>>>>>>>> 
>>>>>>>>> Can you first verify that the input is being correctly delivered to 
>>>>>>>>> rank=0? This will help us isolate whether the problem is in the IO 
>>>>>>>>> forwarding or in the subsequent Bcast.
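>>>>>>>>> For instance, a toy program like this (just a sketch; check_stdin is a 
>>>>>>>>> made-up name) takes the Bcast out of the picture entirely: rank 0 simply 
>>>>>>>>> drains stdin and reports the byte count. If it hangs, the IO forwarding 
>>>>>>>>> is at fault; if it reports the full size of your input file, the problem 
>>>>>>>>> is downstream in the Bcast path.
>>>>>>>>> 
>>>>>>>>> #include <stdio.h>
>>>>>>>>> #include <unistd.h>
>>>>>>>>> #include <mpi.h>
>>>>>>>>> 
>>>>>>>>> int main(int argc, char *argv[])
>>>>>>>>> {
>>>>>>>>>     char buf[2048];
>>>>>>>>>     ssize_t n;
>>>>>>>>>     long total = 0;
>>>>>>>>>     int rank;
>>>>>>>>> 
>>>>>>>>>     MPI_Init(&argc, &argv);
>>>>>>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>>>>>>     if (0 == rank) {
>>>>>>>>>         /* only rank 0 receives the forwarded stdin */
>>>>>>>>>         while ((n = read(0, buf, sizeof(buf))) > 0) {
>>>>>>>>>             total += n;
>>>>>>>>>         }
>>>>>>>>>         fprintf(stderr, "rank 0 read %ld bytes from stdin\n", total);
>>>>>>>>>     }
>>>>>>>>>     MPI_Barrier(MPI_COMM_WORLD);   /* keep the other ranks alive until rank 0 is done */
>>>>>>>>>     MPI_Finalize();
>>>>>>>>>     return 0;
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> Run it the same way, e.g. "mpirun ./check_stdin < in.snr".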
>>>>>>>>> 
>>>>>>>>>> On Aug 22, 2016, at 1:11 PM, Jingchao Zhang <zh...@unl.edu> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi all,
>>>>>>>>>> 
>>>>>>>>>> We compiled openmpi/2.0.0 with gcc/6.1.0 and intel/13.1.3. Both builds 
>>>>>>>>>> behave oddly when trying to read from standard input.
>>>>>>>>>> 
>>>>>>>>>> For example, if we start the application lammps across 4 nodes with 16 
>>>>>>>>>> cores each, connected by Intel QDR InfiniBand, mpirun works fine the 1st 
>>>>>>>>>> time but always gets stuck within a few seconds thereafter.
>>>>>>>>>> Command:
>>>>>>>>>> mpirun ./lmp_ompi_g++ < in.snr
>>>>>>>>>> in.snr is the LAMMPS input file; the compiler is gcc/6.1.
>>>>>>>>>> 
>>>>>>>>>> Instead, if we use
>>>>>>>>>> mpirun ./lmp_ompi_g++ -in in.snr
>>>>>>>>>> it works 100%.
>>>>>>>>>> 
>>>>>>>>>> Some odd behaviors we have gathered so far:
>>>>>>>>>> 1. For a 1-node job, stdin always works.
>>>>>>>>>> 2. For multiple nodes, stdin works unreliably when the number of cores 
>>>>>>>>>> per node is relatively small. For example, with 2/3/4 nodes and 8 cores 
>>>>>>>>>> per node, mpirun works most of the time. But with more than 8 cores per 
>>>>>>>>>> node, mpirun works the 1st time and then always gets stuck. There seems 
>>>>>>>>>> to be a magic number at which it stops working.
>>>>>>>>>> 3. We tested Quantum ESPRESSO with the intel/13 compiler and had the 
>>>>>>>>>> same issue. 
>>>>>>>>>> 
>>>>>>>>>> We used gdb to debug and found that when mpirun was stuck, the rest of 
>>>>>>>>>> the processes were all waiting on an MPI broadcast from the root rank. 
>>>>>>>>>> The lammps binary, input file and gdb core files (example.tar.bz2) can 
>>>>>>>>>> be downloaded from this link: 
>>>>>>>>>> https://drive.google.com/open?id=0B3Yj4QkZpI-dVWZtWmJ3ZXNVRGc
>>>>>>>>>> 
>>>>>>>>>> Extra information:
>>>>>>>>>> 1. Job scheduler is slurm.
>>>>>>>>>> 2. configure setup:
>>>>>>>>>> ./configure     --prefix=$PREFIX \
>>>>>>>>>>                 --with-hwloc=internal \
>>>>>>>>>>                 --enable-mpirun-prefix-by-default \
>>>>>>>>>>                 --with-slurm \
>>>>>>>>>>                 --with-verbs \
>>>>>>>>>>                 --with-psm \
>>>>>>>>>>                 --disable-openib-connectx-xrc \
>>>>>>>>>>                 --with-knem=/opt/knem-1.1.2.90mlnx1 \
>>>>>>>>>>                 --with-cma
>>>>>>>>>> 3. openmpi-mca-params.conf file 
>>>>>>>>>> orte_hetero_nodes=1
>>>>>>>>>> hwloc_base_binding_policy=core
>>>>>>>>>> rmaps_base_mapping_policy=core
>>>>>>>>>> opal_cuda_support=0
>>>>>>>>>> btl_openib_use_eager_rdma=0
>>>>>>>>>> btl_openib_max_eager_rdma=0
>>>>>>>>>> btl_openib_flags=1
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> Jingchao 
>>>>>>>>>> 
>>>>>>>>>> Dr. Jingchao Zhang
>>>>>>>>>> Holland Computing Center
>>>>>>>>>> University of Nebraska-Lincoln
>>>>>>>>>> 402-472-6400
>> <debug_info.txt>