Re: [OMPI users] include/mpi.h:201:32: error: two or more data types in declaration specifiers

2020-07-23 Thread Steve Brasier via users
Gilles, thank you for your answer on this - my reply seems to have been eaten, probably for being too long, so I have put it up here:

https://gist.github.com/sjpb/680f3781110025ade3dff34390d673c8#file-gnu9-ompi4-em_real-log-L903


It almost looks to me like it's using the wrong compiler or something, but I can't figure out what.

many thanks

Steve Brasier

On Tue, 14 Jul 2020 at 11:18, Steve Brasier  wrote:

> I'm trying to compile WRF against openmpi4 installed using Spack, using
> gcc 7. It's hitting lots of errors of which the first appears to be in the
> mpi header:
>
> ranlib ../libio_grib1.a
> In file included from c_code.c:27:0:
> $HOME/spack/opt/spack/linux-centos7-broadwell/gcc-7.3.0/openmpi-4.0.3
> -ziwdzwh77wcddumuqk5akbmodploffo6/include/mpi.h:201:32: error:
>  two or more data types in declaration specifiers
>  #define ompi_fortran_integer_t int
> ^
> I've managed to compile other codes using this same combination. Any
> suggestions as to what I'm doing wrong?
>
> many thanks
> Steve
>
> http://stackhpc.com/
> Please note I work Tuesday to Friday.
>


Re: [OMPI users] include/mpi.h:201:32: error: two or more data types in declaration specifiers

2020-07-23 Thread Gilles Gouaillardet via users
In https://github.com/NCAR/WRFV3/blob/master/external/RSL_LITE/rsl_lite.h

#ifndef MPI2_SUPPORT
typedef int MPI_Fint;
# define MPI_Comm_c2f(comm) (MPI_Fint)(comm)
# define MPI_Comm_f2c(comm) (MPI_Comm)(comm)
#endif

so I guess the MPI2_SUPPORT macro is not defined, and the fallback definition of MPI_Fint then clashes with the declarations in Open MPI's mpi.h, which makes Open MPI a sad panda.
I do not know how WRF is supposed to be built, but I think you have to manually pass -DMPI2_SUPPORT when building RSL_LITE.
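
For what it is worth, here is a minimal sketch of how this exact gcc diagnostic can arise (my own illustration, assuming the type name has already become a macro by the time mpi.h is parsed - it is not code from WRF or from your gist):

/* repro.c - hypothetical example; compile with: gcc -c repro.c */
#define MPI_Fint int                      /* pretend an earlier header did this */
#define ompi_fortran_integer_t int        /* as in Open MPI's mpi.h line 201 */

typedef ompi_fortran_integer_t MPI_Fint;  /* expands to: typedef int int; */
/* gcc: error: two or more data types in declaration specifiers */

So it is worth checking which headers declare MPI-related type names before mpi.h is included.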

Cheers,

Gilles

On Thu, Jul 23, 2020 at 7:32 PM Steve Brasier via users wrote:
>
> Gilles, thank you for your answer on this - my reply seems to have been eaten, probably for being too long, so I have put it up here:
>
> https://gist.github.com/sjpb/680f3781110025ade3dff34390d673c8#file-gnu9-ompi4-em_real-log-L903
>
> It almost looks to me like it's using the wrong compiler or something, but I can't figure out what.
>
> many thanks
>
> Steve Brasier
>
> On Tue, 14 Jul 2020 at 11:18, Steve Brasier  wrote:
>>
>> I'm trying to compile WRF against openmpi4 installed using Spack, using gcc 
>> 7. It's hitting lots of errors of which the first appears to be in the mpi 
>> header:
>>
>> ranlib ../libio_grib1.a
>> In file included from c_code.c:27:0:
>> $HOME/spack/opt/spack/linux-centos7-broadwell/gcc-7.3.0/openmpi-4.0.3-ziwdzwh77wcddumuqk5akbmodploffo6/include/mpi.h:201:32:
>>  error: two or more data types in declaration specifiers
>>  #define ompi_fortran_integer_t int
>> ^
>> I've managed to compile other codes using this same combination. Any 
>> suggestions as to what I'm doing wrong?
>>
>> many thanks
>> Steve
>>
>> http://stackhpc.com/
>> Please note I work Tuesday to Friday.


[OMPI users] MPI test suite

2020-07-23 Thread Zhang, Junchao via users
Hello,
  Does OMPI have a test suite that can let me validate MPI implementations from 
other vendors?

  Thanks
--Junchao Zhang





Re: [OMPI users] MPI test suite

2020-07-23 Thread Marco Atzeri via users

On 23.07.2020 20:28, Zhang, Junchao via users wrote:

Hello,
   Does OMPI have a test suite that can let me validate MPI 
implementations from other vendors?


   Thanks
--Junchao Zhang


Have you considered the OSU Micro-Benchmarks?

http://mvapich.cse.ohio-state.edu/benchmarks/
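
If you want to give them a try, the usual sequence is something like the following (the version number and directory layout here are just an example from memory - check the download page for the current tarball):

tar xzf osu-micro-benchmarks-5.6.3.tar.gz      # example version
cd osu-micro-benchmarks-5.6.3
./configure CC=mpicc CXX=mpicxx                # build against the MPI under test
make
mpirun -np 2 ./mpi/pt2pt/osu_latency           # point-to-point latency
mpirun -np 2 ./mpi/collective/osu_allreduce    # one of the collective benchmarks

They measure performance rather than correctness, though, so they are more of a sanity check than a validation suite.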


[OMPI users] segfault in libibverbs.so

2020-07-23 Thread Prentice Bisbal via users
I manage a cluster that is very heterogeneous. Some nodes have 
InfiniBand, while others have 10 Gb/s Ethernet. We recently upgraded to 
CentOS 7 and built a new software stack for it. We are using Open MPI 
4.0.3, with Slurm 19.05.5 as our job scheduler.


We just noticed that when jobs are sent to the nodes with IB, they 
segfault immediately, and the segfaults appear to come from 
libibverbs.so. This is what I see in the stderr output for one of these 
failed jobs:


srun: error: greene021: tasks 0-3: Segmentation fault

And here is what I see in the log messages of the compute node where 
that segfault happened:


Jul 23 15:19:41 greene021 kernel: mpihello[7911]: segfault at 7f0635f38910 ip 7f0635f49405 sp 7ffe354485a0 error 4 in libibverbs.so.1.5.22.4[7f0635f3a000+18000]
Jul 23 15:19:41 greene021 kernel: mpihello[7912]: segfault at 7f23d51ea910 ip 7f23d51fb405 sp 7ffef250a9a0 error 4 in libibverbs.so.1.5.22.4[7f23d51ec000+18000]
Jul 23 15:19:41 greene021 kernel: mpihello[7909]: segfault at 7ff504ba5910 ip 7ff504bb6405 sp 7917ccb0 error 4 in libibverbs.so.1.5.22.4[7ff504ba7000+18000]
Jul 23 15:19:41 greene021 kernel: mpihello[7910]: segfault at 7fa58abc5910 ip 7fa58abd6405 sp 7ffdde50c0d0 error 4 in libibverbs.so.1.5.22.4[7fa58abc7000+18000]

Any idea what is going on here, or how to debug further? I've been using 
OpenMPI for years, and it usually just works.


I normally start my job with srun like this:

srun ./mpihello

But even if I try to take IB out of the equation by starting the job 
like this:


mpirun -mca btl ^openib ./mpihello

I still get a segfault issue, although the message to stderr is now a 
little different:


--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 8502 on node greene021
exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

The segfault happens immediately, seemingly as soon as MPI_Init() is 
called. The program I'm running is a very simple MPI "Hello world!" 
program.


The output of  ompi_info is below my signature, in case that helps.

Prentice

$ ompi_info
                Package: Open MPI u...@host.example.com Distribution
               Open MPI: 4.0.3
 Open MPI repo revision: v4.0.3
  Open MPI release date: Mar 03, 2020
               Open RTE: 4.0.3
 Open RTE repo revision: v4.0.3
  Open RTE release date: Mar 03, 2020
                   OPAL: 4.0.3
     OPAL repo revision: v4.0.3
      OPAL release date: Mar 03, 2020
                MPI API: 3.1.0
           Ident string: 4.0.3
                 Prefix: /usr/pppl/gcc/9.3-pkgs/openmpi-4.0.3
Configured architecture: x86_64-unknown-linux-gnu
         Configure host: dawson027.pppl.gov
          Configured by: lglant
          Configured on: Mon Jun  1 12:37:07 EDT 2020
         Configure host: dawson027.pppl.gov
 Configure command line: '--prefix=/usr/pppl/gcc/9.3-pkgs/openmpi-4.0.3'
                         '--with-ucx' '--with-verbs' '--with-libfabric'
                         '--with-libevent=/usr'
                         '--with-libevent-libdir=/usr/lib64'
                         '--with-pmix=/usr/pppl/pmix/3.1.5' '--with-pmi'
               Built by: lglant
               Built on: Mon Jun  1 13:05:40 EDT 2020
             Built host: dawson027.pppl.gov
             C bindings: yes
           C++ bindings: no
            Fort mpif.h: yes (all)
           Fort use mpi: yes (full: ignore TKR)
      Fort use mpi size: deprecated-ompi-info-value
       Fort use mpi_f08: yes
Fort mpi_f08 compliance: The mpi_f08 module is available, but due to
                         limitations in the gfortran compiler and/or Open
                         MPI, does not support the following: array
                         subsections, direct passthru (where possible) to
                         underlying Open MPI's C functionality
 Fort mpi_f08 subarrays: no
          Java bindings: no
 Wrapper compiler rpath: runpath
             C compiler: gcc
    C compiler absolute: /usr/pppl/gcc/9.3.0/bin/gcc
 C compiler family name: GNU
     C compiler version: 9.3.0
           C++ compiler: g++
  C++ compiler absolute: /usr/pppl/gcc/9.3.0/bin/g++
          Fort compiler: gfortran
      Fort compiler abs: /usr/pppl/gcc/9.3.0/bin/gfortran
        Fort ignore TKR: yes (!GCC$ ATTRIBUTES NO_ARG_CHECK ::)

Re: [OMPI users] MPI test suite

2020-07-23 Thread Zhang, Junchao via users
I know the OSU micro-benchmarks, but they are not an extensive test suite.

Thanks
--Junchao Zhang



> On Jul 23, 2020, at 2:00 PM, Marco Atzeri via users wrote:
> 
> On 23.07.2020 20:28, Zhang, Junchao via users wrote:
>> Hello,
>>   Does OMPI have a test suite that can let me validate MPI implementations 
>> from other vendors?
>>   Thanks
>> --Junchao Zhang
> 
> Have you considered the OSU Micro-Benchmarks ?
> 
> http://mvapich.cse.ohio-state.edu/benchmarks/