...The error that prompted me to start this thread occurred
during "make all" with 4.1.0:

.
.
.
Making all in mca/op/avx
gmake[2]: Entering directory `/home/maxd/XXXXXXXXXXXXXXXXXX/Build/openmpi-4.1.0_gcc-10.2.0/ompi/mca/op/avx'
  CC       op_avx_component.lo
  CC       liblocal_ops_avx_la-op_avx_functions.lo
  CCLD     liblocal_ops_avx.la
  CC       liblocal_ops_avx512_la-op_avx_functions.lo
op_avx_functions.c: In function 'ompi_op_avx_2buff_bxor_uint64_t_avx512':
op_avx_functions.c:208:21: warning: AVX512F vector return without AVX512F enabled changes the ABI [-Wpsabi]
  208 |             __m512i vecA =  _mm512_loadu_si512((__m512i*)in);           \
      |                     ^~~~
op_avx_functions.c:263:5: note: in expansion of macro 'OP_AVX_AVX512_BIT_FUNC'
  263 |     OP_AVX_AVX512_BIT_FUNC(name, type_size, type, op);                  \
      |     ^~~~~~~~~~~~~~~~~~~~~~
op_avx_functions.c:573:5: note: in expansion of macro 'OP_AVX_BIT_FUNC'
  573 |     OP_AVX_BIT_FUNC(bxor, 64, uint64_t, xor)
      |     ^~~~~~~~~~~~~~~
In file included from /home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/gcc-10.2.0/lib/gcc/x86_64-linux-gnu/10.2.0/include/immintrin.h:55,
                 from op_avx_functions.c:26:
op_avx_functions.c: In function 'ompi_op_avx_2buff_max_int8_t_avx512':
/home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/gcc-10.2.0/lib/gcc/x86_64-linux-gnu/10.2.0/include/avx512fintrin.h:6429:1: error: inlining failed in call to 'always_inline' '_mm512_storeu_si512': target specific option mismatch
 6429 | _mm512_storeu_si512 (void *__P, __m512i __A)
      | ^~~~~~~~~~~~~~~~~~~
op_avx_functions.c:73:13: note: called from here
   73 |             _mm512_storeu_si512((__m512*)out, res);                     \
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
op_avx_functions.c:124:5: note: in expansion of macro 'OP_AVX_AVX512_FUNC'
  124 |     OP_AVX_AVX512_FUNC(name, type_sign, type_size, type, op);           \
      |     ^~~~~~~~~~~~~~~~~~
op_avx_functions.c:454:5: note: in expansion of macro 'OP_AVX_FUNC'
  454 |     OP_AVX_FUNC(max, i, 8,    int8_t, max)
      |     ^~~~~~~~~~~
In file included from /home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/gcc-10.2.0/lib/gcc/x86_64-linux-gnu/10.2.0/include/immintrin.h:65,
                 from op_avx_functions.c:26:
/home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/gcc-10.2.0/lib/gcc/x86_64-linux-gnu/10.2.0/include/avx512bwintrin.h:1984:1: error: inlining failed in call to 'always_inline' '_mm512_max_epi8': target specific option mismatch
 1984 | _mm512_max_epi8 (__m512i __A, __m512i __B)
      | ^~~~~~~~~~~~~~~
op_avx_functions.c:72:27: note: called from here
   72 |             __m512i res = _mm512_##op##_ep##type_sign##type_size(vecA, vecB);  \
      |                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
op_avx_functions.c:124:5: note: in expansion of macro 'OP_AVX_AVX512_FUNC'
  124 |     OP_AVX_AVX512_FUNC(name, type_sign, type_size, type, op);           \
      |     ^~~~~~~~~~~~~~~~~~
op_avx_functions.c:454:5: note: in expansion of macro 'OP_AVX_FUNC'
  454 |     OP_AVX_FUNC(max, i, 8,    int8_t, max)
      |     ^~~~~~~~~~~
In file included from /home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/gcc-10.2.0/lib/gcc/x86_64-linux-gnu/10.2.0/include/immintrin.h:55,
                 from op_avx_functions.c:26:
.
.
.

End result: the build failed.

My build of v4.1.x-202102090356-380ac96 threw no errors.
I will continue with an attempt to build GROMACS using
that 4.1.x snapshot.


On Thu, Feb 11, 2021 at 01:10:42PM +0000, Max R. Dechantsreiter wrote:
> I ran into a problem with 4.1.0 several weeks ago,
> and no longer recall precisely how; I am now rebuilding
> both 4.1.0 and a recent 4.1.x, then will use them to
> build GROMACS, probably the application I was attempting
> back then.
> 
> But I do have this from my notes (for 4.1.0):
> 
> mpicc -fopenmp hybrid_hello.c
> export OMP_NUM_THREADS=2
> mpirun -np 2 ./a.out
> # [server.clearlight.us:18349] mca_base_component_repository_open: unable to open mca_op_avx: /home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/openmpi-4.1.0_gcc-10.2.0/lib/openmpi/mca_op_avx.so: undefined symbol: ompi_op_avx_functions_avx512 (ignored)
> # [server.clearlight.us:18348] mca_base_component_repository_open: unable to open mca_op_avx: /home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/openmpi-4.1.0_gcc-10.2.0/lib/openmpi/mca_op_avx.so: undefined symbol: ompi_op_avx_functions_avx512 (ignored)
> # Hello from thread 0 out of 2 from process 0 out of 2 on server.clearlight.us
> # Hello from thread 1 out of 2 from process 0 out of 2 on server.clearlight.us
> # Hello from thread 0 out of 2 from process 1 out of 2 on server.clearlight.us
> # Hello from thread 1 out of 2 from process 1 out of 2 on server.clearlight.us
> 
> (where I X-ed out confidential details).  Not an error,
> but surely indicative of something amiss.
> 
> More to come!
> 
> 
> On Thu, Feb 11, 2021 at 02:02:48AM +0000, Jeff Squyres (jsquyres) via users 
> wrote:
> > I think Max did try the latest 4.1 nightly build (from an off-list email), 
> > and his problem still persisted.
> > 
> > Max: can you describe exactly how Open MPI failed?  All you said was:
> > 
> > >> Consequently AVX512 intrinsic functions were erroneously
> > >> deployed, resulting in OpenMPI failure.
> > 
> > Can you provide more details?
> > 
> > 
> > > On Feb 10, 2021, at 6:09 PM, Gilles Gouaillardet via users 
> > > <users@lists.open-mpi.org> wrote:
> > > 
> > > Max,
> > > 
> > > at configure time, Open MPI detects the *compiler* capabilities.
> > > In your case, your compiler can emit AVX512 code.
> > > (and fwiw, the tests are only compiled and never executed)
> > > 
> > > Then at *runtime*, Open MPI detects the *CPU* capabilities.
> > > In your case, it should not invoke the functions containing AVX512 code.
> > > 
> > > That being said, several changes were made to the op/avx component,
> > > > so if you are experiencing some crashes, I do invite you to try the
> > > > latest nightly snapshot for the v4.1.x branch.
> > > 
> > > 
> > > Cheers,
> > > 
> > > Gilles
> > > 
> > > On Wed, Feb 10, 2021 at 10:43 PM Max R. Dechantsreiter via users
> > > <users@lists.open-mpi.org> wrote:
> > >> 
> > >> Configuring OpenMPI 4.1.0 with GCC 10.2.0 on
> > >> Intel(R) Xeon(R) CPU E5-2620 v3, a Haswell processor
> > >> that supports AVX2 but not AVX512, resulted in
> > >> 
> > >> checking for AVX512 support (no additional flags)... no
> > >> checking for AVX512 support (with -march=skylake-avx512)... yes
> > >> 
> > >> in "configure" output, and in config.log
> > >> 
> > >> MCA_BUILD_ompi_op_has_avx512_support_FALSE='#'
> > >> MCA_BUILD_ompi_op_has_avx512_support_TRUE=''
> > >> 
> > >> Consequently AVX512 intrinsic functions were erroneously
> > >> deployed, resulting in OpenMPI failure.
> > >> 
> > >> The relevant test code was in essence
> > >> 
> > >> cat > conftest.c << EOF
> > >> #include <immintrin.h>
> > >> 
> > >> int main()
> > >> {
> > >>        __m512 vA, vB;
> > >> 
> > >>        _mm512_add_ps(vA, vB);
> > >> 
> > >>        return 0;
> > >> }
> > >> EOF
> > >> 
> > >> The problem with this is that the result of the function
> > >> is never used, so at optimization level higher than O0
> > >> the compiler eliminates the function as "dead code" (DCE).
> > >> To wit,
> > >> 
> > >> gcc -O3 -march=skylake-avx512 -S conftest.c
> > >> 
> > >> yields
> > >> 
> > >>        .file   "conftest.c"
> > >>        .text
> > >>        .section        .text.startup,"ax",@progbits
> > >>        .p2align 4
> > >>        .globl  main
> > >>        .type   main, @function
> > >> main:
> > >> .LFB5345:
> > >>        .cfi_startproc
> > >>        xorl    %eax, %eax
> > >>        ret
> > >>        .cfi_endproc
> > >> .LFE5345:
> > >>        .size   main, .-main
> > >>        .ident  "GCC: (GNU) 10.2.0"
> > >>        .section        .note.GNU-stack,"",@progbits
> > >> 
> > >> Compare this with the result of
> > >> 
> > >> gcc -O0 -march=skylake-avx512 -S conftest.c
> > >> 
> > >> in which the function IS called:
> > >> 
> > >>        .file   "conftest.c"
> > >>        .text
> > >>        .globl  main
> > >>        .type   main, @function
> > >> main:
> > >> .LFB4092:
> > >>        .cfi_startproc
> > >>        pushq   %rbp
> > >>        .cfi_def_cfa_offset 16
> > >>        .cfi_offset 6, -16
> > >>        movq    %rsp, %rbp
> > >>        .cfi_def_cfa_register 6
> > >>        andq    $-64, %rsp
> > >>        subq    $136, %rsp
> > >>        vmovaps 72(%rsp), %zmm0
> > >>        vmovaps %zmm0, -56(%rsp)
> > >>        vmovaps 8(%rsp), %zmm0
> > >>        vmovaps %zmm0, -120(%rsp)
> > >>        movl    $0, %eax
> > >>        leave
> > >>        .cfi_def_cfa 7, 8
> > >>        ret
> > >>        .cfi_endproc
> > >> .LFE4092:
> > >>        .size   main, .-main
> > >>        .ident  "GCC: (GNU) 10.2.0"
> > >>        .section        .note.GNU-stack,"",@progbits
> > >> 
> > >> Note the use of a 512-bit ZMM register - ZMM registers
> > >> are used only by AVX512 instructions.  Hence at O3 the
> > >> test program does not detect the lack of AVX512 support
> > >> by the host processor.
> > >> 
> > >> An easy remedy would be to declare the operands as
> > >> "volatile" and thereby force the compiler to invoke the
> > >> function:
> > >> 
> > >> cat > conftest.c << EOF
> > >> #include <immintrin.h>
> > >> 
> > >> int main()
> > >> {
> > >>        volatile __m512 vA, vB;
> > >> 
> > >>        _mm512_add_ps(vA, vB);
> > >> 
> > >>        return 0;
> > >> }
> > >> EOF
> > >> 
> > >> Compiled at O3, the resulting executable dumps core as it
> > >> should when run on my Haswell processor, returning nonzero
> > >> exit status ($?), which would inform "configure" that the
> > >> processor does not have AVX512 capability.
> > >> 
> > >> Finally please note that this error could affect the
> > >> detection of support for other instruction sets on other
> > >> families of processors: compiler optimization must be
> > >> inhibited for such tests to be reliable!
> > >> 
> > >> Max
> > >> ---
> > >> Max R. Dechantsreiter
> > >> President
> > >> Performance Jones L.L.C.
> > >> m...@performancejones.com
> > >> Skype: PerformanceJones (UTC+01:00)
> > >> +1 414 446-3100 (telephone/voicemail)
> > >> http://www.linkedin.com/in/benchmarking
> > 
> > 
> > -- 
> > Jeff Squyres
> > jsquy...@cisco.com
> > 
