...The error that prompted me to start this thread occurred during "make all" with 4.1.0:
. . .
Making all in mca/op/avx
gmake[2]: Entering directory `/home/maxd/XXXXXXXXXXXXXXXXXX/Build/openmpi-4.1.0_gcc-10.2.0/ompi/mca/op/avx'
  CC       op_avx_component.lo
  CC       liblocal_ops_avx_la-op_avx_functions.lo
  CCLD     liblocal_ops_avx.la
  CC       liblocal_ops_avx512_la-op_avx_functions.lo
op_avx_functions.c: In function 'ompi_op_avx_2buff_bxor_uint64_t_avx512':
op_avx_functions.c:208:21: warning: AVX512F vector return without AVX512F enabled changes the ABI [-Wpsabi]
  208 |             __m512i vecA = _mm512_loadu_si512((__m512i*)in); \
      |                     ^~~~
op_avx_functions.c:263:5: note: in expansion of macro 'OP_AVX_AVX512_BIT_FUNC'
  263 |     OP_AVX_AVX512_BIT_FUNC(name, type_size, type, op); \
      |     ^~~~~~~~~~~~~~~~~~~~~~
op_avx_functions.c:573:5: note: in expansion of macro 'OP_AVX_BIT_FUNC'
  573 |     OP_AVX_BIT_FUNC(bxor, 64, uint64_t, xor)
      |     ^~~~~~~~~~~~~~~
In file included from /home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/gcc-10.2.0/lib/gcc/x86_64-linux-gnu/10.2.0/include/immintrin.h:55,
                 from op_avx_functions.c:26:
op_avx_functions.c: In function 'ompi_op_avx_2buff_max_int8_t_avx512':
/home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/gcc-10.2.0/lib/gcc/x86_64-linux-gnu/10.2.0/include/avx512fintrin.h:6429:1: error: inlining failed in call to 'always_inline' '_mm512_storeu_si512': target specific option mismatch
 6429 | _mm512_storeu_si512 (void *__P, __m512i __A)
      | ^~~~~~~~~~~~~~~~~~~
op_avx_functions.c:73:13: note: called from here
   73 |             _mm512_storeu_si512((__m512*)out, res); \
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
op_avx_functions.c:124:5: note: in expansion of macro 'OP_AVX_AVX512_FUNC'
  124 |     OP_AVX_AVX512_FUNC(name, type_sign, type_size, type, op); \
      |     ^~~~~~~~~~~~~~~~~~
op_avx_functions.c:454:5: note: in expansion of macro 'OP_AVX_FUNC'
  454 |     OP_AVX_FUNC(max, i, 8, int8_t, max)
      |     ^~~~~~~~~~~
In file included from /home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/gcc-10.2.0/lib/gcc/x86_64-linux-gnu/10.2.0/include/immintrin.h:65,
                 from op_avx_functions.c:26:
/home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/gcc-10.2.0/lib/gcc/x86_64-linux-gnu/10.2.0/include/avx512bwintrin.h:1984:1: error: inlining failed in call to 'always_inline' '_mm512_max_epi8': target specific option mismatch
 1984 | _mm512_max_epi8 (__m512i __A, __m512i __B)
      | ^~~~~~~~~~~~~~~
op_avx_functions.c:72:27: note: called from here
   72 |             __m512i res = _mm512_##op##_ep##type_sign##type_size(vecA, vecB); \
      |                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
op_avx_functions.c:124:5: note: in expansion of macro 'OP_AVX_AVX512_FUNC'
  124 |     OP_AVX_AVX512_FUNC(name, type_sign, type_size, type, op); \
      |     ^~~~~~~~~~~~~~~~~~
op_avx_functions.c:454:5: note: in expansion of macro 'OP_AVX_FUNC'
  454 |     OP_AVX_FUNC(max, i, 8, int8_t, max)
      |     ^~~~~~~~~~~
In file included from /home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/gcc-10.2.0/lib/gcc/x86_64-linux-gnu/10.2.0/include/immintrin.h:55,
                 from op_avx_functions.c:26:
. . .

End result: the build failed.

My build of v4.1.x-202102090356-380ac96 threw no errors. I will
continue with an attempt to build GROMACS using that 4.1.x snapshot.

On Thu, Feb 11, 2021 at 01:10:42PM +0000, Max R. Dechantsreiter wrote:
> I ran into a problem with 4.1.0 several weeks ago,
> and no longer recall precisely how; I am now rebuilding
> both 4.1.0 and a recent 4.1.x, then will use them to
> build GROMACS, probably the application I was attempting
> back then.
>
> But I do have this from my notes (for 4.1.0):
>
> mpicc -fopenmp hybrid_hello.c
> export OMP_NUM_THREADS=2
> mpirun -np 2 ./a.out
> # [server.clearlight.us:18349] mca_base_component_repository_open: unable to
> open mca_op_avx:
> /home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/openmpi-4.1.0_gcc-10.2.0/lib/openmpi/mca_op_avx.so:
> undefined symbol: ompi_op_avx_functions_avx512 (ignored)
> # [server.clearlight.us:18348] mca_base_component_repository_open: unable to
> open mca_op_avx:
> /home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/openmpi-4.1.0_gcc-10.2.0/lib/openmpi/mca_op_avx.so:
> undefined symbol: ompi_op_avx_functions_avx512 (ignored)
> # Hello from thread 0 out of 2 from process 0 out of 2 on server.clearlight.us
> # Hello from thread 1 out of 2 from process 0 out of 2 on server.clearlight.us
> # Hello from thread 0 out of 2 from process 1 out of 2 on server.clearlight.us
> # Hello from thread 1 out of 2 from process 1 out of 2 on server.clearlight.us
>
> (where I X-ed out confidential details). Not an error,
> but surely indicative of something amiss.
>
> More to come!
>
>
> On Thu, Feb 11, 2021 at 02:02:48AM +0000, Jeff Squyres (jsquyres) via users
> wrote:
> > I think Max did try the latest 4.1 nightly build (from an off-list email),
> > and his problem still persisted.
> >
> > Max: can you describe exactly how Open MPI failed? All you said was:
> >
> > >> Consequently AVX512 intrinsic functions were erroneously
> > >> deployed, resulting in OpenMPI failure.
> >
> > Can you provide more details?
> >
> >
> > > On Feb 10, 2021, at 6:09 PM, Gilles Gouaillardet via users
> > > <users@lists.open-mpi.org> wrote:
> > >
> > > Max,
> > >
> > > at configure time, Open MPI detects the *compiler* capabilities.
> > > In your case, your compiler can emit AVX512 code.
> > > (and fwiw, the tests are only compiled and never executed)
> > >
> > > Then at *runtime*, Open MPI detects the *CPU* capabilities.
> > > In your case, it should not invoke the functions containing AVX512 code.
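Gilles's two-stage description (the compiler decides what can be built, the CPU decides what gets run) can be sketched in C for GCC or Clang on x86-64. This is a hypothetical illustration, not Open MPI's op/avx code: `add_floats`, `add_floats_avx512` and `add_floats_scalar` are made-up names. The `target("avx512f")` attribute lets one function use AVX512 intrinsics even though the file is built without `-march=skylake-avx512`; calling those `always_inline` intrinsics from a function compiled without AVX512 support is what triggers GCC's "inlining failed ... target specific option mismatch" error, the same kind of error seen in the build log at the top of this message. `__builtin_cpu_supports("avx512f")` then picks the code path at run time, so a Haswell CPU such as the E5-2620 v3 should only ever execute the scalar loop.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Hypothetical AVX512 kernel. The target attribute makes the compiler emit
 * AVX512F code for this one function regardless of the -march flags used for
 * the file, so it must only be called after a runtime CPU check. */
__attribute__((target("avx512f")))
static void add_floats_avx512(float *out, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {            /* 16 floats per ZMM register */
        __m512 va = _mm512_loadu_ps(a + i);
        __m512 vb = _mm512_loadu_ps(b + i);
        _mm512_storeu_ps(out + i, _mm512_add_ps(va, vb));
    }
    for (; i < n; i++)                        /* scalar tail */
        out[i] = a[i] + b[i];
}
#endif

/* Hypothetical portable fallback that any CPU can run. */
static void add_floats_scalar(float *out, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Runtime dispatch: ask the CPU, not the compiler, what it supports.
 * Returns 1 if the AVX512 path was taken, 0 if the scalar path was. */
int add_floats(float *out, const float *a, const float *b, size_t n)
{
    assert(out != NULL || n == 0);
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) {
        add_floats_avx512(out, a, b, n);
        return 1;
    }
#endif
    add_floats_scalar(out, a, b, n);
    return 0;
}
```

Called with `n = 37` on an AVX512-capable machine, the first 32 elements go through two ZMM additions and the last 5 through the scalar tail; on Haswell all 37 take the scalar path. Either way the results are the same; only the instructions executed differ.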
> > >
> > > That being said, several changes were made to the op/avx component,
> > > so if you are experiencing some crashes, I invite you to give the
> > > latest nightly snapshot for the v4.1.x branch a try.
> > >
> > >
> > > Cheers,
> > >
> > > Gilles
> > >
> > > On Wed, Feb 10, 2021 at 10:43 PM Max R. Dechantsreiter via users
> > > <users@lists.open-mpi.org> wrote:
> > >>
> > >> Configuring OpenMPI 4.1.0 with GCC 10.2.0 on
> > >> Intel(R) Xeon(R) CPU E5-2620 v3, a Haswell processor
> > >> that supports AVX2 but not AVX512, resulted in
> > >>
> > >> checking for AVX512 support (no additional flags)... no
> > >> checking for AVX512 support (with -march=skylake-avx512)... yes
> > >>
> > >> in "configure" output, and in config.log
> > >>
> > >> MCA_BUILD_ompi_op_has_avx512_support_FALSE='#'
> > >> MCA_BUILD_ompi_op_has_avx512_support_TRUE=''
> > >>
> > >> Consequently, AVX512 intrinsic functions were erroneously
> > >> deployed, resulting in OpenMPI failure.
> > >>
> > >> The relevant test code was in essence
> > >>
> > >> cat > conftest.c << EOF
> > >> #include <immintrin.h>
> > >>
> > >> int main()
> > >> {
> > >>     __m512 vA, vB;
> > >>
> > >>     _mm512_add_ps(vA, vB);
> > >>
> > >>     return 0;
> > >> }
> > >> EOF
> > >>
> > >> The problem with this is that the result of the function
> > >> is never used, so at optimization levels above O0 the
> > >> compiler eliminates the call as "dead code" (DCE).
> > >> To wit,
> > >>
> > >> gcc -O3 -march=skylake-avx512 -S conftest.c
> > >>
> > >> yields
> > >>
> > >>         .file   "conftest.c"
> > >>         .text
> > >>         .section        .text.startup,"ax",@progbits
> > >>         .p2align 4
> > >>         .globl  main
> > >>         .type   main, @function
> > >> main:
> > >> .LFB5345:
> > >>         .cfi_startproc
> > >>         xorl    %eax, %eax
> > >>         ret
> > >>         .cfi_endproc
> > >> .LFE5345:
> > >>         .size   main, .-main
> > >>         .ident  "GCC: (GNU) 10.2.0"
> > >>         .section        .note.GNU-stack,"",@progbits
> > >>
> > >> Compare this with the result of
> > >>
> > >> gcc -O0 -march=skylake-avx512 -S conftest.c
> > >>
> > >> in which AVX512 code IS emitted:
> > >>
> > >>         .file   "conftest.c"
> > >>         .text
> > >>         .globl  main
> > >>         .type   main, @function
> > >> main:
> > >> .LFB4092:
> > >>         .cfi_startproc
> > >>         pushq   %rbp
> > >>         .cfi_def_cfa_offset 16
> > >>         .cfi_offset 6, -16
> > >>         movq    %rsp, %rbp
> > >>         .cfi_def_cfa_register 6
> > >>         andq    $-64, %rsp
> > >>         subq    $136, %rsp
> > >>         vmovaps 72(%rsp), %zmm0
> > >>         vmovaps %zmm0, -56(%rsp)
> > >>         vmovaps 8(%rsp), %zmm0
> > >>         vmovaps %zmm0, -120(%rsp)
> > >>         movl    $0, %eax
> > >>         leave
> > >>         .cfi_def_cfa 7, 8
> > >>         ret
> > >>         .cfi_endproc
> > >> .LFE4092:
> > >>         .size   main, .-main
> > >>         .ident  "GCC: (GNU) 10.2.0"
> > >>         .section        .note.GNU-stack,"",@progbits
> > >>
> > >> Note the use of a 512-bit ZMM register: ZMM registers
> > >> are used only by AVX512 instructions. Hence at O3 the
> > >> test program does not detect the lack of AVX512 support
> > >> by the host processor.
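The dead-code argument can be made concrete: a probe says something about the host CPU only if the AVX512 instruction survives optimization and the probe is actually run. The C sketch below assumes a POSIX system and GCC or Clang on x86-64; `probe_avx512_by_execution` and `touch_zmm` are hypothetical names, and this is not how Open MPI's configure works (per Gilles, its tests are only compiled, never executed). It keeps the ZMM accesses alive through a `volatile __m512` object, runs them in a forked child so an illegal instruction cannot kill the caller, and reports a child killed by `SIGILL` as "no AVX512".

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Hypothetical helper. Accesses to a volatile object cannot be optimized
 * away, so even at -O3 the store, load and add below survive; for a __m512
 * object GCC and Clang are expected to use ZMM registers for them. */
__attribute__((target("avx512f"), noinline))
static void touch_zmm(void)
{
    volatile __m512 v = _mm512_set1_ps(1.0f);  /* volatile 512-bit store */
    __m512 w = v;                              /* volatile 512-bit load  */
    v = _mm512_add_ps(w, w);                   /* add, then store again  */
}
#endif

/* Hypothetical probe: returns 1 if the AVX512F code ran to completion,
 * 0 if the child was killed by SIGILL (what one would expect on Haswell),
 * -1 on any other outcome (fork/waitpid failure, a different signal, ...). */
int probe_avx512_by_execution(void)
{
#if defined(__x86_64__) || defined(__i386__)
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: disable core dumps, then execute the AVX512 code. */
        struct rlimit no_core = { 0, 0 };
        (void)setrlimit(RLIMIT_CORE, &no_core);
        touch_zmm();
        _exit(0);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return 1;
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGILL)
        return 0;
    return -1;
#else
    return 0;  /* not x86: there is no AVX512 to probe for */
#endif
}
```

On the E5-2620 v3 described above, this sketch would be expected to return 0, the answer Max wanted configure to reach. As Gilles points out, though, Open MPI's configure only asks whether the compiler can emit AVX512 code and leaves the CPU question to run time.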
> > >>
> > >> An easy remedy would be to declare the operands as
> > >> "volatile" and thereby force the compiler to emit the
> > >> AVX512 instructions even at O3:
> > >>
> > >> cat > conftest.c << EOF
> > >> #include <immintrin.h>
> > >>
> > >> int main()
> > >> {
> > >>     volatile __m512 vA, vB;
> > >>
> > >>     _mm512_add_ps(vA, vB);
> > >>
> > >>     return 0;
> > >> }
> > >> EOF
> > >>
> > >> Compiled at O3, the resulting executable dumps core as it
> > >> should when run on my Haswell processor, returning nonzero
> > >> exit status ($?), which would inform "configure" that the
> > >> processor does not have AVX512 capability.
> > >>
> > >> Finally, please note that this error could affect the
> > >> detection of support for other instruction sets on other
> > >> families of processors: compiler optimization must be
> > >> inhibited for such tests to be reliable!
> > >>
> > >> Max
> > >> ---
> > >> Max R. Dechantsreiter
> > >> President
> > >> Performance Jones L.L.C.
> > >> m...@performancejones.com
> > >> Skype: PerformanceJones (UTC+01:00)
> > >> +1 414 446-3100 (telephone/voicemail)
> > >> http://www.linkedin.com/in/benchmarking
> >
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> >