I think Max did try the latest 4.1 nightly build (from an off-list email), and his problem still persisted.

Max: can you describe exactly how Open MPI failed?  All you said was:

>> Consequently AVX512 intrinsic functions were erroneously
>> deployed, resulting in OpenMPI failure.

Can you provide more details?


> On Feb 10, 2021, at 6:09 PM, Gilles Gouaillardet via users
> <users@lists.open-mpi.org> wrote:
>
> Max,
>
> at configure time, Open MPI detects the *compiler* capabilities.
> In your case, your compiler can emit AVX512 code.
> (and fwiw, the tests are only compiled and never executed)
>
> Then at *runtime*, Open MPI detects the *CPU* capabilities.
> In your case, it should not invoke the functions containing AVX512 code.
>
> That being said, several changes were made to the op/avx component,
> so if you are experiencing some crashes, I do invite you to give a try to the
> latest nightly snapshot for the v4.1.x branch.
>
>
> Cheers,
>
> Gilles
>
> On Wed, Feb 10, 2021 at 10:43 PM Max R. Dechantsreiter via users
> <users@lists.open-mpi.org> wrote:
>>
>> Configuring OpenMPI 4.1.0 with GCC 10.2.0 on
>> Intel(R) Xeon(R) CPU E5-2620 v3, a Haswell processor
>> that supports AVX2 but not AVX512, resulted in
>>
>> checking for AVX512 support (no additional flags)... no
>> checking for AVX512 support (with -march=skylake-avx512)... yes
>>
>> in "configure" output, and in config.log
>>
>> MCA_BUILD_ompi_op_has_avx512_support_FALSE='#'
>> MCA_BUILD_ompi_op_has_avx512_support_TRUE=''
>>
>> Consequently AVX512 intrinsic functions were erroneously
>> deployed, resulting in OpenMPI failure.
>>
>> The relevant test code was in essence
>>
>> cat > conftest.c << EOF
>> #include <immintrin.h>
>>
>> int main()
>> {
>>     __m512 vA, vB;
>>
>>     _mm512_add_ps(vA, vB);
>>
>>     return 0;
>> }
>> EOF
>>
>> The problem with this is that the result of the function
>> is never used, so at optimization level higher than O0
>> the compiler eliminates the function as "dead code" (DCE).
>> To wit,
>>
>> gcc -O3 -march=skylake-avx512 -S conftest.c
>>
>> yields
>>
>>         .file   "conftest.c"
>>         .text
>>         .section        .text.startup,"ax",@progbits
>>         .p2align 4
>>         .globl  main
>>         .type   main, @function
>> main:
>> .LFB5345:
>>         .cfi_startproc
>>         xorl    %eax, %eax
>>         ret
>>         .cfi_endproc
>> .LFE5345:
>>         .size   main, .-main
>>         .ident  "GCC: (GNU) 10.2.0"
>>         .section        .note.GNU-stack,"",@progbits
>>
>> Compare this with the result of
>>
>> gcc -O0 -march=skylake-avx512 -S conftest.c
>>
>> in which the function IS called:
>>
>>         .file   "conftest.c"
>>         .text
>>         .globl  main
>>         .type   main, @function
>> main:
>> .LFB4092:
>>         .cfi_startproc
>>         pushq   %rbp
>>         .cfi_def_cfa_offset 16
>>         .cfi_offset 6, -16
>>         movq    %rsp, %rbp
>>         .cfi_def_cfa_register 6
>>         andq    $-64, %rsp
>>         subq    $136, %rsp
>>         vmovaps 72(%rsp), %zmm0
>>         vmovaps %zmm0, -56(%rsp)
>>         vmovaps 8(%rsp), %zmm0
>>         vmovaps %zmm0, -120(%rsp)
>>         movl    $0, %eax
>>         leave
>>         .cfi_def_cfa 7, 8
>>         ret
>>         .cfi_endproc
>> .LFE4092:
>>         .size   main, .-main
>>         .ident  "GCC: (GNU) 10.2.0"
>>         .section        .note.GNU-stack,"",@progbits
>>
>> Note the use of a 512-bit ZMM register - ZMM registers
>> are used only by AVX512 instructions.  Hence at O3 the
>> test program does not detect the lack of AVX512 support
>> by the host processor.
>>
>> An easy remedy would be to declare the operands as
>> "volatile" and thereby force the compiler to invoke the
>> function:
>>
>> cat > conftest.c << EOF
>> #include <immintrin.h>
>>
>> int main()
>> {
>>     volatile __m512 vA, vB;
>>
>>     _mm512_add_ps(vA, vB);
>>
>>     return 0;
>> }
>> EOF
>>
>> Compiled at O3, the resulting executable dumps core as it
>> should when run on my Haswell processor, returning nonzero
>> exit status ($?), which would inform "configure" that the
>> processor does not have AVX512 capability.
>>
>> Finally please note that this error could affect the
>> detection of support for other instruction sets on other
>> families of processors: compiler optimization must be
>> inhibited for such tests to be reliable!
>>
>> Max
>> ---
>> Max R. Dechantsreiter
>> President
>> Performance Jones L.L.C.
>> m...@performancejones.com
>> Skype: PerformanceJones (UTC+01:00)
>> +1 414 446-3100 (telephone/voicemail)
>> http://www.linkedin.com/in/benchmarking


-- 
Jeff Squyres
jsquy...@cisco.com