I think Max did try the latest 4.1 nightly build (from an off-list email), and 
his problem still persisted.

Max: can you describe exactly how Open MPI failed?  All you said was:

>> Consequently AVX512 intrinsic functions were erroneously
>> deployed, resulting in OpenMPI failure.

Can you provide more details?


> On Feb 10, 2021, at 6:09 PM, Gilles Gouaillardet via users 
> <users@lists.open-mpi.org> wrote:
> 
> Max,
> 
> at configure time, Open MPI detects the *compiler* capabilities.
> In your case, your compiler can emit AVX512 code.
> (and fwiw, the tests are only compiled and never executed)
> 
> Then at *runtime*, Open MPI detects the *CPU* capabilities.
> In your case, it should not invoke the functions containing AVX512 code.
> 
> That being said, several changes were made to the op/avx component,
> so if you are experiencing crashes, I invite you to try the
> latest nightly snapshot for the v4.1.x branch.
> 
> 
> Cheers,
> 
> Gilles
> 
> On Wed, Feb 10, 2021 at 10:43 PM Max R. Dechantsreiter via users
> <users@lists.open-mpi.org> wrote:
>> 
>> Configuring OpenMPI 4.1.0 with GCC 10.2.0 on
>> Intel(R) Xeon(R) CPU E5-2620 v3, a Haswell processor
>> that supports AVX2 but not AVX512, resulted in
>> 
>> checking for AVX512 support (no additional flags)... no
>> checking for AVX512 support (with -march=skylake-avx512)... yes
>> 
>> in "configure" output, and in config.log
>> 
>> MCA_BUILD_ompi_op_has_avx512_support_FALSE='#'
>> MCA_BUILD_ompi_op_has_avx512_support_TRUE=''
>> 
>> Consequently AVX512 intrinsic functions were erroneously
>> deployed, resulting in OpenMPI failure.
>> 
>> The relevant test code was in essence
>> 
>> cat > conftest.c << EOF
>> #include <immintrin.h>
>> 
>> int main()
>> {
>>        __m512 vA, vB;
>> 
>>        _mm512_add_ps(vA, vB);
>> 
>>        return 0;
>> }
>> EOF
>> 
>> The problem with this is that the result of the function
>> is never used, so at optimization levels higher than O0
>> the compiler eliminates the function as "dead code" (DCE).
>> To wit,
>> 
>> gcc -O3 -march=skylake-avx512 -S conftest.c
>> 
>> yields
>> 
>>        .file   "conftest.c"
>>        .text
>>        .section        .text.startup,"ax",@progbits
>>        .p2align 4
>>        .globl  main
>>        .type   main, @function
>> main:
>> .LFB5345:
>>        .cfi_startproc
>>        xorl    %eax, %eax
>>        ret
>>        .cfi_endproc
>> .LFE5345:
>>        .size   main, .-main
>>        .ident  "GCC: (GNU) 10.2.0"
>>        .section        .note.GNU-stack,"",@progbits
>> 
>> Compare this with the result of
>> 
>> gcc -O0 -march=skylake-avx512 -S conftest.c
>> 
>> in which the function IS called:
>> 
>>        .file   "conftest.c"
>>        .text
>>        .globl  main
>>        .type   main, @function
>> main:
>> .LFB4092:
>>        .cfi_startproc
>>        pushq   %rbp
>>        .cfi_def_cfa_offset 16
>>        .cfi_offset 6, -16
>>        movq    %rsp, %rbp
>>        .cfi_def_cfa_register 6
>>        andq    $-64, %rsp
>>        subq    $136, %rsp
>>        vmovaps 72(%rsp), %zmm0
>>        vmovaps %zmm0, -56(%rsp)
>>        vmovaps 8(%rsp), %zmm0
>>        vmovaps %zmm0, -120(%rsp)
>>        movl    $0, %eax
>>        leave
>>        .cfi_def_cfa 7, 8
>>        ret
>>        .cfi_endproc
>> .LFE4092:
>>        .size   main, .-main
>>        .ident  "GCC: (GNU) 10.2.0"
>>        .section        .note.GNU-stack,"",@progbits
>> 
>> Note the use of a 512-bit ZMM register - ZMM registers
>> are used only by AVX512 instructions.  Hence at O3 the
>> test program does not detect the lack of AVX512 support
>> by the host processor.
>> 
>> An easy remedy would be to declare the operands as
>> "volatile" and thereby force the compiler to invoke the
>> function:
>> 
>> cat > conftest.c << EOF
>> #include <immintrin.h>
>> 
>> int main()
>> {
>>        volatile __m512 vA, vB;
>> 
>>        _mm512_add_ps(vA, vB);
>> 
>>        return 0;
>> }
>> EOF
>> 
>> Compiled at O3, the resulting executable dumps core as it
>> should when run on my Haswell processor, returning nonzero
>> exit status ($?), which would inform "configure" that the
>> processor does not have AVX512 capability.
>> 
>> Finally please note that this error could affect the
>> detection of support for other instruction sets on other
>> families of processors: compiler optimization must be
>> inhibited for such tests to be reliable!
>> 
>> Max
>> ---
>> Max R. Dechantsreiter
>> President
>> Performance Jones L.L.C.
>> m...@performancejones.com
>> Skype: PerformanceJones (UTC+01:00)
>> +1 414 446-3100 (telephone/voicemail)
>> http://www.linkedin.com/in/benchmarking


-- 
Jeff Squyres
jsquy...@cisco.com
