I ran into a problem with 4.1.0 several weeks ago, and no longer
recall precisely how; I am now rebuilding both 4.1.0 and a recent
4.1.x, and will then use them to build GROMACS, which is probably
the application I was attempting to build back then.

But I do have this from my notes (for 4.1.0):

    mpicc -fopenmp hybrid_hello.c
    export OMP_NUM_THREADS=2
    mpirun -np 2 ./a.out
    # [server.clearlight.us:18349] mca_base_component_repository_open: unable to open mca_op_avx: /home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/openmpi-4.1.0_gcc-10.2.0/lib/openmpi/mca_op_avx.so: undefined symbol: ompi_op_avx_functions_avx512 (ignored)
    # [server.clearlight.us:18348] mca_base_component_repository_open: unable to open mca_op_avx: /home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/openmpi-4.1.0_gcc-10.2.0/lib/openmpi/mca_op_avx.so: undefined symbol: ompi_op_avx_functions_avx512 (ignored)
    # Hello from thread 0 out of 2 from process 0 out of 2 on server.clearlight.us
    # Hello from thread 1 out of 2 from process 0 out of 2 on server.clearlight.us
    # Hello from thread 0 out of 2 from process 1 out of 2 on server.clearlight.us
    # Hello from thread 1 out of 2 from process 1 out of 2 on server.clearlight.us

(where I X-ed out confidential details).

Not an error, but surely indicative of something amiss.

More to come!


On Thu, Feb 11, 2021 at 02:02:48AM +0000, Jeff Squyres (jsquyres) via users wrote:
> I think Max did try the latest 4.1 nightly build (from an off-list email),
> and his problem still persisted.
>
> Max: can you describe exactly how Open MPI failed?  All you said was:
>
> >> Consequently AVX512 intrinsic functions were erroneously
> >> deployed, resulting in OpenMPI failure.
>
> Can you provide more details?
>
>
> > On Feb 10, 2021, at 6:09 PM, Gilles Gouaillardet via users
> > <users@lists.open-mpi.org> wrote:
> >
> > Max,
> >
> > at configure time, Open MPI detects the *compiler* capabilities.
> > In your case, your compiler can emit AVX512 code.
> > (and fwiw, the tests are only compiled and never executed)
> >
> > Then at *runtime*, Open MPI detects the *CPU* capabilities.
> > In your case, it should not invoke the functions containing AVX512 code.
> >
> > That being said, several changes were made to the op/avx component,
> > so if you are experiencing some crashes, I do invite you to give a try to the
> > latest nightly snapshot for the v4.1.x branch.
> >
> >
> > Cheers,
> >
> > Gilles
> >
> > On Wed, Feb 10, 2021 at 10:43 PM Max R. Dechantsreiter via users
> > <users@lists.open-mpi.org> wrote:
> >>
> >> Configuring OpenMPI 4.1.0 with GCC 10.2.0 on
> >> Intel(R) Xeon(R) CPU E5-2620 v3, a Haswell processor
> >> that supports AVX2 but not AVX512, resulted in
> >>
> >>     checking for AVX512 support (no additional flags)... no
> >>     checking for AVX512 support (with -march=skylake-avx512)... yes
> >>
> >> in "configure" output, and in config.log
> >>
> >>     MCA_BUILD_ompi_op_has_avx512_support_FALSE='#'
> >>     MCA_BUILD_ompi_op_has_avx512_support_TRUE=''
> >>
> >> Consequently AVX512 intrinsic functions were erroneously
> >> deployed, resulting in OpenMPI failure.
> >>
> >> The relevant test code was in essence
> >>
> >>     cat > conftest.c << EOF
> >>     #include <immintrin.h>
> >>
> >>     int main()
> >>     {
> >>         __m512 vA, vB;
> >>
> >>         _mm512_add_ps(vA, vB);
> >>
> >>         return 0;
> >>     }
> >>     EOF
> >>
> >> The problem with this is that the result of the function
> >> is never used, so at optimization levels higher than O0
> >> the compiler eliminates the function as "dead code" (DCE).
> >> To wit,
> >>
> >>     gcc -O3 -march=skylake-avx512 -S conftest.c
> >>
> >> yields
> >>
> >>             .file   "conftest.c"
> >>             .text
> >>             .section        .text.startup,"ax",@progbits
> >>             .p2align 4
> >>             .globl  main
> >>             .type   main, @function
> >>     main:
> >>     .LFB5345:
> >>             .cfi_startproc
> >>             xorl    %eax, %eax
> >>             ret
> >>             .cfi_endproc
> >>     .LFE5345:
> >>             .size   main, .-main
> >>             .ident  "GCC: (GNU) 10.2.0"
> >>             .section        .note.GNU-stack,"",@progbits
> >>
> >> Compare this with the result of
> >>
> >>     gcc -O0 -march=skylake-avx512 -S conftest.c
> >>
> >> in which the function IS called:
> >>
> >>             .file   "conftest.c"
> >>             .text
> >>             .globl  main
> >>             .type   main, @function
> >>     main:
> >>     .LFB4092:
> >>             .cfi_startproc
> >>             pushq   %rbp
> >>             .cfi_def_cfa_offset 16
> >>             .cfi_offset 6, -16
> >>             movq    %rsp, %rbp
> >>             .cfi_def_cfa_register 6
> >>             andq    $-64, %rsp
> >>             subq    $136, %rsp
> >>             vmovaps 72(%rsp), %zmm0
> >>             vmovaps %zmm0, -56(%rsp)
> >>             vmovaps 8(%rsp), %zmm0
> >>             vmovaps %zmm0, -120(%rsp)
> >>             movl    $0, %eax
> >>             leave
> >>             .cfi_def_cfa 7, 8
> >>             ret
> >>             .cfi_endproc
> >>     .LFE4092:
> >>             .size   main, .-main
> >>             .ident  "GCC: (GNU) 10.2.0"
> >>             .section        .note.GNU-stack,"",@progbits
> >>
> >> Note the use of a 512-bit ZMM register - ZMM registers
> >> are used only by AVX512 instructions.  Hence at O3 the
> >> test program does not detect the lack of AVX512 support
> >> by the host processor.
> >>
> >> An easy remedy would be to declare the operands as
> >> "volatile" and thereby force the compiler to invoke the
> >> function:
> >>
> >>     cat > conftest.c << EOF
> >>     #include <immintrin.h>
> >>
> >>     int main()
> >>     {
> >>         volatile __m512 vA, vB;
> >>
> >>         _mm512_add_ps(vA, vB);
> >>
> >>         return 0;
> >>     }
> >>     EOF
> >>
> >> Compiled at O3, the resulting executable dumps core as it
> >> should when run on my Haswell processor, returning nonzero
> >> exit status ($?), which would inform "configure" that the
> >> processor does not have AVX512 capability.
> >>
> >> Finally please note that this error could affect the
> >> detection of support for other instruction sets on other
> >> families of processors: compiler optimization must be
> >> inhibited for such tests to be reliable!
> >>
> >> Max
> >> ---
> >> Max R. Dechantsreiter
> >> President
> >> Performance Jones L.L.C.
> >> m...@performancejones.com
> >> Skype: PerformanceJones (UTC+01:00)
> >> +1 414 446-3100 (telephone/voicemail)
> >> http://www.linkedin.com/in/benchmarking
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
>
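
As an aside on Gilles's point in the quoted thread that the CPU check
happens at run time rather than at configure time: below is a minimal
sketch of such a check, assuming GCC or Clang on x86 and their
__builtin_cpu_supports builtin. The names cpu_has_avx512f,
select_op_kernel and enum op_kernel are hypothetical, chosen only for
illustration; this is not Open MPI's actual op/avx code.

```c
/* Hypothetical kernel identifiers, for illustration only. */
enum op_kernel {
    OP_KERNEL_SCALAR = 0,
    OP_KERNEL_AVX2   = 1,
    OP_KERNEL_AVX512 = 2
};

/* Returns 1 if the CPU running this code reports AVX-512 Foundation
 * support, 0 otherwise. Always returns 0 on non-x86 targets. */
int cpu_has_avx512f(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();  /* initialise the compiler's CPU feature data */
    return __builtin_cpu_supports("avx512f") ? 1 : 0;
#else
    return 0;
#endif
}

/* Picks the widest kernel the *running* CPU supports, regardless of
 * which instruction sets the compiler was able to emit at build time. */
enum op_kernel select_op_kernel(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f"))
        return OP_KERNEL_AVX512;
    if (__builtin_cpu_supports("avx2"))
        return OP_KERNEL_AVX2;
#endif
    return OP_KERNEL_SCALAR;
}
```

With a dispatch of this kind, a library built by an AVX512-capable
compiler can still carry AVX512 code paths: on a Haswell part such as
the E5-2620 v3 the selector would be expected to fall back to the AVX2
kernel, so the AVX512 functions are present in the binary but never
called.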