[Bug target/72784] AVX512: Assembler failure when compiling on OSX
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72784 Wenzel Jakob changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |INVALID --- Comment #1 from Wenzel Jakob --- Closing this due to lack of attention/relevance.
[Bug target/87674] New: AVX512: incorrect intrinsic signature
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87674 Bug ID: 87674 Summary: AVX512: incorrect intrinsic signature Product: gcc Version: 8.2.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: wen...@mitsuba-renderer.org Target Milestone: ---

Hi, I'm seeing a number of warnings related to the following three intrinsics, which appear to have an incorrect signature. The fix is easy: simply change __mmask16 to __mmask8 for those definitions (this is also what's correct according to Intel's Intrinsics Explorer).

    /home/wjakob/dist/lib/gcc/x86_64-pc-linux-gnu/8.2.0/include/avx512vlintrin.h: In function ‘__m128i _mm_mask_mullo_epi32(__m128i, __mmask16, __m128i, __m128i)’:
    /home/wjakob/dist/lib/gcc/x86_64-pc-linux-gnu/8.2.0/include/avx512vlintrin.h:9055:23: warning: conversion from ‘__mmask16’ {aka ‘short unsigned int’} to ‘unsigned char’ may change value [-Wconversion]
    (__v4si) __W, __M);
    ^~~
    /home/wjakob/dist/lib/gcc/x86_64-pc-linux-gnu/8.2.0/include/avx512vlbwintrin.h: In function ‘__m128i _mm_mask_packus_epi32(__m128i, __mmask16, __m128i, __m128i)’:
    /home/wjakob/dist/lib/gcc/x86_64-pc-linux-gnu/8.2.0/include/avx512vlbwintrin.h:4354:25: warning: conversion from ‘__mmask16’ {aka ‘short unsigned int’} to ‘unsigned char’ may change value [-Wconversion]
    (__v8hi) __W, __M);
    ^~~
    /home/wjakob/dist/lib/gcc/x86_64-pc-linux-gnu/8.2.0/include/avx512vlbwintrin.h: In function ‘__m128i _mm_mask_packs_epi32(__m128i, __mmask16, __m128i, __m128i)’:
    /home/wjakob/dist/lib/gcc/x86_64-pc-linux-gnu/8.2.0/include/avx512vlbwintrin.h:4397:25: warning: conversion from ‘__mmask16’ {aka ‘short unsigned int’} to ‘unsigned char’ may change value [-Wconversion]
    (__v8hi) __W, __M);
    ^~~

Best, Wenzel
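The mismatch in the report above comes down to lane counts: a 128-bit vector of 32-bit elements has four lanes, and one of 16-bit elements has eight, so an 8-bit mask already covers both. The following is only a sketch in plain C with no intrinsics; lanes_per_vector and narrow_mask are hypothetical helper names, not part of any GCC header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many lanes (and therefore mask bits) a vector
 * of vector_bits holds when each element is element_bits wide. */
unsigned lanes_per_vector(unsigned vector_bits, unsigned element_bits)
{
    assert(element_bits != 0 && vector_bits % element_bits == 0);
    return vector_bits / element_bits;
}

/* Hypothetical helper: models the implicit 16-bit to 8-bit narrowing that
 * -Wconversion complains about. Bits 8..15 are silently dropped. */
uint8_t narrow_mask(uint16_t mask16)
{
    return (uint8_t) mask16;
}
```

For example, lanes_per_vector(128, 32) yields 4, while narrow_mask(0x0105) yields 0x05: the upper byte never reaches the builtin, which is why the narrower parameter type is the more honest signature.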
[Bug target/87674] AVX512: incorrect intrinsic signature
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87674 --- Comment #3 from Wenzel Jakob --- Thanks -- this patch works for me. With regard to the signature difference: I had already stumbled upon the (float *) vs (some value *) difference in some intrinsics. In the best case, differences cause warnings (ok, but still annoying :)); in the worst case, special casts are needed for GCC, making intrinsics code less portable between compilers. So my vote would definitely be for matching ICC behavior 1:1.
[Bug target/76731] [AVX512] _mm512_i32gather_epi32 and other scatter/gather routines have incorrect signature
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=76731 --- Comment #6 from Wenzel Jakob --- Is there any news here? This is clearly an issue, and it would be nice to fix it. I currently can't compile my AVX512 project on GCC due to this bug.
[Bug target/76731] [AVX512] _mm512_i32gather_epi32 and other scatter/gather routines have incorrect signature
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=76731 --- Comment #9 from Wenzel Jakob --- Hi -- just a ping regarding this issue. Thanks, Wenzel
[Bug target/73350] AVX512: GCC optimizes away rounding flags
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=73350 --- Comment #4 from Wenzel Jakob --- This bug is still present in the latest GCC -- are there any plans to fix it?
[Bug target/76731] [AVX512] _mm512_i32gather_epi32 and other scatter/gather routines have incorrect signature
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=76731 --- Comment #11 from Wenzel Jakob --- Searching through the intrinsics guide (e.g. https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=gather_ps), I see "void *" for all scatter intrinsics and "const void *" for all gather intrinsics, consistently applied. This is in contrast to the load intrinsics, where there are some inconsistencies between argument conventions (e.g. _mm512_load_ps vs _mm256_load_ps).
[Bug target/79481] New: AVX512PF: unmasked gather prefetch intrinsics missing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79481 Bug ID: 79481 Summary: AVX512PF: unmasked gather prefetch intrinsics missing Product: gcc Version: 7.0.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: wen...@mitsuba-renderer.org Target Milestone: --- The latest trunk version (and all versions before as far as I can tell) are missing the following (unmasked) intrinsics for gather prefetches: _mm512_prefetch_i32gather_pd _mm512_prefetch_i64gather_pd _mm512_prefetch_i32gather_ps _mm512_prefetch_i64gather_ps It would be great if these could be added. Thanks, Wenzel
[Bug target/79481] AVX512PF: unmasked gather prefetch intrinsics missing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79481 --- Comment #2 from Wenzel Jakob --- I agree that the docs from Intel are not particularly consistent. In this case, the hardware has dedicated instructions for this type of gather, so it would make sense for a matching intrinsic to be part of GCC.
[Bug target/79481] AVX512PF: unmasked gather prefetch intrinsics missing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79481 --- Comment #4 from Wenzel Jakob --- I think that's right. Clang e.g. also does this: #define _mm512_prefetch_i32gather_ps(index, addr, scale, hint) ({\ __builtin_ia32_gatherpfdps((__mmask16) -1, \ (__v16si)(__m512i)(index), (int const *)(addr), \ (int)(scale), (int)(hint)); })
[Bug c++/77629] New: internal compiler error: same canonical type node for different types
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77629 Bug ID: 77629 Summary: internal compiler error: same canonical type node for different types Product: gcc Version: 7.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: wen...@mitsuba-renderer.org Target Milestone: --- Created attachment 39640 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39640&action=edit Preprocessed source code I am running into the following internal compiler error with GCC TRUNK. The preprocessed file is attached. $ /usr/local/bin/gcc7 out.cpp In file included from include/simdarray/array.h:58:0, from tests/testsuite.cpp:21: include/simdarray/array_recursive.h:124:93: internal compiler error: same canonical type node for different types simd::ArrayBase::peel != -1), void>::type>::Base and simd::ArrayOperations SIMD_INLINE ArrayBase(Scalar value) : a1(value), a2(value) { } ^ 0x781d74 comptypes(tree_node*, tree_node*, int) ../../gcc/cp/typeck.c:1437 0x6bcd01 resolve_typename_type(tree_node*, bool) ../../gcc/cp/pt.c:23721 0x7802ec structural_comptypes ../../gcc/cp/typeck.c:1204 0x7848ad comptypes(tree_node*, tree_node*, int) ../../gcc/cp/typeck.c:1409 0x7848ad compparms(tree_node const*, tree_node const*) ../../gcc/cp/typeck.c:1539 0x703b5c add_method(tree_node*, tree_node*, tree_node*) ../../gcc/cp/class.c:1155 0x7d0c64 finish_member_declaration(tree_node*) ../../gcc/cp/semantics.c:2997 0x76d3d8 cp_parser_member_declaration ../../gcc/cp/parser.c:22770 0x747b7a cp_parser_member_specification_opt ../../gcc/cp/parser.c:22331 0x747b7a cp_parser_class_specifier_1 ../../gcc/cp/parser.c:21496 0x74a069 cp_parser_class_specifier ../../gcc/cp/parser.c:21745 0x74a069 cp_parser_type_specifier ../../gcc/cp/parser.c:15971 0x75da97 cp_parser_decl_specifier_seq ../../gcc/cp/parser.c:12889 0x76b965 cp_parser_single_declaration ../../gcc/cp/parser.c:25975 0x76bd0c cp_parser_template_declaration_after_parameters ../../gcc/cp/parser.c:25667 
0x76c68c cp_parser_explicit_template_declaration ../../gcc/cp/parser.c:25902 0x76c68c cp_parser_template_declaration_after_export ../../gcc/cp/parser.c:25920 0x7735a9 cp_parser_declaration ../../gcc/cp/parser.c:12209 0x771d7b cp_parser_declaration_seq_opt ../../gcc/cp/parser.c:12139 0x7724b2 cp_parser_namespace_body ../../gcc/cp/parser.c:17763 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See <http://gcc.gnu.org/bugs.html> for instructions.
[Bug c++/69481] ICE with C++11 alias using with templates
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69481 --- Comment #4 from Wenzel Jakob --- I'm pretty sure this is a recent regression -- GCC was able to compile the code from Bug 77629 a month ago.
[Bug c++/69481] ICE with C++11 alias using with templates
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69481 --- Comment #6 from Wenzel Jakob --- No -- I am experimenting with the AVX512F backend and thus need to use the development branch.
[Bug target/76731] [AVX512] _mm512_i32gather_epi32 and other scatter/gather routines have incorrect signature
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=76731 --- Comment #3 from Wenzel Jakob --- Any updates here? Should this be closed?
[Bug target/76731] [AVX512] _mm512_i32gather_epi32 and other scatter/gather routines have incorrect signature
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=76731 --- Comment #4 from Wenzel Jakob --- Hmm, it looks like this is still an issue. Recompiling my codebase with the latest trunk version of gcc still produces many errors caused by this, e.g. include/simdarray/array_avx512.h:1059:53: error: invalid conversion from ‘simd::ArrayOperations >::Scalar* {aka unsigned int*}’ to ‘const int*’ [-fpermissive] __m512i values = _mm512_mask_i32gather_epi32( ~~~^ _mm512_undefined_epi32(), mask.k, index.m, f, sizeof(Scalar)); ~ In file included from /usr/local/lib/gcc/x86_64-pc-linux-gnu/7.0.0/include/immintrin.h:45:0, from include/simdarray/array.h:33, from tests/histogram.cpp:2: /usr/local/lib/gcc/x86_64-pc-linux-gnu/7.0.0/include/avx512fintrin.h:9316:1: note: initializing argument 4 of ‘__m512i _mm512_mask_i32gather_epi32(__m512i, __mmask16, __m512i, const int*, int)’ _mm512_mask_i32gather_epi32 (__m512i __v1_old, __mmask16 __mask, ^~~ In file included from include/simdarray/array.h:73:0, from tests/histogram.cpp:2: include/simdarray/array_avx512.h:1068:22: error: use of ‘main(int, char**):: [with auto:1 = simd::Array]’ before deduction of ‘auto’ values = func(Derived(values)).m; etc...
[Bug tree-optimization/72824] [5/6 Regression] Signed floating point zero semantics broken at optimization level -O3 (tree-loop-distribute-patterns)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72824 --- Comment #13 from Wenzel Jakob --- The fix was merged, so I assume this bug should be closed as RESOLVED?
[Bug c++/69481] ICE with C++11 alias using with templates
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69481 --- Comment #7 from Wenzel Jakob --- Correction: this ICE indeed goes away when building with --enable-checking=release (though that doesn't seem like a nice solution). I assume I used this check level in my trunk builds before and forgot it this time.
[Bug target/77633] New: AVX512: shuffle intrinsic has incorrect signature when optimizations are enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77633 Bug ID: 77633 Summary: AVX512: shuffle intrinsic has incorrect signature when optimizations are enabled Product: gcc Version: 7.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: wen...@mitsuba-renderer.org Target Milestone: ---

The AVX512 shuffle intrinsic switches to a different implementation (& different signature) when optimizations are turned on. This leads to the following strange error message when compiling a snippet that passes the type checker at -O0.

    ///
    $ g++-7 test.c -march=knl -O3
    In file included from /usr/local/lib/gcc/x86_64-pc-linux-gnu/7.0.0/include/immintrin.h:29:0, from test.c:1:
    test.c: In function ‘void test()’:
    test.c:8:50: error: invalid conversion from ‘int’ to ‘_MM_PERM_ENUM’ [-fpermissive]
    _mm512_shuffle_epi32(_mm512_setzero_epi32(), _MM_SHUFFLE(0, 3, 0, 1));
    ^
    In file included from /usr/local/lib/gcc/x86_64-pc-linux-gnu/7.0.0/include/immintrin.h:45:0, from test.c:1:
    /usr/local/lib/gcc/x86_64-pc-linux-gnu/7.0.0/include/avx512fintrin.h:3848:1: note: initializing argument 2 of ‘__m512i _mm512_shuffle_epi32(__m512i, _MM_PERM_ENUM)’
    _mm512_shuffle_epi32 (__m512i __A, _MM_PERM_ENUM __mask)
    ^~~~
    ///

    #include <immintrin.h>

    void test() {
        /* SSE shuffle: works */
        _mm_shuffle_epi32(_mm_setzero_si128(), _MM_SHUFFLE(0, 3, 0, 1));

        /* AVX512 shuffle: type checker error when optimizations are turned on! */
        _mm512_shuffle_epi32(_mm512_setzero_epi32(), _MM_SHUFFLE(0, 3, 0, 1));
    }
[Bug target/77633] AVX512: shuffle intrinsic has incorrect signature when optimizations are enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77633 --- Comment #2 from Wenzel Jakob --- I just tried compiling this snippet with ICC 17.0.0. It accepts it without warnings (-Wall -Wconversion -Wextra). So even if the signature is different, ICC seems to be more relaxed about passing an integer value to an enum parameter.
[Bug target/77633] AVX512: shuffle intrinsic has incorrect signature when optimizations are enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77633 --- Comment #4 from Wenzel Jakob --- Aha, interesting -- that breaks it: test.cpp(9): error: argument of type "int" is incompatible with parameter of type "_MM_PERM_ENUM={_MM_PERM_ENUM}" _mm512_shuffle_epi32(_mm512_setzero_epi32(), _MM_SHUFFLE(0, 3, 0, 1)); Definitely not a very nice API design! I assume the right course of action then will be to mark this issue INVALID and change my code to cast to _MM_PERM_ENUM?
[Bug target/72773] New: AVX512: Invalid operand for vcvttss2siq instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72773 Bug ID: 72773 Summary: AVX512: Invalid operand for vcvttss2siq instruction Product: gcc Version: 7.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: wen...@mitsuba-renderer.org Target Milestone: --- Created attachment 39047 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39047&action=edit Preprocessed file causing the issue

Hi, I'm running into a pesky AVX512F code generation issue using the latest HEAD version of gcc on my OSX development machine. Compiling the attached preprocessed file yields the following error messages:

    g++-7 test.i -xc++ -std=c++14 -O3 -DNDEBUG -fomit-frame-pointer -mavx2 -mfma -mf16c -mavx512f -Wa,-mavx512f
    /var/folders/94/rfzxfhbn3hjb4p402lg_yjgwgn/T//cci4IcB4.s:555:14: error: invalid operand for instruction
    vcvttss2siq %xmm18, %rax
    ^~
    /var/folders/94/rfzxfhbn3hjb4p402lg_yjgwgn/T//cci4IcB4.s:559:14: error: invalid operand for instruction
    vcvttss2siq %xmm17, %rax
    ^~
    /var/folders/94/rfzxfhbn3hjb4p402lg_yjgwgn/T//cci4IcB4.s:561:14: error: invalid operand for instruction
    vcvttss2siq %xmm16, %rax
    ^~

AFAIK on OSX, GCC uses the Clang assembler. There are thus two possibilities:

1. The vcvttss2siq instruction does not exist for new-style xmm register arguments, and GCC should not have generated it.
2. It is a valid instruction, and it's the Clang assembler's fault for not recognizing it.

I am not familiar enough with the AVX512F assembly and will create a ticket in both the GCC and LLVM bugtrackers so that this problem can be addressed.
Details on my compiler version: COLLECT_GCC=g++-7 COLLECT_LTO_WRAPPER=/usr/local/Cellar/gcc/HEAD-/libexec/gcc/x86_64-apple-darwin15.5.0/7.0.0/lto-wrapper Target: x86_64-apple-darwin15.5.0 Configured with: ../configure --build=x86_64-apple-darwin15.5.0 --prefix=/usr/local/Cellar/gcc/HEAD- --libdir=/usr/local/Cellar/gcc/HEAD-/lib/gcc/ --enable-languages=c,c++,objc,obj-c++,fortran --program-suffix=- --with-gmp=/usr/local/opt/gmp --with-mpfr=/usr/local/opt/mpfr --with-mpc=/usr/local/opt/libmpc --with-isl=/usr/local/opt/isl --with-system-zlib --enable-libstdcxx-time=yes --enable-stage1-checking --enable-checking=release --enable-lto --with-build-config=bootstrap-debug --disable-werror --with-pkgversion='Homebrew gcc HEAD- --without-multilib' --with-bugurl=https://github.com/Homebrew/homebrew/issues --enable-plugin --disable-nls --disable-multilib Thread model: posix gcc version 7.0.0 20160801 (experimental) (Homebrew gcc HEAD- --without-multilib)
[Bug target/72773] AVX512: Invalid operand for vcvttss2siq instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72773 --- Comment #1 from Wenzel Jakob --- The LLVM ticket is here: https://llvm.org/bugs/show_bug.cgi?id=28810
[Bug target/72773] AVX512: Invalid operand for vcvttss2siq instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72773 --- Comment #3 from Wenzel Jakob --- It looks like it is an LLVM issue (see https://llvm.org/bugs/show_bug.cgi?id=28810)
[Bug target/72782] New: AVX512: No support for scalar broadcasts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72782 Bug ID: 72782 Summary: AVX512: No support for scalar broadcasts Product: gcc Version: 7.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: wen...@mitsuba-renderer.org Target Milestone: ---

AVX512 introduces the ability to do scalar broadcasts, which significantly cuts down on the number of explicit broadcast instructions in vectorized code. It looks like the AVX512 code generation backend on GCC does not recognize/make use of this instruction set feature. Consider the following snippet:

    __m512 addConstant(__m512 arg) {
        return _mm512_add_ps(arg, _mm512_set1_ps(1.f));
    }

This is the assembly generated by GCC (HEAD):

    __Z11addConstantDv16_f:
    LFB4589:
        vbroadcastss LC0(%rip), %zmm1
        vaddps %zmm1, %zmm0, %zmm0
        ret

For reference, this is the output generated by Clang:

    _Z11addConstantDv16_f: ## @_Z11addConstantDv16_f
        vaddps LCPI0_0(%rip){1to16}, %zmm0, %zmm0
        retq
[Bug target/72784] New: AVX512: Assembler failure when compiling on OSX
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72784 Bug ID: 72784 Summary: AVX512: Assembler failure when compiling on OSX Product: gcc Version: 7.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: wen...@mitsuba-renderer.org Target Milestone: ---

GCC (HEAD) fails to compile basic AVX512 code on my machine (OSX 10.11.6), which I'm using to develop for (and emulate) this architecture. Consider the following small program:

    #include <immintrin.h>

    __m512 addConstant(__m512 arg) {
        return _mm512_add_ps(arg, _mm512_set1_ps(1.f));
    }

Compiling yields the following error message at the assembler stage:

    $ g++-7 test.cpp -c -o test.s -O3 -mavx512f
    /var/folders/lm/4mxv3gx901q6sympjjnzbrb4gp/T//cc0nXIBi.s:6:2: error: instruction requires: AVX-512 ISA
    vbroadcastss LC0(%rip), %zmm1
    ^
    /var/folders/lm/4mxv3gx901q6sympjjnzbrb4gp/T//cc0nXIBi.s:7:2: error: instruction requires: AVX-512 ISA
    vaddps %zmm1, %zmm0, %zmm0

This is an interesting interaction between the GCC toolchain and the Clang assembler which only occurs when developing on OSX. It is possible to work around the error by specifying an additional option "-Wa,-mavx512f" to the compiler. However, this is certainly non-ideal, since it is a nonstandard parameter and in fact causes builds on other platforms to fail. Ideally GCC (on OSX only) would transparently forward the -march=<...> and -mavx* parameters to the LLVM assembler.
[Bug target/72805] New: AVX512: invalid code generation involving masks
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72805 Bug ID: 72805 Summary: AVX512: invalid code generation involving masks Product: gcc Version: 7.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: wen...@mitsuba-renderer.org Target Milestone: ---

Consider the following minimal program, which initializes a 16-int AVX512 vector with -1 entries, does a component-wise "< 0" comparison, and prints the resulting mask. Since there are 16 entries, the expected output is "65535". GCC trunk prints "255" (compilation flags: g++-7 -S -mavx512f test.c -o test.s -fomit-frame-pointer -fno-asynchronous-unwind-tables -fno-exceptions). The issue goes away when compiling at higher optimization levels, though that is clearly not a good solution.

    #include <immintrin.h>
    #include <stdio.h>

    __attribute__((noinline)) int test() {
        __m512i value = _mm512_set1_epi32(-1);
        return (int) _mm512_cmp_epi32_mask(value, _mm512_setzero_si512(), 1 /* _MM_CMPINT_LT */);
    }

    int main(int argc, char *argv[]) {
        printf("%i\n", test());
        return 0;
    }

Looking at the assembly reveals the problem:

    __Z4testv:
        leaq 8(%rsp), %r10
        andq $-64, %rsp
        pushq -8(%r10)
        pushq %rbp
        movq %rsp, %rbp
        pushq %r10
        subq $112, %rsp
        movl $-1, -52(%rbp)
        vmovdqa64 -176(%rbp), %zmm0
        movl $-1, %eax
        kmovw %eax, %k2
        vpbroadcastd -52(%rbp), %zmm0{%k2}
        vmovdqa64 %zmm0, -240(%rbp)
        vpxord %zmm0, %zmm0, %zmm0
        vmovdqa64 %zmm0, %zmm1
        vmovdqa64 -240(%rbp), %zmm0
        movl $-1, %eax
        kmovw %eax, %k3
        vpcmpd $1, %zmm1, %zmm0, %k1{%k3}
        kmovw %k1, %eax
        movzbl %al, %eax    <- UH OH
        addq $112, %rsp
        popq %r10
        popq %rbp
        leaq -8(%r10), %rsp
        ret

For some reason, GCC thinks that the mask is only eight bits wide and uses a "movzbl" instruction. At higher optimization levels, many of the moves are elided, and the mask is directly copied to %eax. Very mysterious.

    __Z4testv:
        vpternlogd $0xFF, %zmm0, %zmm0, %zmm0
        vpxord %zmm1, %zmm1, %zmm1
        vpcmpd $1, %zmm1, %zmm0, %k1
        kmovw %k1, %eax
        movzwl %ax, %eax
        ret
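The difference between the two listings can be reproduced without any AVX512 hardware. The sketch below models only the final zero-extension step in plain C; zero_extend_byte and zero_extend_word are hypothetical names standing in for "movzbl %al, %eax" and "movzwl %ax, %eax" respectively.

```c
#include <assert.h>
#include <stdint.h>

/* Models "movzbl %al, %eax": keep the low 8 bits, clear everything else. */
int zero_extend_byte(uint32_t eax)
{
    return (int) (eax & 0xFFu);
}

/* Models "movzwl %ax, %eax": keep the low 16 bits, clear everything else. */
int zero_extend_word(uint32_t eax)
{
    return (int) (eax & 0xFFFFu);
}
```

Feeding the all-lanes-set mask 0xFFFF through zero_extend_byte gives 255, the value the unoptimized build prints, while zero_extend_word gives the expected 65535.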
[Bug target/72805] AVX512: invalid code generation involving masks
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72805 --- Comment #6 from Wenzel Jakob --- awesome, thanks!
[Bug tree-optimization/72824] New: [7 Regression] Signed floating point zero semantics broken at optimization level -O3 (tree-loop-distribute-patterns)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72824 Bug ID: 72824 Summary: [7 Regression] Signed floating point zero semantics broken at optimization level -O3 (tree-loop-distribute-patterns) Product: gcc Version: 7.0 Status: UNCONFIRMED Severity: major Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: wen...@mitsuba-renderer.org Target Milestone: --- The trunk version of GCC has a regression which optimizes away signed zeros at optimization level -O3. This should never happen unless more aggressive optimization flags are specified (like -ffast-math or -fno-signed-zeros, neither of which are part of -O3). Having correct signed zero semantics is important for many scientific computing applications. $ g++-7 test.cpp -o test -O2 $ ./test -0.00 # < Correct $ g++-7 test.cpp -o test -O3 $ ./test 0.00 # < Signed zero gone It's possible to fix the issue by adding -fno-tree-loop-distribute-patterns, so I assume that it is somehow related to this optimization. Program to reproduce: #include template struct Array { Array(float value) { for (size_t i = 0; i array(-0.f); printf("%f\n", array.x[0]); return 0; } This is with the latest trunk version of GCC: $ g++-7 -v Using built-in specs. 
COLLECT_GCC=g++-7 COLLECT_LTO_WRAPPER=/usr/local/Cellar/gcc/HEAD-/libexec/gcc/x86_64-apple-darwin15.6.0/7.0.0/lto-wrapper Target: x86_64-apple-darwin15.6.0 Configured with: ../configure --build=x86_64-apple-darwin15.6.0 --prefix=/usr/local/Cellar/gcc/HEAD- --libdir=/usr/local/Cellar/gcc/HEAD-/lib/gcc/ --enable-languages=c,c++,objc,obj-c++,fortran --program-suffix=- --with-gmp=/usr/local/opt/gmp --with-mpfr=/usr/local/opt/mpfr --with-mpc=/usr/local/opt/libmpc --with-isl=/usr/local/opt/isl --with-system-zlib --enable-libstdcxx-time=yes --enable-stage1-checking --enable-checking=release --enable-lto --with-build-config=bootstrap-debug --disable-werror --with-pkgversion='Homebrew gcc HEAD- --without-multilib' --with-bugurl=https://github.com/Homebrew/homebrew/issues --enable-plugin --disable-nls --disable-multilib Thread model: posix gcc version 7.0.0 20160804 (experimental) (Homebrew gcc HEAD- --without-multilib)
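The report only establishes that -fno-tree-loop-distribute-patterns hides the problem. As an illustration of why signed zeros are fragile here, the sketch below assumes (this is not confirmed by the report) that the failure mode resembles a fill loop being replaced by a zero-byte fill. All function names are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <string.h>

/* Element-wise fill: preserves the sign bit of the value being stored. */
void fill_loop(float *dst, size_t n, float value)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = value;
}

/* Byte-wise zero fill: always produces +0.0f, whose bits are all zero. */
void fill_zero_bytes(float *dst, size_t n)
{
    assert(dst != NULL);
    memset(dst, 0, n * sizeof *dst);
}

/* -0.0f compares equal to 0.0f, so only the sign bit tells them apart. */
int is_negative_zero(float x)
{
    return x == 0.0f && signbit(x) != 0;
}
```

Filling with fill_loop(a, n, -0.f) leaves is_negative_zero(a[0]) true, whereas fill_zero_bytes never does, even though both results compare equal with ==. A transformation between the two is therefore only valid when signed zeros are declared irrelevant.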
[Bug target/72867] New: SSE/AVX/AVX512: incorrect optimization of VMINPS/VMAXPS at compile time
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72867 Bug ID: 72867 Summary: SSE/AVX/AVX512: incorrect optimization of VMINPS/VMAXPS at compile time Product: gcc Version: 7.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: wen...@mitsuba-renderer.org Target Milestone: ---

The Intel intrinsics provide a family of functions for computing the minimum and maximum of two floating point vectors of different SIMD widths. For the most part, these are symmetric. They are not, however, when given a NaN argument: the instruction returns its second operand whenever either input is NaN, so in particular

    min(1, nan) == nan
    min(nan, 1) == 1

Whether that is pretty is arguable, but it's what the hardware implements (and numerical libraries depend on this behavior). The program below computes the expected output at optimization level 0.

    $ g++ test.c -o test -msse4.2 -O0 && ./test
    min(1, nan) = [nan nan nan nan]
    min(nan, 1) = [1.00 1.00 1.00 1.00]

At optimization level 1, the minimum is computed at compile time, and the NaN value is incorrectly propagated. This problem occurs both on GCC trunk and on GCC 5.0 (I have not tested other versions).

    $ g++ test.c -o test -msse4.2 -O1 && ./test
    min(1, nan) = [nan nan nan nan]
    min(nan, 1) = [nan nan nan nan]

    /// Program to reproduce the issue
    #include <immintrin.h>
    #include <stdio.h>
    #include <math.h>

    int main(int argc, char *argv[]) {
        __m128 x = _mm_min_ps(_mm_set1_ps(1.f), _mm_set1_ps(NAN));
        printf("min(1, nan) = [%f %f %f %f]\n", x[0], x[1], x[2], x[3]);
        x = _mm_min_ps(_mm_set1_ps(NAN), _mm_set1_ps(1.f));
        printf("min(nan, 1) = [%f %f %f %f]\n", x[0], x[1], x[2], x[3]);
        return 0;
    }
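The asymmetry can be stated per lane in one line of C. This is a scalar sketch of the documented MINPS rule (the result is the first operand only if it compares strictly less, otherwise the second), not the intrinsic itself; minps_lane is a hypothetical name.

```c
#include <assert.h>
#include <math.h>

/* Scalar model of one MINPS lane: every comparison involving NaN is
 * false, so a NaN in either position selects the second operand b. */
float minps_lane(float a, float b)
{
    return a < b ? a : b;
}
```

With this model, minps_lane(1.f, NAN) is NaN and minps_lane(NAN, 1.f) is 1, matching the -O0 output in the report. A compile-time evaluator that folds the call with a symmetric "NaN wins" or "number wins" rule cannot reproduce both results.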
[Bug target/72782] AVX512: No support for scalar broadcasts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72782 --- Comment #1 from Wenzel Jakob --- Looks like this issue was first reported in 2014 but got stuck -- see Bug 63351.
[Bug target/63351] Optimization: contract broadcast intrinsics when AVX512 is enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63351 Wenzel Jakob changed: What|Removed |Added CC||wen...@mitsuba-renderer.org --- Comment #5 from Wenzel Jakob --- Any news on this? I've also run into GCC's lack of broadcast support (Bug 72782).
[Bug tree-optimization/72824] [5/6 Regression] Signed floating point zero semantics broken at optimization level -O3 (tree-loop-distribute-patterns)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72824 --- Comment #8 from Wenzel Jakob --- Thank you, I can confirm that the issue is fixed on my end.
[Bug target/73350] New: AVX512: GCC optimizes away rounding flags
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=73350 Bug ID: 73350 Summary: AVX512: GCC optimizes away rounding flags Product: gcc Version: 7.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: wen...@mitsuba-renderer.org Target Milestone: --- The AVX512 instruction set introduced the ability to specify a rounding flag for almost every arithmetic operation that is subject to rounding. This is extremely useful because it eliminates the need to mess around with the MXCSR control register when using tools like interval arithmetic that need control of rounding. Unfortunately, support for this is currently broken in GCC. Specifically, the GCC optimizer does not seem to distinguish between function variants with different rounding modes and ends up merging them during common subexpression elimination. Consider the simple program attached below, which computes "1 + pi" with +inf and -inf rounding modes and then prints the difference of these values. The expected output is: $ g++ test.c -o test -mavx512f -O0 -fomit-frame-pointer -fomit-frame-pointer && ./test -4.76837e-07 At optimization level, -O1, this currently stops working (tested with GCC trunk): $ g++ test.c -o test -mavx512f -O0 -fomit-frame-pointer -fomit-frame-pointer && ./test -4.76837e-07 Looking at the assembly, there are two surprising things: first, common subexpression elimination seems to have (partially) merged the two additions. The second add is still generated but its result is never used. The other weird thing is that GCC decides to fill a mask register with '-1' and then use the masked versions of these operations instead of using the unmasked versions, which use a "-1" mask by default. 
    _main:
        leaq 8(%rsp), %r10
        andq $-64, %rsp
        pushq -8(%r10)
        pushq %rbp
        movq %rsp, %rbp
        pushq %r10
        subq $40, %rsp
        movl $-1, %eax
        kmovw %eax, %k1
        vbroadcastss LC0(%rip), %zmm1
        vbroadcastss LC1(%rip), %zmm2
        vaddps {rd-sae}, %zmm2, %zmm1, %zmm0{%k1}{z}    <-- Why use mask?
        vaddps {ru-sae}, %zmm2, %zmm1, %zmm1{%k1}{z}
        vsubss %xmm0, %xmm0, %xmm0    <-- xmm0 ??
        vcvtss2sd %xmm0, %xmm0, %xmm0
        leaq LC2(%rip), %rdi
        movl $1, %eax
        call _printf
        movl $0, %eax
        addq $40, %rsp
        popq %r10
        popq %rbp
        leaq -8(%r10), %rsp
        ret

    // == Program to reproduce
    #include <immintrin.h>
    #include <stdio.h>
    #include <math.h>

    int main(int argc, char *argv[]) {
        __m512 a = _mm512_set1_ps((float) M_PI);
        __m512 b = _mm512_set1_ps((float) 1.f);
        __m512 result1 = _mm512_add_round_ps(a, b, (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC));
        __m512 result2 = _mm512_add_round_ps(a, b, (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC));
        printf("%g\n", result1[0] - result2[0]);
        return 0;
    }
[Bug target/73350] AVX512: GCC optimizes away rounding flags
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=73350 --- Comment #1 from Wenzel Jakob --- Sorry, there was a stupid typo in my message below. The middle part should have read At optimization level, -O1, this currently stops working (tested with GCC trunk): $ g++ test.c -o test -mavx512f -O1 -fomit-frame-pointer -fomit-frame-pointer && ./test 0.0
[Bug target/76342] New: AVX512: _mm512_undefined_epi32() intrinsic missing (incorrectly named _mm512_undefined_si512)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=76342 Bug ID: 76342 Summary: AVX512: _mm512_undefined_epi32() intrinsic missing (incorrectly named _mm512_undefined_si512) Product: gcc Version: 7.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: wen...@mitsuba-renderer.org Target Milestone: ---

Consider the following snippet:

    // --
    #include <immintrin.h>

    __m512 test() { return _mm512_undefined_epi32(); }
    // --

When compiled with GCC trunk, this yields the following error message:

    test.cpp: In function '__m512 test()':
    test.cpp:3:24: error: '_mm512_undefined_epi32' was not declared in this scope
    __m512 test() { return _mm512_undefined_epi32(); }
    ^~
    test.cpp:3:24: note: suggested alternative: '_mm512_undefined_si512'
    __m512 test() { return _mm512_undefined_epi32(); }
    ^~
    _mm512_undefined_si512

However, there is no _mm512_undefined_si512 intrinsic. It is called _mm512_undefined_epi32. See here for details: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm512_undefined_epi32&expand=5509
[Bug target/76342] AVX512: _mm512_undefined_epi32() intrinsic missing (incorrectly named _mm512_undefined_si512)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=76342 Wenzel Jakob changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #3 from Wenzel Jakob --- Great, thank you!
[Bug target/76731] New: [AVX512] _mm512_i32gather_epi32 and other scatter/gather routines have incorrect signature
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=76731 Bug ID: 76731 Summary: [AVX512] _mm512_i32gather_epi32 and other scatter/gather routines have incorrect signature Product: gcc Version: 7.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: wen...@mitsuba-renderer.org Target Milestone: ---

All of the scatter/gather intrinsics in avx512intrin.h use int/float/double pointers, which is incorrect. For instance:

    extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
    _mm512_i32gather_epi32 (__m512i __index, int const *__addr, int __scale)

These should use void*/const void* pointers according to Intel (see e.g. https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm512_i32gather_epi32&expand=2778,2777). This is a departure from prior mask/gather intrinsics, where type information turned out to be a bad idea for various reasons (e.g. aliasing analysis).
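To illustrate why the pointer type matters to callers, here is a scalar sketch of a 32-bit gather whose base address is a const void *, as the Intel guide specifies. It is a model written for this note, not GCC's or Intel's implementation; gather32_model and its parameter list are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a 32-bit gather. Because base is a
 * const void *, arrays of unsigned int, int, or float can be passed
 * without a cast. Each element is read from base + index[i] * scale. */
void gather32_model(const void *base, const int32_t *index, int scale,
                    int count, uint32_t *out)
{
    /* The hardware only permits these four scale factors. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);

    const unsigned char *bytes = (const unsigned char *) base;
    for (int i = 0; i < count; ++i) {
        /* memcpy avoids making any assumption about the element type. */
        memcpy(&out[i], bytes + (ptrdiff_t) index[i] * scale, sizeof out[i]);
    }
}
```

A caller holding an unsigned int table can write gather32_model(table, idx, sizeof table[0], n, out) directly. With an int const * parameter, the same call needs an explicit cast in C++, which is the "invalid conversion" error quoted elsewhere in this thread.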
[Bug target/76731] [AVX512] _mm512_i32gather_epi32 and other scatter/gather routines have incorrect signature
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=76731 --- Comment #2 from Wenzel Jakob --- +1 this looks great!